summaryrefslogtreecommitdiffstats
path: root/tools
Commit message (Collapse)AuthorAgeFilesLines
* perf annotate: Add per arch instructions annotate handlersArnaldo Carvalho de Melo2016-11-173-106/+198
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Another step in supporting cross annotation. The arch specific tables are put in: tools/perf/arch/$ARCH/annotation/instructions.c which, so far, just plug instructions to a bunch of parsers/formatters, but may have more as the need arises. This is an alternative implementation to a previous attempt made by Ravi Bangoria. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-g3wt282lfa51j4qd0813e3az@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf annotate: Allow arches to specify functions to skipArnaldo Carvalho de Melo2016-11-171-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is to cope with an ARM specific kludge introduced in the original patch supporting ARM annotation, cfef25b8daf7 ("perf annotate: ARM support") that made functions with a '+' in its name to be skipped when processing call instructions. With this patchkit it should be possible to collect a perf.data file on a ARM machine and then annotate it on a x86 workstation and have those ARM kludges used. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-2fi3sy7q3sssdi7m7cbe07gy@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf annotate: Start supporting cross arch annotationArnaldo Carvalho de Melo2016-11-175-23/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a 'struct arch', where arch specific stuff will live, starting with objdump's choice of comment delimitation character, that is '#' in x86 while a ';' in arm. This has some bits and pieces from a patch submitted by Ravi. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-f337tzjjcl8vtapgvjxmhrbx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Show branch info in callchain entry for browser modeJin Yao2016-11-141-2/+18
| | | | | | | | | | | | | | | | | | If the branch is 100% predicted then the "predicted" is hidden. Similarly, if there is no branch tsx abort, the "abort" is hidden. There is only cycles shown (cycle is supported on skylake platform, older platform would be 0). If no iterations, the "iterations" is hidden. Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linux-kernel@vger.kernel.org Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/1477876794-30749-6-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Show branch info in callchain entry for stdio modeJin Yao2016-11-141-4/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the branch is 100% predicted then the "predicted" is hidden. Similarly, if there is no branch tsx abort, the "abort" is hidden. There is only cycles shown (cycle is supported on skylake platform, older platform would be 0). If no iterations, the "iterations" is hidden. For example: |--29.93%--main div.c:39 (predicted:50.6%, cycles:1, iterations:18) | main div.c:44 (predicted:50.6%, cycles:1) | | | --22.69%--main div.c:42 (cycles:2, iterations:17) | compute_flag div.c:28 (cycles:2) | | | --10.52%--compute_flag div.c:27 (cycles:1) | rand rand.c:28 (cycles:1) | rand rand.c:28 (cycles:1) | __random random.c:298 (cycles:1) | __random random.c:297 (cycles:1) | __random random.c:295 (cycles:1) | __random random.c:295 (cycles:1) | __random random.c:295 (cycles:1) | __random random.c:295 (cycles:6) Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linux-kernel@vger.kernel.org Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/1477876794-30749-5-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Calculate and return the branch flag countingJin Yao2016-11-142-1/+202
| | | | | | | | | | | | | | | | | | | | | | Create some branch counters in per callchain list entry. Each counter is for a branch flag. For example, predicted_count counts all the *predicted* branches. The counters get updated by processing the callchain cursor nodes. It also provides functions to retrieve or print the values of counters in callchain list. Besides the counting for branch flags, it also counts and returns the average number of iterations. Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linux-kernel@vger.kernel.org Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/1477876794-30749-4-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Create a symbol_conf flag for showing branch flag countingJin Yao2016-11-142-0/+4
| | | | | | | | | | | | | | | | Create a new flag show_branchflag_count in symbol_conf. The flag is used to control if showing the branch flag counting information. The flag depends on if the perf.data has branch data and if user chooses the "branch-history" option in perf report command line. Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linux-kernel@vger.kernel.org Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/1477876794-30749-3-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf report: Add branch flag to callchain cursor nodeJin Yao2016-11-143-18/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the branch ip has been added to call stack for easier browsing, this patch adds more branch information. For example, add a flag to indicate if this ip is a branch, and also add with the branch flag. Then we can know if the cursor node represents a branch and know what the branch flag it has. The branch history code has a loop detection pass that removes loops. It would be nice for knowing how many loops were removed then in next steps, we can compute out the average number of iterations. For example: Before remove_loops(), entry0: from = 0x100, to = 0x200 entry1: from = 0x300, to = 0x250 entry2: from = 0x300, to = 0x250 entry3: from = 0x300, to = 0x250 entry4: from = 0x700, to = 0x800 After remove_loops() entry0: from = 0x100, to = 0x200 entry1: from = 0x300, to = 0x250 entry2: from = 0x700, to = 0x800 The original entry2 and entry3 are removed. So the number of iterations (from = 0x300, to = 0x250) is equal to removed number + 1 (2 + 1). iterations = removed number + 1; average iteractions = Sum(iteractions) / number of samples This formula ignores other cases, for example, iterations cross multiple buffers and one buffer contains 2+ loops. Because in practice, it's good enough. Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Linux-kernel@vger.kernel.org Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/n/1477876794-30749-2-git-send-email-yao.jin@linux.intel.com [ Renamed 'iter' to 'nr_loop_iter' for clarity ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf config: Mark where are config items from (user or system)Taeung Song2016-11-143-3/+23
| | | | | | | | | | | | | | | | | | | | | | | | To write config items to a particular config file, we should know where is each config section and item from. Current setting functionality of perf-config use autogenerating way by overwriting collected config items to a config file. For example, when collecting config items from user and system config files (i.e. ~/.perfconfig and $(sysconf)/perfconfig), perf_config_set can contain both user and system config items. So we should know where each value is from to avoid merging user and system config items on user config file. Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Nambong Ha <over3025@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wookje Kwon <aweee0@gmail.com> Link: http://lkml.kernel.org/r/1478241862-31230-7-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf config: Add support setting variables in a config fileTaeung Song2016-11-144-7/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add setting feature that can add config variables with their values to a config file (i.e. user or system config file) or modify config key-value pairs in a config file. For the syntax examples: perf config [<file-option>] [section.name[=value] ...] e.g. You can set the ui.show-headers to false with # perf config ui.show-headers=false If you want to add or modify several config items, you can do like # perf config annotate.show_nr_jumps=false kmem.default=slab Committer notes: Testing it: $ perf config -l top.children=true report.children=false $ $ perf config top.children=false $ perf config -l top.children=false report.children=false $ $ perf config kmem.default=slab $ perf config -l top.children=false report.children=false kmem.default=slab $ Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Nambong Ha <over3025@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wookje Kwon <aweee0@gmail.com> Link: http://lkml.kernel.org/r/1478241862-31230-5-git-send-email-treeze.taeung@gmail.com [ Combined patch with docs update with this one ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf config: Validate config variable arguments before trying use themTaeung Song2016-11-141-4/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | You can show the values for several config items as below: # perf config report.queue-size call-graph.record-mode but it is necessary to more precisely check arguments, before passing them to show_spec_config(). This validation function would be also used when parsing config key-value pairs arguments in the near future. Committer notes: Testing it: $ perf config bla. The config variable does not contain a variable name: bla. $ perf config .bla The config variable does not contain a section name: .bla $ perf config bla.bla $ Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Nambong Ha <over3025@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wookje Kwon <aweee0@gmail.com> Link: http://lkml.kernel.org/r/1478241862-31230-4-git-send-email-treeze.taeung@gmail.com [ Fix some spelling errors ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf config: Add support for getting config key-value pairsTaeung Song2016-11-142-3/+55
| | | | | | | | | | | | | | | | | | | | | | | Add a functionality getting specific config key-value pairs. For the syntax examples, perf config [<file-option>] [section.name ...] e.g. To query config items 'report.queue-size' and 'report.children', do # perf config report.queue-size report.children Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Nambong Ha <over3025@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Wookje Kwon <aweee0@gmail.com> Link: http://lkml.kernel.org/r/1478241862-31230-2-git-send-email-treeze.taeung@gmail.com [ Combined patch with docs update with this one ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf kvmti: Remove unused Makefile fileJiri Olsa2016-11-141-89/+0
| | | | | | | | | | | | | | | | Now when jvmti compilation is plugged into Makefile.perf, there's no need for this makefile. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: William Cohen <wcohen@redhat.com> Link: http://lkml.kernel.org/r/20161112121016.GA17194@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf jvmti: Plug compilation into perf buildJiri Olsa2016-11-144-2/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compile jvmti agent as part of the perf build. The agent library is called libperf-jvmti.so and is installed in default place together with other files: $ make libperf-jvmti.so BUILD: Doing 'make -j4' parallel build ... CC jvmti/libjvmti.o CC jvmti/jvmti_agent.o LD jvmti/jvmti-in.o LINK libperf-jvmti.so $ make DESTDIR=/tmp/krava/ install-bin ... $ find /tmp/krava/ | grep libperf /tmp/krava/lib64/libperf-jvmti.so /tmp/krava/lib64/libperf-gtk.so Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: William Cohen <wcohen@redhat.com> Link: http://lkml.kernel.org/r/1478093749-5602-4-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools build: Add jvmti feature detection supportJiri Olsa2016-11-142-1/+18
| | | | | | | | | | | | | | | | Adding support to detect jvmti support. It is not plugged into the FEATURE_TESTS machinery, because it's quite rare and will be used separately from perf via feature_check call. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: William Cohen <wcohen@redhat.com> Link: http://lkml.kernel.org/r/1478093749-5602-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools build: Add CFLAGS_REMOVE_* supportJiri Olsa2016-11-142-3/+7
| | | | | | | | | | | | | | | | | | | | | | Adding support to remove options from final CFLAGS for both object file and build target. It's now possible to remove CFLAGS options like: CFLAGS_REMOVE_krava.o += -Wstrict-prototypes Committer notes: This comes from the kernel's kbuild infrastructure, the subset that is supported in tools/ is being documented at tools/build/Documentation/Build.txt. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: William Cohen <wcohen@redhat.com> Link: http://lkml.kernel.org/r/1478093749-5602-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf intel-pt: Update documentation about context switch eventsArnaldo Carvalho de Melo2016-11-111-2/+17
| | | | | | | | | | | | | | Since the unprivileged sched switch event was added in perf, PT doesn't need need perf_event_paranoid=-1 anymore for per cpu decoding. Add a note stating that that is only needed for kernels < 4.2. Reported-by: Andi Kleen <ak@linux.intel.com> Report-Link: http://lkml.kernel.org/r/http://lkml.kernel.org/n/tip-x2ybghpqxxn3zu0m8o7qi42r@git.kernel.org Acked-by: Adrian Hunter <adrian.hunter@intel.com> Fixes: 45ac1403f564 ("perf: Add PERF_RECORD_SWITCH to indicate context switches") Link: http://lkml.kernel.org/n/tip-x2ybghpqxxn3zu0m8o7qi42r@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf callchain: Fixup help/config for no-unwindingRabin Vincent2016-11-072-6/+0
| | | | | | | | | | | | | | Since 841e3558b2d ("perf callchain: Recording 'dwarf' callchains do not need DWARF unwinding support"), --call-graph dwarf is allowed in 'perf record' even without unwind support. A couple of other places don't reflect this yet though: the help text should list dwarf as a valid record mode and the dump_size config should be respected too. Signed-off-by: Rabin Vincent <rabinv@axis.com> Cc: He Kuang <hekuang@huawei.com> Fixes: 841e3558b2de ("perf callchain: Recording 'dwarf' callchains do not need DWARF unwinding support") Link: http://lkml.kernel.org/r/1470837148-7642-1-git-send-email-rabin.vincent@axis.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Add missing object file to the python binding linkage listArnaldo Carvalho de Melo2016-10-282-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ac12f6764c50 ("perf tools: Implement branch_type event parameter") we started using the parse_branch_str() function from one of the files used in the python binding, which caused this entry in 'perf test' to fail: # perf test -v python 16: Try 'import perf' in python, checking link problems : --- start --- test child forked, pid 16667 Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: /tmp/build/perf/python/perf.so: undefined symbol: parse_branch_str test child finished with -1 ---- end ---- Try 'import perf' in python, checking link problems: FAILED! # I must've commited some mistake when running 'perf test' to send the pull request for the perf-core-for-mingo-20161024 tag, to have let this regression to pass, sigh. Just add tools/perf/util/parse-branch-options.c and switch from using ui__warning(), that is not available in the python binding, use pr_warning() instead, which is good enough for this case. Now: # perf test python 16: Try 'import perf' in python, checking link problems : Ok # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Andi Kleen <ak@linux.intel.com> Fixes: ac12f6764c50 ("perf tools: Implement branch_type event parameter") Link: http://lkml.kernel.org/n/tip-9kn1ct1cx9ppwqlmzl6z0xhs@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf scripting: Don't die if scripting can't be setup, disable itArnaldo Carvalho de Melo2016-10-281-18/+15
| | | | | | | | | | | | Removing one more set of die() calls. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-6pyil685m5i2tugg56gcy0tg@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf scripting: Avoid leaking the scripting_context variableArnaldo Carvalho de Melo2016-10-281-2/+4
| | | | | | | | | | | | | | | Both register_perl_scripting() and register_python_scripting() allocate this variable, fix it by checking if it already was. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 7e4b21b84c43 ("perf/scripts: Add Python scripting engine") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Update x86's syscall_64.tbl, adding pkey_(alloc,free,mprotect)Arnaldo Carvalho de Melo2016-10-281-0/+3
| | | | | | | | | | | | | | | | | Introduced in commit f9afc6197e9b ("x86: Wire up protection keys system calls") This will make 'perf trace' aware of them on x86_64. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-s1ta2ttv2xacecqogmd3a9p1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools: Update asm-generic/mman-common.h copy from the kernelArnaldo Carvalho de Melo2016-10-281-0/+5
| | | | | | | | | | | | | | | | | | | | To get the defines introduced in the commit e8c24d3a23a4 ("x86/pkeys: Allocation/free syscalls") Silencing this perf build warning: Warning: tools/include/uapi/asm-generic/mman-common.h differs from kernel Need to change 'perf trace' to beautify those syscalls, as soon as booting with a kernel with it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-yev9rexu02cl7cjeozzmrl9t@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench mem: Ignore export.h related changes to mem{cpy,set}.SArnaldo Carvalho de Melo2016-10-281-2/+2
| | | | | | | | | | | | | | | | | | | | | Ignore export.h and EXPORT_SYMBOL in: 784d5699eddc ("x86: move exports to actual definitions") We're not dragging this stuff, not useful in tools/ This silences the following warnings while building perf: Warning: tools/arch/x86/lib/memcpy_64.S differs from kernel Warning: tools/arch/x86/lib/memset_64.S differs from kernel Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-h9vw3pe0fq79zmyqsfr0s0mo@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf list: Support matching by topicAndi Kleen2016-10-281-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support in perf list topic to only show events belonging to a specific vendor events topic. For example the following works now: % perf list frontend List of pre-defined events (to be used in -e): stalled-cycles-frontend OR idle-cycles-frontend [Hardware event] stalled-cycles-frontend OR cpu/stalled-cycles-frontend/ [Kernel PMU event] frontend: dsb2mite_switches.count [Decode Stream Buffer (DSB)-to-MITE switches] dsb2mite_switches.penalty_cycles [Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles] dsb_fill.exceed_dsb_lines [Cycles when Decode Stream Buffer (DSB) fill encounter more than 3 Decode Stream Buffer (DSB) lines] icache.hit [Number of Instruction Cache, Streaming Buffer and Victim Cache Reads. both cacheable and noncacheable, including UC fetches] ... Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1476902724-9586-2-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Introduce timestamp__scnprintf_usec()Namhyung Kim2016-10-284-7/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Joonwoo reported that there's a mismatch between timestamps in script and sched commands. This was because of difference in printing the timestamp. Factor out the code and share it so that they can be in sync. Also I found that sched map has similar problem, fix it too. Committer notes: Fixed the max_lat_at bug introduced by Namhyung's original patch, as pointed out by Joonwoo, and made it a function following the scnprintf() model, i.e. returning the number of bytes formatted, and receiving as the first parameter the object from where the data to the formatting is obtained, renaming it from: char *timestamp_in_usec(char *bf, size_t size, u64 timestamp) to int timestamp__scnprintf_usec(u64 timestamp, char *bf, size_t size) Reported-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161024020246.14928-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf sched map: Always show task comm with -vNamhyung Kim2016-10-251-1/+1
| | | | | | | | | | | | | | | I'd like to see the name of tasks with perf sched map, but it only shows name of new tasks and then use short names after all. This is not good for long running tasks since it's hard for users to track the short names. This patch makes it show the names (except the idle task) when -v option is used. Probably we may make it as default behavior. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161024020246.14928-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf sched map: Apply cpu color when there's an activityNamhyung Kim2016-10-251-1/+1
| | | | | | | | | | | | | Applying cpu color always doesn't help readability IMHO. Instead it might be better to applying the color when there's an activity on those CPUs. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161024020246.14928-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf sched: Make common options cascadingNamhyung Kim2016-10-251-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The -i and -v options can be used in subcommands so enable cascading the sched_options. This fixes the following inconvenience in 'perf sched': $ perf sched -i perf.data.sched map ... (it works well) ... $ perf sched map -i perf.data.sched Error: unknown switch `i' Usage: perf sched map [<options>] --color-cpus <cpus> highlight given CPUs in map --color-pids <pids> highlight given pids in map --compact map output in compact mode --cpus <cpus> display given CPUs in map With this patch, the second command line works with the perf.data.sched data file. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20161024030003.28534-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools lib subcmd: Suppport cascading optionsNamhyung Kim2016-10-252-0/+16
| | | | | | | | | | | | | | | | | | Sometimes subcommand have common options and it can only handled in the upper level command unless it duplicates the options. This patch adds a parent field and fallback to the parent if the given argument was not found in the current options. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20161024030003.28534-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hist browser: Fix hierarchy column countsNamhyung Kim2016-10-251-1/+14
| | | | | | | | | | | | | | | | | | | | The perf report/top on TUI supports horizontal scrolling using LEFT and RIGHT keys. But it calculate the number of columns incorrectly when hierarchy mode is enabled so that keep pressing RIGHT key can make the output disappeared. In the hierarchy mode, all sort keys are collapsed into a single column, so it needs to be applied when calculating column numbers. Reported-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161024162110.17918-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench futex: Sanitize numeric parametersDavidlohr Bueso2016-10-256-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This gets rid of oddities such as: perf bench futex hash -t -4 perf: calloc: Cannot allocate memory Runtime (and many more) are equally busted, i.e. run for bogus amounts of time. Just use the abs, instead of, for example errorring out. Committer note: After the patch: $ perf bench futex hash -t -4 # Running 'futex/hash' benchmark: Run summary [PID 10178]: 4 threads, each operating on 1024 [private] futexes for 10 secs. [thread 0] futexes: 0x34f9fa0 ... 0x34faf9c [ 4702208 ops/sec ] [thread 1] futexes: 0x34fb140 ... 0x34fc13c [ 4707020 ops/sec ] [thread 2] futexes: 0x34fc2e0 ... 0x34fd2dc [ 4711526 ops/sec ] [thread 3] futexes: 0x34fd480 ... 0x34fe47c [ 4709683 ops/sec ] Averaged 4707609 operations/sec (+- 0.04%), total secs = 10 $ Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1477342613-9938-3-git-send-email-dave@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench futex: Avoid worker cacheline bouncingDavidlohr Bueso2016-10-252-7/+8
| | | | | | | | | | | | | | | | | | | | | | Sebastian noted that overhead for worker thread ops (throughput) accounting was producing 'perf' to appear in the profiles, consuming a non-trivial (i.e. 13%) amount of CPU. This is due to cacheline bouncing due to the increment of w->ops. We can easily fix this by just working on a local copy and updating the actual worker once done running, and ready to show the program summary. There is no danger of the worker being concurrent, so we can trust that no stale value is being seen by another thread. This also gets rid of the unnecessary cache alignment hack; its not worth it. Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/1477342613-9938-2-git-send-email-dave@stgolabs.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf coresight: Removing miscellaneous debug outputMathieu Poirier2016-10-241-2/+0
| | | | | | | | | | | Printing the full path of the selected link is obviously not needed, hence removing. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1476913323-6836-1-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf list: Make vendor event matching case insensitiveAndi Kleen2016-10-243-7/+19
| | | | | | | | | | | | | | | | | | | | | | | | Make the 'perf list' glob matching for vendor events case insensitive. This allows to use the upper case vendor events with perf list too. Now the following works: % perf list LONGEST_LAT ... cache: longest_lat_cache.miss [Core-originated cacheable demand requests missed LLC] longest_lat_cache.reference [Core-originated cacheable demand requests that refer to LLC] Signed-off-by: Andi Kleen <ak@linux.intel.com> Suggested-by: Ingo Molnar <mingo@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1476899402-31460-1-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf trace: Use the syscall raw_syscalls:sys_enter timestampArnaldo Carvalho de Melo2016-10-241-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of the one when another syscall takes place while another is being processed (in another CPU, but we show it serialized, so need to "interrupt" the other), and also when finally showing the sys_enter + sys_exit + duration, where we were showing the sample->time for the sys_exit, duh. Before: # perf trace sleep 1 <SNIP> 0.373 ( 0.001 ms): close(fd: 3 ) = 0 1000.626 (1000.211 ms): nanosleep(rqtp: 0x7ffd6ddddfb0) = 0 1000.653 ( 0.003 ms): close(fd: 1 ) = 0 1000.657 ( 0.002 ms): close(fd: 2 ) = 0 1000.667 ( 0.000 ms): exit_group( ) # After: # perf trace sleep 1 <SNIP> 0.336 ( 0.001 ms): close(fd: 3 ) = 0 0.373 (1000.086 ms): nanosleep(rqtp: 0x7ffe303e9550) = 0 1000.481 ( 0.002 ms): close(fd: 1 ) = 0 1000.485 ( 0.001 ms): close(fd: 2 ) = 0 1000.494 ( 0.000 ms): exit_group( ) [root@jouet linux]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ecbzgmu2ni6glc6zkw8p1zmx@git.kernel.org Fixes: 752fde44fd1c ("perf trace: Support interrupted syscalls") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf trace: Remove thread_trace->exit_timeArnaldo Carvalho de Melo2016-10-241-3/+0
| | | | | | | | | | | | | Not used at all, we need just the entry_time to calculate the syscall duration. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-js6r09zdwlzecvaei7t4l3vd@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench futex: Cache align the worker structSebastian Andrzej Siewior2016-10-241-1/+4
| | | | | | | | | | | | | | | | It popped up in perf testing that the worker consumes some amount of CPU. It boils down to the increment of `ops` which causes cache line bouncing between the individual threads. This patch aligns the struct by 256 bytes to ensure that not a cache line is shared among CPUs. 128 byte is the x86 worst case and grep says that L1_CACHE_SHIFT is set to 8 on s390. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161016190803.3392-1-bigeasy@linutronix.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Use normal error reporting when processing PERF_RECORD_READ eventsArnaldo Carvalho de Melo2016-10-243-27/+70
| | | | | | | | | | | | | | | | We already have handling for errors when processing PERF_RECORD_ events, so instead of calling die() when not being able to alloc, propagate the error, so that the normal UI exit sequence can take place, the user be warned and possibly the terminal be properly reset to a sane mode. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-r90je3c009a125dvs3525yge@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Normalize sq_quote_argv() error reportingArnaldo Carvalho de Melo2016-10-241-1/+1
| | | | | | | | | | | | | | | It already returns whatever strbuf_(grow|addch)() returns in case of failure, so just return -ENOSPC in the only case where it was die()ing. When it returns, its only caller will call die() anyway, so no need to be so eager, die later. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-as05b7mbogprlwi8iarwns8e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench mem: Move boilerplate memory allocation to the infrastructureArnaldo Carvalho de Melo2016-10-241-47/+30
| | | | | | | | | | | | | | Instead of having all tests perform alloc/free, do it in the code that calls the do_cycles() and do_gettimeofday() functions. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-lywj4mbdb1m9x1z9asivwuuy@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf trace: Implement --delayAlexis Berlemont2016-10-242-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the perf wiki todo-list[1], there is an entry regarding initial-delay and 'perf trace'; the following small patch tries to fulfill this point. It has been generated against the branch tip/perf/core. It has only been implemented in the "trace__run" case. Ex.: $ sudo strace -- ./perf trace --delay 5 sleep 1 2>&1 ... fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 ioctl(7, PERF_EVENT_IOC_ID, 0x7ffc8fd35718) = 0 ioctl(11, PERF_EVENT_IOC_SET_OUTPUT, 0x7) = 0 fcntl(11, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 ioctl(11, PERF_EVENT_IOC_ID, 0x7ffc8fd35718) = 0 write(6, "\0", 1) = 1 close(6) = 0 nanosleep({0, 5000000}, NULL) = 0 # DELAY OF 5 MS BEFORE ENABLING THE EVENTS ioctl(3, PERF_EVENT_IOC_ENABLE, 0) = 0 ioctl(4, PERF_EVENT_IOC_ENABLE, 0) = 0 ioctl(5, PERF_EVENT_IOC_ENABLE, 0) = 0 ioctl(7, PERF_EVENT_IOC_ENABLE, 0) = 0 ... [1]: https://perf.wiki.kernel.org/index.php/Todo Signed-off-by: Alexis Berlemont <alexis.berlemont@gmail.com> Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161010054328.4028-2-alexis.berlemont@gmail.com [ Add entry to the manpage, cut'n'pasted from stat's and record's ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf hists browser: Dynamically change verbosity levelAlexis Berlemont2016-10-242-6/+16
| | | | | | | | | | | | | | | | | | | | Here is a small patch which tries to fulfill a point in the perf todo list: * Make pressing 'V' multiple times to go on cycling thru various verbosity levels in 'perf top', so that info that is present in 'perf top -v' can be obtained without having to restart the tool (acme). After a small grep in the code, the max verbosity level seems 3; so, we cycle at 4; I did not dare define a MAX_VERBOSE_LEVEL constant. Signed-off-by: Alexis Berlemont <alexis.berlemont@gmail.com> Suggested-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161012214823.14324-2-alexis.berlemont@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Fix typo "No enough" to "Not enough"Alexander Alemayhu2016-10-244-10/+10
| | | | | | | | | | | The latter version occurs much more when running git grep. Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20161013161811.4939-1-alexander@alemayhu.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf pmu: Only print Using CPUID message onceAndi Kleen2016-10-241-1/+5
| | | | | | | | | | | | | With uncore event aliases which are duplicated over multiple PMUs the "Using CPUID" message with -v could be printed many times. Only print it once. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1476393332-20732-3-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf jit: Add jitdump format specification documentStephane Eranian2016-10-241-0/+170
| | | | | | | | | | | | | | This patch adds a formal specification of the jitdump format. The goal is to help jit runtime developers implement the jitdump support without having to read the jvmti code. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-10-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf jit: Check JITHEADER_VERSIONStefano Sanfilippo2016-10-241-0/+6
| | | | | | | | | | | | | | | Check the version number when opening a jitdump file. Accept older versions, but not newer ones. Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org> Signed-off-by: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-9-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf jit: Generate .eh_frame/.eh_frame_hdr in DSOStefano Sanfilippo2016-10-243-7/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the jit_buf_desc contains unwinding information, it is emitted as eh_frame unwinding sections in the DSOs generated by perf inject. The unwinding information is required to unwind of JITed code which do not maintain the frame pointer register during function calls. It can be emitted by V8 / Chromium when the --perf_prof_unwinding_info is passed to V8. The eh_frame and eh_frame_hdr sections are emitted immediately after the .text. The .eh_frame is aligned at a 8-byte boundary, and .eh_frame_hdr at a 4-byte one. Since size of the .eh_frame is required to be a multiple of the word size, which means there will never be additional padding between it and the .eh_frame_hdr on machines where the word size is 4 or 8 bytes. However, additional padding might be inserted between .text and .eh_frame to reach the correct alignment, which will always be 8 bytes, also on 32bit machines. The reasoning behind this choice is that 4 extra bytes of padding worst case are not a large cost for the advantage of removing word-size dependent offset calculations when emitting the jitdump. Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org> Signed-off-by: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-8-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf jit: Add unwinding supportStefano Sanfilippo2016-10-242-3/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This record is intended to provide unwinding information in the eh_frame format. This is required to unwind JITed code which does not maintain the frame pointer register during function calls. The eh_frame unwinding information can be emitted by V8 / Chromium when the --perf_prof_unwinding_info is passed. A record of type jr_code_unwinding_info comes before the jr_code_load it referred to and contains both the .eh_frame and .eh_frame_hdr. The fields in the header have the following meaning: * unwinding_size: size of the eh_frame and eh_frame_hdr, necessary for distinguishing the content from the padding. * eh_frame_hdr_size: as the name says. * mapped_size: size of the payload that was in memory at runtime. typically unwinding_size if the .eh_frame_hdr and .eh_frame were mapped, or 0 if they weren't. It should always be the former case, since the .eh_frame is guaranteed to be mapped in memory. However, certain JITs might want to inject an .eh_frame_hdr with an empty LUT to trigger fp-based unwinding fallback in libunwind. The only part of the .eh_frame_hdr that libunwind reads from remote memory is the LUT, and since there is none, mapping the unwinding info in memory is not necessary, and 0 in this field signifies that it wasn't. This practical hack allows to save bytes in code memory for those JIT compilers that might or might not maintain a valid frame pointer. The payload that follows is assumed to contain first the .eh_frame and then the .eh_header_hdr, with no padding between the two. Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org> Signed-off-by: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-7-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf jit: Do not assume pgoff is zeroStefano Sanfilippo2016-10-241-2/+2
| | | | | | | | | | | | | | | | | | When calculating .eh_frame_hdr base and LUT offsets do not always assume that pgoff is zero. The assumption is false for DSOs built from the jitdump by perf inject, because the ELF header did not exist in memory at sampling time. Signed-off-by: Stefano Sanfilippo <ssanfilippo@chromium.org> Signed-off-by: Ross McIlroy <rmcilroy@chromium.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Anton Blanchard <anton@ozlabs.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1476356383-30100-6-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
OpenPOWER on IntegriCloud