op-kernel-dev - Development kernel branch for OpenPOWER systems

diff options

author	Waiman Long <Waiman.Long@hp.com>	2014-09-30 13:36:15 -0400
committer	Arnaldo Carvalho de Melo <acme@redhat.com>	2014-10-01 14:39:57 -0300
commit	4598a0a6d22fadfb7b37f2b44ee7fdcb24632fcf (patch)
tree	fa1e74148d31382b71588a7fbbc679f165e4a6cd /tools/perf/util/hist.c
parent	8fa7d87f91479f7124142ca4ad93a37b80f8c1c0 (diff)
download	op-kernel-dev-4598a0a6d22fadfb7b37f2b44ee7fdcb24632fcf.zip op-kernel-dev-4598a0a6d22fadfb7b37f2b44ee7fdcb24632fcf.tar.gz

perf symbols: Improve DSO long names lookup speed with rbtree

With workload that spawns and destroys many threads and processes, it was found that perf-mem could took a long time to post-process the perf data after the target workload had completed its operation. The performance bottleneck was found to be the lookup and insertion of the new DSO structures (thousands of them in this case). In a dual-socket Ivy-Bridge E7-4890 v2 machine (30-core, 60-thread), the perf profile below shows what perf was doing after the profiled AIM7 shared workload completed: - 83.94% perf libc-2.11.3.so [.] __strcmp_sse42 - __strcmp_sse42 - 99.82% map__new machine__process_mmap_event perf_session_deliver_event perf_session__process_event __perf_session__process_events cmd_record cmd_mem run_builtin main __libc_start_main - 13.17% perf perf [.] __dsos__findnew __dsos__findnew map__new machine__process_mmap_event perf_session_deliver_event perf_session__process_event __perf_session__process_events cmd_record cmd_mem run_builtin main __libc_start_main So about 97% of CPU times were spent in the map__new() function trying to insert new DSO entry into the DSO linked list. The whole post-processing step took about 9 minutes. The DSO structures are currently searched linearly. So the total processing time will be proportional to n^2. To overcome this performance problem, the DSO code is modified to also put the DSO structures in a RB tree sorted by its long name in additional to being in a simple linked list. With this change, the processing time will become proportional to n*log(n) which will be much quicker for large n. However, the short name will still be searched using the old linear searching method. With that patch in place, the same perf-mem post-processing step took less than 30 seconds to complete. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Douglas Hatch <doug.hatch@hp.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Link: http://lkml.kernel.org/r/1412098575-27863-3-git-send-email-Waiman.Long@hp.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Diffstat (limited to 'tools/perf/util/hist.c')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: