| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Useful for when people want to try some version of the perf tools and don't
wants to download the kernel tarball.
Here is a session using this new target:
[root@emilia linux-2.6-tip]# make help | grep -i perf
perf-tar-src-pkg - Build perf-2.6.35-rc1.tar source tarball
perf-targz-src-pkg - Build perf-2.6.35-rc1.tar.gz source tarball
perf-tarbz2-src-pkg - Build perf-2.6.35-rc1.tar.bz2 source tarball
[root@emilia linux-2.6-tip]# make perf-tarbz2-src-pkg
TAR
[root@emilia linux-2.6-tip]# ls -la perf-2.6.35-rc1.tar.bz2
-rw-r--r-- 1 root root 295731 May 31 11:18 perf-2.6.35-rc1.tar.bz2
[root@emilia linux-2.6-tip]# tar xf perf-2.6.35-rc1.tar.bz2
[root@emilia linux-2.6-tip]# cd perf-2.6.35-rc1
[root@emilia perf-2.6.35-rc1]# ls
arch HEAD include lib tools
[root@emilia perf-2.6.35-rc1]# cd tools/perf
[root@emilia perf]# make -j9 2>&1 | tail
CC arch/x86/util/dwarf-regs.o
CC util/probe-finder.o
CC util/newt.o
CC util/scripting-engines/trace-event-perl.o
CC scripts/perl/Perf-Trace-Util/Context.o
CC perf.o
CC builtin-help.o
AR libperf.a
LINK perf
rm .perf.dev.null
[root@emilia perf]# ./perf record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.262 MB perf.data (~11457 samples) ]
[root@emilia perf]# ./perf report | head -12
# Events: 6K cycles
#
# Overhead Command Shared Object Symbol
# ........ ............... .................. ......
#
4.73% perf [kernel.kallsyms] [k] format_decode
4.49% perf libc-2.12.so [.] _IO_file_underflow_internal
4.38% init [kernel.kallsyms] [k] mwait_idle
3.29% perf [kernel.kallsyms] [k] vsnprintf
2.38% init [kernel.kallsyms] [k] sched_clock_local
2.35% init [kernel.kallsyms] [k] apic_timer_interrupt
1.86% sirq-timer/5 [kernel.kallsyms] [k] find_busiest_group
[root@emilia perf]#
Acked-by: Michal Marek <mmarek@suse.cz>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <20100528185357.GA28009@ghostprotocols.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a -C option to stat, record, top to designate a list of CPUs to
monitor. CPUs can be specified as a comma-separated list or ranges, no space
allowed.
Examples:
$ perf record -a -C0-1,4-7 sleep 1
$ perf top -C0-4
$ perf stat -a -C1,2,3,4 sleep 1
With perf record in per-thread mode with inherit mode on, samples are collected
only when the thread runs on the designated CPUs.
The -C option does not turn on system-wide mode automatically.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <4bff9496.d345d80a.41fe.7b00@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is useful to know on which CPU a sample was captured on.
The information is captured with perf record -R but it was
not printed out by perf report -D. This patch adds this.
When -R is not used, cpu is set to -1to indicate that
the CPU is unknown (it is not captured).
Cc: David S. Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <4bff964c.e88cd80a.3106.7d31@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|\
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/urgent
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We need to set the long name to the name specified via, for instance,
'perf annotate --vmlinux /path/to/vmlinux', if not it will remain as
'[kernel.kallsyms]' and that will make annotate fail when passing this
as the vmlinux name in the call to objdump.
The way this is setup grew unwieldly and dso__load_vmlinux is the
function that should allocate space for the long name, with callers not
assuming that filenames should be allocated somehow by then (strdup,
dso__build_id_filename, etc).
For now this is the minimalistic patch, a proper fix for .36 will be
made.
Reported-by: Stephane Eranian <eranian@google.com>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <20100604003900.GD10469@ghostprotocols.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Frederic reported that because swevents handling doesn't disable IRQs
anymore, we can get a recursion of perf_adjust_period(), once from
overflow handling and once from the tick.
If both call ->disable, we get a double hlist_del_rcu() and trigger
a LIST_POISON2 dereference.
Since we don't actually need to stop/start a swevent to re-programm
the hardware (lack of hardware to program), simply nop out these
callbacks for the swevent pmu.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1275557609.27810.35218.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|\
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/urgent
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When we use plain 'perf buildid-list' we use only what is in the buildid
table in the perf.data header. And those have absolute pathnames because
at 'perf record' time we used __perf_session__process_events and that
doesn't sets up the path shortening code in map__new() that happens if
symbol_conf.full_paths is false, the default.
On the other hand, when we use 'perf buildid-list --with-hits' we
process all the events using perf_session__process_events, adding
entries to the global DSO list _after_ removing the current directory
from the DSO name, for presentation purposes.
Because of that we end up having two entries in the DSO list when
recording events for binaries using relative pathnames.
Fix it minimally by setting symbol_conf.full_paths to true when marking
the DSOs with hits in 'perf buildid-list --with-hits', as used by 'perf
archive'
Right fix longer term is to shorten the path only at presentation time.
Will be done for 2.6.36.
Reported-by: Stephane Eranian <eranian@google.com>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <20100601183837.GC4093@ghostprotocols.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
trace_unhandled() callback does not allow to access event fields, this patch
resolves the problem.
It can also been used as a more pythonic and flexible way for script writters
to demux event types
This will for example greatly simplify pytimechart event demux.
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>,
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <1275340329-2397-1-git-send-email-tardyp@gmail.com>
Signed-off-by: Pierre Tardy <tardyp@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
hist_entry__annotate() runs objdump with -S option so the output may contain
lines of any format. If a line starts with a colon strtoull() returns 0 and
calculated offset will be negative. This causes perf annotate segfaults.
Make sure that strtoull() has parsed at least one digit.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Konstantin Stepanyuk <konstantin.stepanyuk@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When forking the child to be traced, we should check the correct
return value from fork() and not a local variable which is otherwise
unused.
Signed-off-by: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <20100531211818.GA30175@liondog.tnic>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
event__process_task() doesn't propagate the comm copy on clone,
but only on process fork. So we loose all the tid:comm resolution
for tasks that aren't a main process thread.
Progragate the per thread granularity to event__process_task for
pid resolution.
This fixes various unresolved pids in perf sched, especially when
we trace multithread processes. The problem is quickly reproducible
with the messaging benchmark using the multithread mode "-t" :
perf sched record perf bench sched messaging -t
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
perf sched uses event__process_comm(), which means it can resolve
comms from:
- tasks that have exec'ed (kernel comm events)
- tasks that were running when perf record started the actual
recording (synthetized comm events)
But perf sched can't resolve the pids of tasks that were created
after the recording started.
To solve this, we need to inherit the comms on fork events using
event__process_task().
This fixes various unresolved pids in perf sched, easily visible
with:
perf sched record perf bench sched messaging
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we synthetize the existing running tasks though procfs,
we walk through every threads of a process, queuing one comm
events per tid.
But then on report time, event__process_comm() only creates and
sets the comm on a per process granularity. This is the right
thing for comm events that came from the kernel, as they are
only created on exec. Sub-threads then inherit their comm
from fork events. But that doesn't work with our synthetized
comm events taken from procfs informations as the per thread
granularity is done on comm events directly there.
Hence we need event__process_comm() to work with the tid rather
than the pid. It won't change anything for comm events coming
from the kernel but this will fix the synthetized ones.
Before:
$ ./perf report -D | grep COMM | grep firefox
0x2c7b8 [0x18]: PERF_RECORD_COMM: firefox:5297
0x2c7d0 [0x18]: PERF_RECORD_COMM: firefox:5297
0x2c7e8 [0x18]: PERF_RECORD_COMM: firefox:5297
0x2c800 [0x18]: PERF_RECORD_COMM: firefox:5297
0x2c818 [0x18]: PERF_RECORD_COMM: firefox:5297
0x2c830 [0x18]: PERF_RECORD_COMM: firefox:5297
After:
$ ./perf report -D | grep COMM | grep firefox
0x2c7b8 [0x18]: PERF_RECORD_COMM: firefox:5297
0x2c7d0 [0x18]: PERF_RECORD_COMM: firefox:5299
0x2c7e8 [0x18]: PERF_RECORD_COMM: firefox:5300
0x2c800 [0x18]: PERF_RECORD_COMM: firefox:5308
0x2c818 [0x18]: PERF_RECORD_COMM: firefox:5309
0x2c830 [0x18]: PERF_RECORD_COMM: firefox:5312
This fixes various unresolved pid on perf sched.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix blktrace.c kernel-doc warnings:
Warning(kernel/trace/blktrace.c:858): No description found for parameter 'ignore'
Warning(kernel/trace/blktrace.c:890): No description found for parameter 'ignore'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100529114507.c466fc1e.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a sample size crosses to the next page boundary, the copy
will be made in more than one step. However we forget to advance
the source offset for the next copy, leading to unexpected double
copies that completely mess up the traces.
This fixes various kinds of bad traces that have irrelevant
data inside, as an example:
geany-4979 [001] 5758.077775: sched_switch: prev_comm=! prev_pid=121
prev_prio=0 prev_state=S|D|Z|X|x ==> next_comm= next_pid=7497072
next_prio=0
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1274988898-5639-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The transactional API patch between the generic and model-specific
code introduced several important bugs with event scheduling, at
least on X86. If you had pinned events, e.g., watchdog, and were
over-committing the PMU, you would get bogus counts. The bug was
showing up on Intel CPU because events would move around more
often that on AMD. But the problem also existed on AMD, though
harder to expose.
The issues were:
- group_sched_in() was missing a cancel_txn() in the error path
- cpuc->n_added was not properly maintained, leading to missing
actions in hw_perf_enable(), i.e., n_running being 0. You cannot
update n_added until you know the transaction has succeeded. In
case of failed transaction n_added was not adjusted back.
- in case of failed transactions, event_sched_out() was called
and eventually invoked x86_disable_event() to touch the HW reg.
But with transactions, on X86, event_sched_in() does not touch
HW registers, it simply collects events into a list. Thus, you
could end up calling x86_disable_event() on a counter which
did not correspond to the current event when idx != -1.
The patch modifies the generic and X86 code to avoid all those problems.
First, we keep track of the number of events added last. In case the
transaction fails, we substract them from n_added. This approach is
necessary (as opposed to delaying updates to n_added) because not all
event updates use the transaction API, e.g., single events.
Second, we encapsulate the event_sched_in() and event_sched_out() in
group_sched_in() inside the transaction. That makes the operations
symmetrical and you can also detect that you are inside a transaction
and skip the HW reg access by checking cpuc->group_flag.
With this patch, you can now overcommit the PMU even with pinned
system-wide events present and still get valid counts.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1274796225.5882.1389.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
|
|
|
|
|
|
|
| |
Steve spotted I forgot to do the destroy under event_mutex.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1274451913.1674.1707.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tracepoint_probe_unregister() does not synchronize against the probe
callbacks, so do that explicitly. This properly serializes the callbacks
and the free of the data used therein.
Also, use this_cpu_ptr() where possible.
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1274438476.1674.1702.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Group siblings don't pin each-other or the parent, so when we destroy
events we must make sure to clean up all cross referencing pointers.
In particular, for destruction of a group leader we must be able to
find all its siblings and remove their reference to it.
This means that detaching an event from its context must not detach it
from the group, otherwise we can end up failing to clear all pointers.
Solve this by clearly separating the attachment to a context and
attachment to a group, and keep the group composed until we destroy
the events.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to move toward separate buffer objects, rework the whole
perf_mmap_data construct to be a more self-sufficient entity, one
with its own lifetime rules.
This greatly sanitizes the whole output redirection code, which
was riddled with bugs and races.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
|
|
| |
.. and thus endeth the merge window.
|
|\
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'slub/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
SLUB: Allow full duplication of kmalloc array for 390
slub: move kmem_cache_node into it's own cacheline
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit 756dee75872a2a764b478e18076360b8a4ec9045 ("SLUB: Get rid of dynamic DMA
kmalloc cache allocation") makes S390 run out of kmalloc caches. Increase the
number of kmalloc caches to a safe size.
Cc: <stable@kernel.org> [ .33 and .34 ]
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch is meant to improve the performance of SLUB by moving the local
kmem_cache_node lock into it's own cacheline separate from kmem_cache.
This is accomplished by simply removing the local_node when NUMA is enabled.
On my system with 2 nodes I saw around a 5% performance increase w/
hackbench times dropping from 6.2 seconds to 5.9 seconds on average. I
suspect the performance gain would increase as the number of nodes
increases, but I do not have the data to currently back that up.
Bugzilla-Reference: http://bugzilla.kernel.org/show_bug.cgi?id=15713
Cc: <stable@kernel.org>
Reported-by: Alex Shi <alex.shi@intel.com>
Tested-by: Alex Shi <alex.shi@intel.com>
Acked-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
mutex: Fix optimistic spinning vs. BKL
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently, we can hit a nasty case with optimistic
spinning on mutexes:
CPU A tries to take a mutex, while holding the BKL
CPU B tried to take the BLK while holding the mutex
This looks like a AB-BA scenario but in practice, is
allowed and happens due to the auto-release on
schedule() nature of the BKL.
In that case, the optimistic spinning code can get us
into a situation where instead of going to sleep, A
will spin waiting for B who is spinning waiting for
A, and the only way out of that loop is the
need_resched() test in mutex_spin_on_owner().
This patch fixes it by completely disabling spinning
if we own the BKL. This adds one more detail to the
extensive list of reasons why it's a bad idea for
kernel code to be holding the BKL.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <stable@kernel.org>
LKML-Reference: <20100519054636.GC12389@ozlabs.org>
[ added an unlikely() attribute to the branch ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tui: Fix last use_browser problem related to .perfconfig
perf symbols: Add the build id cache to the vmlinux path
perf tui: Reset use_browser if stdout is not a tty
ring-buffer: Move zeroing out excess in page to ring buffer code
ring-buffer: Reset "real_end" when page is filled
|
| |\ \ \
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Currently the trace splice code zeros out the excess bytes in the page before
sending it off to userspace.
This is to make sure userspace is not getting anything it should not be
when reading the pages, because the excess data was never initialized
to zero before writing (for perfomance reasons).
But the splice code has no business in doing this work, it should be
done by the ring buffer. With the latest changes for recording lost
events, the splice code gets it wrong anyway.
Move the zeroing out of excess bytes into the ring buffer code.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The code to store the "lost events" requires knowing the real end
of the page. Since the 'commit' includes the padding at the end of
a page a "real_end" variable was used to keep track of the end not
including the padding.
If events were lost, the reader can place the count of events in
the padded area if there is enough room.
The bug this patch fixes is that when we fill the page we do not
reset the real_end variable, and if the writer had wrapped a few
times, the real_end would be incorrect.
This patch simply resets the real_end if the page was filled.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When we moved to using ~/.perfconfig to set the value of use_browser,
it changed from a boolean to an int so that the convention used for
use_pager was followed.
That convention is:
-1: unspecified, that is what use_{browser,pager} is initialized
0: Don't use the browser (should be TUI), because was explicitely
set to 0/off/false on ~/.perfconfig [tui] cmd =, or because
we're redirecting the stdout to a file or piping it to some
other command (!isatty()).
1: Use the TUI
Some code was not properly audited and continued testing it as a
boolean, this seems to be the last one.
Reported-by: Frédéric Weisbecker <fweisbec@gmail.com>
Tested-by: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
So that if the kernel DSO has a build id because record inserted it in
the perf.data build id table in the header, or a BUILD_ID event was
inserted in the stream, we first look at the build id cache
($HOME/.debug/).
If we find it there, try to use it, allowing offline annotation in
addition to 'perf report'.
Reported-by: Stephane Eranian <eranian@google.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The newt initialization routines weren't being called because the output
was a file (perf annotate > /tmp/bla) but use_browser was still 1,
because ~/.perfconfig had it as 'on', so, later on newt routines
segfaulted.
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This partially reverts commit 4ec37de89d8c758ee8115e0e64b3f994910789ee
("[IA64] Fix build breakage"), since the commit that made it necessary
got reverted earlier (see commit 35926ff5fba8, 'Revert "cpusets:
randomize node rotor used in cpuset_mem_spread_node()"')
Even if we ever re-introduce this, there is no reason to make
__node_random be some architecture-specific function.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
mm: export generic_pipe_buf_*() to modules
fuse: support splice() reading from fuse device
fuse: allow splice to move pages
mm: export remove_from_page_cache() to modules
mm: export lru_cache_add_*() to modules
fuse: support splice() writing to fuse device
fuse: get page reference for readpages
fuse: use get_user_pages_fast()
fuse: remove unneeded variable
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This is needed by fuse device code which wants to create pipe buffers.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Allow userspace filesystem implementation to use splice() to read from
the fuse device.
The userspace filesystem can now transfer data coming from a WRITE
request to an arbitrary file descriptor (regular file, block device or
socket) without having to go through a userspace buffer.
The semantics of using splice() to read messages are:
1) with a single splice() call move the whole message from the fuse
device to a temporary pipe
2) read the header from the pipe and determine the message type
3a) if message is a WRITE then splice data from pipe to destination
3b) else read rest of message to userspace buffer
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When splicing buffers to the fuse device with SPLICE_F_MOVE, try to
move pages from the pipe buffer into the page cache. This allows
populating the fuse filesystem's cache without ever touching the page
contents, i.e. zero copy read capability.
The following steps are performed when trying to move a page into the
page cache:
- buf->ops->confirm() to make sure the new page is uptodate
- buf->ops->steal() to try to remove the new page from it's previous place
- remove_from_page_cache() on the old page
- add_to_page_cache_locked() on the new page
If any of the above steps fail (non fatally) then the code falls back
to copying the page. In particular ->steal() will fail if there are
external references (other than the page cache and the pipe buffer) to
the page.
Also since the remove_from_page_cache() + add_to_page_cache_locked()
are non-atomic it is possible that the page cache is repopulated in
between the two and add_to_page_cache_locked() will fail. This could
be fixed by creating a new atomic replace_page_cache_page() function.
fuse_readpages_end() needed to be reworked so it works even if
page->mapping is NULL for some or all pages which can happen if the
add_to_page_cache_locked() failed.
A number of sanity checks were added to make sure the stolen pages
don't have weird flags set, etc... These could be moved into generic
splice/steal code.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This is needed to enable moving pages into the page cache in fuse with
splice(..., SPLICE_F_MOVE).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This is needed to enable moving pages into the page cache in fuse with
splice(..., SPLICE_F_MOVE).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Allow userspace filesystem implementation to use splice() to write to
the fuse device. The semantics of using splice() are:
1) buffer the message header and data in a temporary pipe
2) with a *single* splice() call move the message from the temporary pipe
to the fuse device
The READ reply message has the most interesting use for this, since
now the data from an arbitrary file descriptor (which could be a
regular file, a block device or a socket) can be tranferred into the
fuse device without having to go through a userspace buffer. It will
also allow zero copy moving of pages.
One caveat is that the protocol on the fuse device requires the length
of the whole message to be written into the header. But the length of
the data transferred into the temporary pipe may not be known in
advance. The current library implementation works around this by
using vmplice to write the header and modifying the header after
splicing the data into the pipe (error handling omitted):
struct fuse_out_header out;
iov.iov_base = &out;
iov.iov_len = sizeof(struct fuse_out_header);
vmsplice(pip[1], &iov, 1, 0);
len = splice(input_fd, input_offset, pip[1], NULL, len, 0);
/* retrospectively modify the header: */
out.len = len + sizeof(struct fuse_out_header);
splice(pip[0], NULL, fuse_chan_fd(req->ch), NULL, out.len, flags);
This works since vmsplice only saves a pointer to the data, it does
not copy the data itself.
Since pipes are currently limited to 16 pages and messages need to be
spliced atomically, the length of the data is limited to 15 pages (or
60kB for 4k pages).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Acquire a page ref on pages in ->readpages() and release them when the
read has finished. Not acquiring a reference didn't seem to cause any
trouble since the page is locked and will not be kicked out of the
page cache during the read.
However the following patches will want to remove the page from the
cache so a separate ref is needed. Making the reference in req->pages
explicit also makes the code easier to understand.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Replace uses of get_user_pages() with get_user_pages_fast(). It looks
nicer and should be faster in most cases.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
| | |_|/
| |/| |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
"map" isn't needed any more after: 0bd87182d3ab18 "fuse: fix kunmap in
fuse_ioctl_copy_user"
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-kconfig
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-kconfig:
kconfig: Hide error output in find command in streamline_config.pl
kconfig: Fix typo in comment in streamline_config.pl
kconfig: Make a variable local in streamline_config.pl
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Finding the list of Makefiles in streamline-config should not report errors.
Also move the "chomp" to the @makefiles array instead of doing it in the
for loop. This is more efficient, and does not make it any less readable
by C programmers.
Signed-off-by: Toralf Foerster <toralf.foerster@gmx.de>
LKML-Reference: <201005262022.02928.toralf.foerster@gmx.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Toralf Foerster <toralf.foerster@gmx.de>
LKML-Reference: <201005281025.52753.toralf.foerster@gmx.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Proper perl requires that local variables should be declared with 'my',
otherwise this may produce errors.
Signed-off-by: Toralf Foerster <toralf.foerster@gmx.de>
LKML-Reference: <201005281025.00358.toralf.foerster@gmx.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (47 commits)
mfd: Rename twl5031 sih modules
mfd: Storage class for timberdale should be before const qualifier
mfd: Remove unneeded and dangerous clearing of clientdata
mfd: New AB8500 driver
gpio: Fix inverted rdc321x gpio data out registers
mfd: Change rdc321x resources flags to IORESOURCE_IO
mfd: Move pcf50633 irq related functions to its own file.
mfd: Use threaded irq for pcf50633
mfd: pcf50633-adc: Fix potential race in pcf50633_adc_sync_read
mfd: Fix pcf50633 bitfield logic in interrupt handler
gpio: rdc321x needs to select MFD_CORE
mfd: Use menuconfig for quicker config editing
ARM: AB3550 board configuration and irq for U300
mfd: AB3550 core driver
mfd: AB3100 register access change to abx500 API
mfd: Renamed ab3100.h to abx500.h
gpio: Add TC35892 GPIO driver
mfd: Add Toshiba's TC35892 MFD core
mfd: Delay to mask tsc irq in max8925
mfd: Remove incorrect wm8350 kfree
...
|