summaryrefslogtreecommitdiffstats
path: root/include/linux/perf_event.h
Commit message (Collapse)AuthorAgeFilesLines
* perf: Fix task context schedulingPeter Zijlstra2011-03-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jiri reported: | | - once an event is created by sys_perf_event_open, task context | is created and it stays even if the event is closed, until the | task is finished ... thats what I see in code and I assume it's | correct | | - when the task opens event, perf_sched_events jump label is | incremented and following callbacks are started from scheduler | | __perf_event_task_sched_in | __perf_event_task_sched_out | | These callback *in/out set/unset cpuctx->task_ctx value to the | task context. | | - close is called on event on CPU 0: | - the task is scheduled on CPU 0 | - __perf_event_task_sched_in is called | - cpuctx->task_ctx is set | - perf_sched_events jump label is decremented and == 0 | - __perf_event_task_sched_out is not called | - cpuctx->task_ctx on CPU 0 stays set | | - exit is called on CPU 1: | - the task is scheduled on CPU 1 | - perf_event_exit_task is called | - task_ctx_sched_out unsets cpuctx->task_ctx on CPU 1 | - put_ctx destroys the context | | - another call of perf_rotate_context on CPU 0 will use invalid | task_ctx pointer, and eventualy panic. | Cure this the simplest possibly way by partially reverting the jump_label optimization for the sched_out case. Reported-and-tested-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@kernel.org> # .37+ LKML-Reference: <1301520405.4859.213.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()Stephane Eranian2011-03-231-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch solves a stale pointer problem in update_cgrp_time_from_cpuctx(). The cpuctx->cgrp was not cleared on all possible event exit paths, including: close() perf_release() perf_release_kernel() list_del_event() This patch fixes list_del_event() to clear cpuctx->cgrp when there are no cgroup events left in the context. [ This second version makes the code compile when CONFIG_CGROUP_PERF is not enabled. We unconditionally define perf_cpu_context->cgrp. ] Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: perfmon2-devel@lists.sf.net Cc: paulus@samba.org Cc: davem@davemloft.net LKML-Reference: <20110323150306.GA1580@quad> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Reorder & optimize perf_event_context to remove alignment padding on ↵Richard Kennedy2011-03-161-1/+1
| | | | | | | | | | | | | 64 bit builds Remove 8 bytes of alignment padding from perf_event_context on 64 bit builds which shrinks its size to 192 bytes allowing it to fit into one fewer cache lines and into a smaller slab. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1299512819.2039.5.camel@castor.rsk> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Add support for supplementary event registersAndi Kleen2011-03-041-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change logs against Andi's original version: - Extends perf_event_attr:config to config{,1,2} (Peter Zijlstra) - Fixed a major event scheduling issue. There cannot be a ref++ on an event that has already done ref++ once and without calling put_constraint() in between. (Stephane Eranian) - Use thread_cpumask for percore allocation. (Lin Ming) - Use MSR names in the extra reg lists. (Lin Ming) - Remove redundant "c = NULL" in intel_percore_constraints - Fix comment of perf_event_attr::config1 Intel Nehalem/Westmere have a special OFFCORE_RESPONSE event that can be used to monitor any offcore accesses from a core. This is a very useful event for various tunings, and it's also needed to implement the generic LLC-* events correctly. Unfortunately this event requires programming a mask in a separate register. And worse this separate register is per core, not per CPU thread. This patch: - Teaches perf_events that OFFCORE_RESPONSE needs extra parameters. The extra parameters are passed by user space in the perf_event_attr::config1 field. - Adds support to the Intel perf_event core to schedule per core resources. This adds fairly generic infrastructure that can be also used for other per core resources. The basic code has is patterned after the similar AMD northbridge constraints code. Thanks to Stephane Eranian who pointed out some problems in the original version and suggested improvements. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1299119690-13991-2-git-send-email-ming.m.lin@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Optimize throttling codePeter Zijlstra2011-02-161-0/+4
| | | | | | | | | | By pre-computing the maximum number of samples per tick we can avoid a multiplication and a conditional since MAX_INTERRUPTS > max_samples_per_tick. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Add cgroup supportStephane Eranian2011-02-161-3/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This kernel patch adds the ability to filter monitoring based on container groups (cgroups). This is for use in per-cpu mode only. The cgroup to monitor is passed as a file descriptor in the pid argument to the syscall. The file descriptor must be opened to the cgroup name in the cgroup filesystem. For instance, if the cgroup name is foo and cgroupfs is mounted in /cgroup, then the file descriptor is opened to /cgroup/foo. Cgroup mode is activated by passing PERF_FLAG_PID_CGROUP in the flags argument to the syscall. For instance to measure in cgroup foo on CPU1 assuming cgroupfs is mounted under /cgroup: struct perf_event_attr attr; int cgroup_fd, fd; cgroup_fd = open("/cgroup/foo", O_RDONLY); fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP); close(cgroup_fd); Signed-off-by: Stephane Eranian <eranian@google.com> [ added perf_cgroup_{exit,attach} ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4d590250.114ddf0a.689e.4482@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Sysfs enumerationPeter Zijlstra2010-12-161-0/+1
| | | | | | | | | | | | | | | | | | Simple sysfs emumeration of the PMUs. Use a "event_source" bus, and add PMU devices using their name. Each PMU device has a type attribute which contrains the value needed for perf_event_attr::type to identify this PMU. This is the minimal stub needed to start using this interface, we'll consider extending the sysfs usage later. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222056.316982569@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Dynamic pmu typesPeter Zijlstra2010-12-161-1/+4
| | | | | | | | | | | | | | Extend the perf_pmu_register() interface to allow for named and dynamic pmu types. Because we need to support the existing static types we cannot use dynamic types for everything, hence provide a type argument. If we want to enumerate the PMUs they need a name, provide one. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222056.259707703@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'perf/urgent' into perf/coreIngo Molnar2010-12-161-0/+1
|\ | | | | | | | | | | Merge reason: We want to apply a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Fix duplicate events with multiple-pmu vs software eventsPeter Zijlstra2010-12-081-0/+1
| | | | | | | | | | | | | | | | | | | | Because the multi-pmu bits can share contexts between struct pmu instances we could get duplicate events by iterating the pmu list. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | perf events: Make sample_type identity fields available in all PERF_RECORD_ ↵Arnaldo Carvalho de Melo2010-12-041-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | events If perf_event_attr.sample_id_all is set it will add the PERF_SAMPLE_ identity info: TID, TIME, ID, CPU, STREAM_ID As a trailer, so that older perf tools can process new files, just ignoring the extra payload. With this its possible to do further analysis on problems in the event stream, like detecting reordering of MMAP and FORK events, etc. V2: Fixup header size in comm, mmap and task processing, as we have to take into account different sample_types for each matching event, noticed by Thomas Gleixner. Thomas also noticed a problem in v2 where if we didn't had space in the buffer we wouldn't restore the header size. Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf events: Separate the routines handling the PERF_SAMPLE_ identity fieldsArnaldo Carvalho de Melo2010-12-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Those will be made available in sample like events like MMAP, EXEC, etc in a followup patch. So precalculate the extra id header space and have a separate routine to fill them up. V2: Thomas noticed that the id header needs to be precalculated at inherit_events too: LKML-Reference: <alpine.LFD.2.00.1012031245220.2653@localhost6.localdomain6> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1291318772-30880-2-git-send-email-acme@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf events: Precalculate the header space for PERF_SAMPLE_ fieldsArnaldo Carvalho de Melo2010-11-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PERF_SAMPLE_{CALLCHAIN,RAW} have variable lenghts per sample, but the others can be precalculated, reducing a bit the per sample cost. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf: Introduce is_sampling_event()Franck Bui-Huu2010-11-261-0/+5
|/ | | | | | | | | and use it when appropriate. Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290525705-6265-1-git-send-email-fbuihuu@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Fix the software context switch counterPeter Zijlstra2010-11-261-14/+15
| | | | | | | | | | | Stephane noticed that because the perf_sw_event() call is inside the perf_event_task_sched_out() call it won't get called unless we have a per-task counter. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Fix inherit vs. context rotation bugThomas Gleixner2010-11-261-0/+1
| | | | | | | | | | | | | | It was found that sometimes children of tasks with inherited events had one extra event. Eventually it turned out to be due to the list rotation no being exclusive with the list iteration in the inheritance code. Cure this by temporarily disabling the rotation while we inherit the events. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_events: Fix time tracking in samplesStephane Eranian2010-11-101-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch corrects time tracking in samples. Without this patch both time_enabled and time_running are bogus when user asks for PERF_SAMPLE_READ. One uses PERF_SAMPLE_READ to sample the values of other counters in each sample. Because of multiplexing, it is necessary to know both time_enabled, time_running to be able to scale counts correctly. In this second version of the patch, we maintain a shadow copy of ctx->time which allows us to compute ctx->time without calling update_context_time() from NMI context. We avoid the issue that update_context_time() must always be called with ctx->lock held. We do not keep shadow copies of the other event timings because if the lead event is overflowing then it is active and thus it's been scheduled in via event_sched_in() in which case neither tstamp_stopped, tstamp_running can be modified. This timing logic only applies to samples when PERF_SAMPLE_READ is used. Note that this patch does not address timing issues related to sampling inheritance between tasks. This will be addressed in a future patch. With this patch, the libpfm4 example task_smpl now reports correct counts (shown on 2.4GHz Core 2): $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears noploop 5 noploop for 5 seconds IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3 2,400,000,254 unhalted_core_cycles:u (33) 2,399,273,744 instructions_retired:u (34) 53,340 baclears (35) Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4cc6e14b.1e07e30a.256e.5190@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* jump_label: Add COND_STMT(), reducer wrapperyPeter Zijlstra2010-10-181-10/+2
| | | | | | | | | | | | | | The use of the JUMP_LABEL() construct ends up creating endless silly wrappers, create a higher level construct to reduce this clutter. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jason Baron <jbaron@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Optimize sw eventsPeter Zijlstra2010-10-181-9/+11
| | | | | | | Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf: Use jump_labels to optimize the scheduler hooksPeter Zijlstra2010-10-181-2/+25
| | | | | | | | | Trades a call + conditional + ret for an unconditional jmp. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101014203625.501657727@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf, hw_breakpoint: Fix crash in hw_breakpoint creationPeter Zijlstra2010-10-181-0/+7
| | | | | | | | | | | | | | | | | | | | hw_breakpoint creation needs to account stuff per-task to ensure there is always sufficient hardware resources to back these things due to ptrace. With the perf per pmu context changes the event initialization no longer has access to the event context, for the simple reason that we need to first find the pmu (result of initialization) before we can find the context. This makes hw_breakpoints unhappy, because it can no longer do per task accounting, cure this by frobbing a task pointer in the event::hw bits for now... Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20101014203625.391543667@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* irq_work: Add generic hardirq context callbacksPeter Zijlstra2010-10-181-9/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide a mechanism that allows running code in IRQ context. It is most useful for NMI code that needs to interact with the rest of the system -- like wakeup a task to drain buffers. Perf currently has such a mechanism, so extract that and provide it as a generic feature, independent of perf so that others may also benefit. The IRQ context callback is generated through self-IPIs where possible, or on architectures like powerpc the decrementer (the built-in timer facility) is set to generate an interrupt immediately. Architectures that don't have anything like this get to do with a callback from the timer tick. These architectures can call irq_work_run() at the tail of any IRQ handlers that might enqueue such work (like the perf IRQ handler) to avoid undue latencies in processing the work. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [ various fixes ] Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1287036094.7768.291.camel@yhuang-dev> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge remote branch 'tip/perf/core' into oprofile/coreRobert Richter2010-10-151-55/+100
|\ | | | | | | | | | | Conflicts: arch/arm/oprofile/common.c kernel/perf_event.c
| * perf: Undo the per cpu-context timer stuffPeter Zijlstra2010-09-171-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Revert the timer per cpu-context timers because of unfortunate nohz interaction. Fixing that would have been somewhat ugly, so go back to driving things from the regular tick. Provide a jiffies interval feature for people who want slower rotations. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20100917093009.519845633@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Complete software pmu groupingPeter Zijlstra2010-09-171-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Aside from allowing software events into a !software group, allow adding !software events to pure software groups. Once we've moved the software group and attached the first !software event, the group will no longer be a pure software group and hence no longer be eligible for movement, at which point the straight ctx comparison is correct again. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20100917093009.410784731@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf events: Clean up pid passingMatt Helsley2010-09-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel perf event creation path shouldn't use find_task_by_vpid() because a vpid exists in a specific namespace. find_task_by_vpid() uses current's pid namespace which isn't always the correct namespace to use for the vpid in all the places perf_event_create_kernel_counter() (and thus find_get_context()) is called. The goal is to clean up pid namespace handling and prevent bugs like: https://bugzilla.kernel.org/show_bug.cgi?id=17281 Instead of using pids switch find_get_context() to use task struct pointers directly. The syscall is responsible for resolving the pid to a task struct. This moves the pid namespace resolution into the syscall much like every other syscall that takes pid parameters. Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robin Green <greenrd@greenrd.org> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> LKML-Reference: <a134e5e392ab0204961fd1a62c84a222bf5874a9.1284407763.git.matthltc@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Fix up delayed_put_task_struct()Peter Zijlstra2010-09-091-0/+2
| | | | | | | | | | | | | | | | | | I missed a perf_event_ctxp user when converting it to an array. Pull this last user into perf_event.c as well and fix it up. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Provide a separate task context for sweventsPeter Zijlstra2010-09-091-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since software events are always schedulable, mixing them up with hardware events (who are not) can lead to funny scheduling oddities. Giving them their own context solves this. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Multiple task contextsPeter Zijlstra2010-09-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide the infrastructure for multiple task contexts. A more flexible approach would have resulted in more pointer chases in the scheduling hot-paths. This approach has the limitation of a static number of task contexts. Since I expect most external PMUs to be system wide, or at least node wide (as per the intel uncore unit) they won't actually need a task context. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Per-pmu-per-cpu contextsPeter Zijlstra2010-09-091-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Allocate per-cpu contexts per pmu. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Per cpu-context rotation timerPeter Zijlstra2010-09-091-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Give each cpu-context its own timer so that it is a self contained entity, this eases the way for per-pmu-per-cpu contexts as well as provides the basic infrastructure to allow different rotation times per pmu. Things to look at: - folding the tick and these TICK_NSEC timers - separate task context rotation Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Remove the swevent hash-table from the cpu contextPeter Zijlstra2010-09-091-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Separate the swevent hash-table from the cpu_context bits in preparation for per pmu cpu contexts. This keeps the swevent hash a global entity. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Remove the sysfs bitsPeter Zijlstra2010-09-091-6/+0
| | | | | | | | | | | | | | | | | | | | Neither the overcommit nor the reservation sysfs parameter were actually working, remove them as they'll only get in the way. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Rework the PMU methodsPeter Zijlstra2010-09-091-13/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace pmu::{enable,disable,start,stop,unthrottle} with pmu::{add,del,start,stop}, all of which take a flags argument. The new interface extends the capability to stop a counter while keeping it scheduled on the PMU. We replace the throttled state with the generic stopped state. This also allows us to efficiently stop/start counters over certain code paths (like IRQ handlers). It also allows scheduling a counter without it starting, allowing for a generic frozen state (useful for rotating stopped counters). The stopped state is implemented in two different ways, depending on how the architecture implemented the throttled state: 1) We disable the counter: a) the pmu has per-counter enable bits, we flip that b) we program a NOP event, preserving the counter state 2) We store the counter state and ignore all read/overflow events Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Shrink hw_perf_eventPeter Zijlstra2010-09-091-1/+0
| | | | | | | | | | | | | | | | | | | | | | Use hw_perf_event::period_left instead of hw_perf_event::remaining and win back 8 bytes. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Default PMU opsPeter Zijlstra2010-09-091-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide default implementations for the pmu txn methods, this allows us to remove some conditional code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Per PMU disablePeter Zijlstra2010-09-091-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes perf_disable() into perf_pmu_disable(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Reduce perf_disable() usagePeter Zijlstra2010-09-091-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the current perf_disable() usage is only an optimization, remove it for now. This eases the removal of the __weak hw_perf_enable() interface. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Register PMU implementationsPeter Zijlstra2010-09-091-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simple registration interface for struct pmu, this provides the infrastructure for removing all the weak functions. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * perf: Deconstify struct pmuPeter Zijlstra2010-09-091-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sed -ie 's/const struct pmu\>/struct pmu/g' `git grep -l "const struct pmu\>"` Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * Merge branch 'tip/perf/urgent' of ↵Ingo Molnar2010-08-191-1/+1
| |\ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core
| * | perf: Humanize the number of contextsFrederic Weisbecker2010-08-191-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of hardcoding the number of contexts for the recursions barriers, define a cpp constant to make the code more self-explanatory. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com>
| * | perf: Fix race in callchainsFrederic Weisbecker2010-08-191-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that software events don't have interrupt disabled anymore in the event path, callchains can nest on any context. So seperating nmi and others contexts in two buffers has become racy. Fix this by providing one buffer per nesting level. Given the size of the callchain entries (2040 bytes * 4), we now need to allocate them dynamically. v2: Fixed put_callchain_entry call after recursion. Fix the type of the recursion, it must be an array. v3: Use a manual pr cpu allocation (temporary solution until NMIs can safely access vmalloc'ed memory). Do a better separation between callchain reference tracking and allocation. Make the "put" path lockless for non-release cases. v4: Protect the callchain buffers with rcu. v5: Do the cpu buffers allocations node affine. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David Miller <davem@davemloft.net> Cc: Borislav Petkov <bp@amd64.org>
| * | perf: Generalize some arch callchain codeFrederic Weisbecker2010-08-191-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Most archs use one callchain buffer per cpu, except x86 that needs to deal with NMIs. Provide a default perf_callchain_buffer() implementation that x86 overrides. - Centralize all the kernel/user regs handling and invoke new arch handlers from there: perf_callchain_user() / perf_callchain_kernel() That avoid all the user_mode(), current->mm checks and so... - Invert some parameters in perf_callchain_*() helpers: entry to the left, regs to the right, following the traditional (dst, src). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mackerras <paulus@samba.org> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Borislav Petkov <bp@amd64.org>
| * | perf: Generalize callchain_store()Frederic Weisbecker2010-08-191-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | callchain_store() is the same on every archs, inline it in perf_event.h and rename it to perf_callchain_store() to avoid any collision. This removes repetitive code. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mackerras <paulus@samba.org> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Borislav Petkov <bp@amd64.org>
* | | perf: New helper function for pmu nameMatt Fleming2010-10-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce perf_pmu_name() helper function that returns the name of the pmu. This gives us a generic way to get the name of a pmu regardless of how an architecture identifies it internally. Signed-off-by: Matt Fleming <matt@console-pimps.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Robert Richter <robert.richter@amd.com>
* | | perf: Add helper function to return number of countersMatt Fleming2010-10-111-0/+1
| |/ |/| | | | | | | | | | | | | | | | | | | The number of counters for the registered pmu is needed in a few places so provide a helper function that returns this number. Signed-off-by: Matt Fleming <matt@console-pimps.org> Tested-by: Will Deacon <will.deacon@arm.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Robert Richter <robert.richter@amd.com>
* | Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2010-08-061-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits) sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug sched: No need for bootmem special cases sched: Revert nohz_ratelimit() for now sched: Reduce update_group_power() calls sched: Update rq->clock for nohz balanced cpus sched: Fix spelling of sibling sched, cpuset: Drop __cpuexit from cpu hotplug callbacks sched: Fix the racy usage of thread_group_cputimer() in fastpath_timer_check() sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand() sched: thread_group_cputime: Simplify, document the "alive" check sched: Remove the obsolete exit_state/signal hacks sched: task_tick_rt: Remove the obsolete ->signal != NULL check sched: __sched_setscheduler: Read the RLIMIT_RTPRIO value lockless sched: Fix comments to make them DocBook happy sched: Fix fix_small_capacity powerpc: Exclude arch_sd_sibiling_asym_packing() on UP powerpc: Enable asymmetric SMT scheduling on POWER7 sched: Add asymmetric group packing option for sibling domain sched: Fix capacity calculations for SMT4 sched: Change nohz idle load balancing logic to push model ...
| * sched: define and use CPU_PRI_* enums for cpu notifier prioritiesTejun Heo2010-06-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Instead of hardcoding priority 10 and 20 in sched and perf, collect them into CPU_PRI_* enums. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf: Fix argument of perf_arch_fetch_caller_regsNobuhiro Iwamatsu2010-06-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "struct regs" was set to argument of perf_arch_fetch_caller_regs off-case. It should be "struct pt_regs". This fixes various build errors in archs that have CONFIG_PERF_EVENTS=y but no overriden implementation of perf_arch_fetch_caller_regs. cc1: warnings being treated as errors In file included from include/linux/ftrace_event.h:8, from include/trace/syscall.h:6, from include/linux/syscalls.h:75, from arch/sh/kernel/sys_sh32.c:9: include/linux/perf_event.h:937: error: 'struct regs' declared inside parameter list include/linux/perf_event.h:937: error: its scope is only this definition or declaration, which is probably not what you want include/linux/perf_event.h: In function 'perf_fetch_caller_regs': include/linux/perf_event.h:952: error: passing argument 1 of 'perf_arch_fetch_caller_regs' from incompatible pointer type Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Mackerras <paulus@samba.org> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <AANLkTinKKFKEBQrZ3Hkj-XCaMwaTqulb-XnFzqEYiFRr@mail.gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
OpenPOWER on IntegriCloud