From 08825c90af6e4bb902b3a51abb0ae6530199f682 Mon Sep 17 00:00:00 2001 From: Li Zefan Date: Fri, 17 May 2013 10:31:20 +0800 Subject: watchdog: Document watchdog_thresh sysctl Signed-off-by: Li Zefan Acked-by: Don Zickus Link: http://lkml.kernel.org/r/51959678.6000802@huawei.com Signed-off-by: Ingo Molnar --- Documentation/sysctl/kernel.txt | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'Documentation') diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index ccd4258..e8fabd6 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -76,6 +76,7 @@ show up in /proc/sys/kernel: - tainted - threads-max - unknown_nmi_panic +- watchdog_thresh - version ============================================================== @@ -648,3 +649,16 @@ that time, kernel debugging information is displayed on console. NMI switch that most IA32 servers have fires unknown NMI up, for example. If a system hangs up, try pressing the NMI switch. + +============================================================== + +watchdog_thresh: + +This value can be used to control the frequency of hrtimer and NMI +events and the soft and hard lockup thresholds. The default threshold +is 10 seconds. + +The softlockup threshold is (2 * watchdog_thresh). Setting this +tunable to zero will disable lockup detection altogether. + +============================================================== -- cgit v1.1 From c0ffaf3655fab1909a920c8f30ba1722932d01bb Mon Sep 17 00:00:00 2001 From: Li Zefan Date: Fri, 17 May 2013 10:31:35 +0800 Subject: watchdog: Remove softlockup_thresh from Documentation The old softlockup detector has been replaced with new lockup detector long ago. Signed-off-by: Li Zefan Acked-by: Don Zickus Link: http://lkml.kernel.org/r/51959687.9090305@huawei.com Signed-off-by: Ingo Molnar --- Documentation/sysctl/kernel.txt | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'Documentation') diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index e8fabd6..bcff3f9 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -70,7 +70,6 @@ show up in /proc/sys/kernel: - shmall - shmmax [ sysv ipc ] - shmmni -- softlockup_thresh - stop-a [ SPARC only ] - sysrq ==> Documentation/sysrq.txt - tainted @@ -605,15 +604,6 @@ without users and with a dead originative process will be destroyed. ============================================================== -softlockup_thresh: - -This value can be used to lower the softlockup tolerance threshold. The -default threshold is 60 seconds. If a cpu is locked up for 60 seconds, -the kernel complains. Valid values are 1-60 seconds. Setting this -tunable to zero will disable the softlockup detection altogether. - -============================================================== - tainted: Non-zero if the kernel has been tainted. Numeric values, which -- cgit v1.1 From 54aa3b99982a6e5f12b52b394244b5086a330a34 Mon Sep 17 00:00:00 2001 From: Sukadev Bhattiprolu Date: Sat, 6 Apr 2013 09:52:05 -0700 Subject: perf: Power7 Update testing ABI to list CPI-stack events Following patch added several Power7 events into /sys/devices/cpu/events. Document those events in the testing ABI. https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-April/105167.html Signed-off-by: Sukadev Bhattiprolu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: linuxppc-dev@ozlabs.org Link: http://lkml.kernel.org/r/20130406170623.GA900@us.ibm.com Signed-off-by: Arnaldo Carvalho de Melo --- .../testing/sysfs-bus-event_source-devices-events | 32 ++++++++++++++++++---- 1 file changed, 27 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events index 0adeb52..8b25ffb 100644 --- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events @@ -27,14 +27,36 @@ Description: Generic performance monitoring events "basename". -What: /sys/devices/cpu/events/PM_LD_MISS_L1 - /sys/devices/cpu/events/PM_LD_REF_L1 - /sys/devices/cpu/events/PM_CYC +What: /sys/devices/cpu/events/PM_1PLUS_PPC_CMPL /sys/devices/cpu/events/PM_BRU_FIN - /sys/devices/cpu/events/PM_GCT_NOSLOT_CYC /sys/devices/cpu/events/PM_BRU_MPRED - /sys/devices/cpu/events/PM_INST_CMPL /sys/devices/cpu/events/PM_CMPLU_STALL + /sys/devices/cpu/events/PM_CMPLU_STALL_BRU + /sys/devices/cpu/events/PM_CMPLU_STALL_DCACHE_MISS + /sys/devices/cpu/events/PM_CMPLU_STALL_DFU + /sys/devices/cpu/events/PM_CMPLU_STALL_DIV + /sys/devices/cpu/events/PM_CMPLU_STALL_ERAT_MISS + /sys/devices/cpu/events/PM_CMPLU_STALL_FXU + /sys/devices/cpu/events/PM_CMPLU_STALL_IFU + /sys/devices/cpu/events/PM_CMPLU_STALL_LSU + /sys/devices/cpu/events/PM_CMPLU_STALL_REJECT + /sys/devices/cpu/events/PM_CMPLU_STALL_SCALAR + /sys/devices/cpu/events/PM_CMPLU_STALL_SCALAR_LONG + /sys/devices/cpu/events/PM_CMPLU_STALL_STORE + /sys/devices/cpu/events/PM_CMPLU_STALL_THRD + /sys/devices/cpu/events/PM_CMPLU_STALL_VECTOR + /sys/devices/cpu/events/PM_CMPLU_STALL_VECTOR_LONG + /sys/devices/cpu/events/PM_CYC + /sys/devices/cpu/events/PM_GCT_NOSLOT_BR_MPRED + /sys/devices/cpu/events/PM_GCT_NOSLOT_BR_MPRED_IC_MISS + /sys/devices/cpu/events/PM_GCT_NOSLOT_CYC + /sys/devices/cpu/events/PM_GCT_NOSLOT_IC_MISS + /sys/devices/cpu/events/PM_GRP_CMPL + /sys/devices/cpu/events/PM_INST_CMPL + /sys/devices/cpu/events/PM_LD_MISS_L1 + /sys/devices/cpu/events/PM_LD_REF_L1 + /sys/devices/cpu/events/PM_RUN_CYC + /sys/devices/cpu/events/PM_RUN_INST_CMPL Date: 2013/01/08 -- cgit v1.1 From fd851780e61ac36e8d59fe87cca01a2e673930ff Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Fri, 10 May 2013 17:33:00 +0200 Subject: perf: Expand definition of sysfs format attribute Make it explicit that the format attributes may define overlapping bit ranges. Unfortunately this was left unspecified originally, and all the examples show non-overlapping ranges. I don't believe this is an ABI change, as we are defining something that was previously undefined, but others may disagree. The POWER8 PMU would like to define overlapping ranges, as bit ranges in the event code have different meanings for certain events. It will also allow us to define an overarching "event" field, that encompasses all others. As far as I can see perf is comfortable with this change, however I am not sure if there are any other users of the interface. Signed-off-by: Michael Ellerman Acked-by: Peter Zijlstra Cc: Corey Ashford Cc: Frederic Weisbecker Cc: Jiri Olsa Cc: Namhyung Kim Cc: Paul Mackerras Cc: Peter Zijlstra Cc: Stephane Eranian Link: http://lkml.kernel.org/r/1368199980-20283-1-git-send-email-jolsa@redhat.com Signed-off-by: Jiri Olsa Signed-off-by: Arnaldo Carvalho de Melo --- Documentation/ABI/testing/sysfs-bus-event_source-devices-format | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format index 079afc7..77f47ff 100644 --- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format @@ -9,6 +9,12 @@ Description: we want to export, so that userspace can deal with sane name/value pairs. + Userspace must be prepared for the possibility that attributes + define overlapping bit ranges. For example: + attr1 = 'config:0-23' + attr2 = 'config:0-7' + attr3 = 'config:12-35' + Example: 'config1:1,6-10,44' Defines contents of attribute that occupies bits 1,6-10,44 of perf_event_attr::config1. -- cgit v1.1 From 14c63f17b1fde5a575a28e96547a22b451c71fb5 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Fri, 21 Jun 2013 08:51:36 -0700 Subject: perf: Drop sample rate when sampling is too slow This patch keeps track of how long perf's NMI handler is taking, and also calculates how many samples perf can take a second. If the sample length times the expected max number of samples exceeds a configurable threshold, it drops the sample rate. This way, we don't have a runaway sampling process eating up the CPU. This patch can tend to drop the sample rate down to level where perf doesn't work very well. *BUT* the alternative is that my system hangs because it spends all of its time handling NMIs. I'll take a busted performance tool over an entire system that's busted and undebuggable any day. BTW, my suspicion is that there's still an underlying bug here. Using the HPET instead of the TSC is definitely a contributing factor, but I suspect there are some other things going on. But, I can't go dig down on a bug like that with my machine hanging all the time. Signed-off-by: Dave Hansen Acked-by: Peter Zijlstra Cc: paulus@samba.org Cc: acme@ghostprotocols.net Cc: Dave Hansen [ Prettified it a bit. ] Signed-off-by: Ingo Molnar --- Documentation/sysctl/kernel.txt | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'Documentation') diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index bcff3f9..ab7d16e 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -427,6 +427,32 @@ This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled. ============================================================== +perf_cpu_time_max_percent: + +Hints to the kernel how much CPU time it should be allowed to +use to handle perf sampling events. If the perf subsystem +is informed that its samples are exceeding this limit, it +will drop its sampling frequency to attempt to reduce its CPU +usage. + +Some perf sampling happens in NMIs. If these samples +unexpectedly take too long to execute, the NMIs can become +stacked up next to each other so much that nothing else is +allowed to execute. + +0: disable the mechanism. Do not monitor or correct perf's + sampling rate no matter how CPU time it takes. + +1-100: attempt to throttle perf's sample rate to this + percentage of CPU. Note: the kernel calculates an + "expected" length of each sample event. 100 here means + 100% of that expected length. Even if this is set to + 100, you may still see sample throttling if this + length is exceeded. Set to 0 if you truly do not care + how much CPU is consumed. + +============================================================== + pid_max: -- cgit v1.1 From 0c4df02d739fed5ab081b330d67403206dd3967e Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Fri, 21 Jun 2013 08:51:38 -0700 Subject: x86: Add NMI duration tracepoints This patch has been invaluable in my adventures finding issues in the perf NMI handler. I'm as big a fan of printk() as anybody is, but using printk() in NMIs is deadly when they're happening frequently. Even hacking in trace_printk() ended up eating enough CPU to throw off some of the measurements I was making. Signed-off-by: Dave Hansen Acked-by: Peter Zijlstra Cc: paulus@samba.org Cc: acme@ghostprotocols.net Cc: Dave Hansen Signed-off-by: Ingo Molnar --- Documentation/trace/events-nmi.txt | 43 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) create mode 100644 Documentation/trace/events-nmi.txt (limited to 'Documentation') diff --git a/Documentation/trace/events-nmi.txt b/Documentation/trace/events-nmi.txt new file mode 100644 index 0000000..c03c8c8 --- /dev/null +++ b/Documentation/trace/events-nmi.txt @@ -0,0 +1,43 @@ +NMI Trace Events + +These events normally show up here: + + /sys/kernel/debug/tracing/events/nmi + +-- + +nmi_handler: + +You might want to use this tracepoint if you suspect that your +NMI handlers are hogging large amounts of CPU time. The kernel +will warn if it sees long-running handlers: + + INFO: NMI handler took too long to run: 9.207 msecs + +and this tracepoint will allow you to drill down and get some +more details. + +Let's say you suspect that perf_event_nmi_handler() is causing +you some problems and you only want to trace that handler +specifically. You need to find its address: + + $ grep perf_event_nmi_handler /proc/kallsyms + ffffffff81625600 t perf_event_nmi_handler + +Let's also say you are only interested in when that function is +really hogging a lot of CPU time, like a millisecond at a time. +Note that the kernel's output is in milliseconds, but the input +to the filter is in nanoseconds! You can filter on 'delta_ns': + +cd /sys/kernel/debug/tracing/events/nmi/nmi_handler +echo 'handler==0xffffffff81625600 && delta_ns>1000000' > filter +echo 1 > enable + +Your output would then look like: + +$ cat /sys/kernel/debug/tracing/trace_pipe +-0 [000] d.h3 505.397558: nmi_handler: perf_event_nmi_handler() delta_ns: 3236765 handled: 1 +-0 [000] d.h3 505.805893: nmi_handler: perf_event_nmi_handler() delta_ns: 3174234 handled: 1 +-0 [000] d.h3 506.158206: nmi_handler: perf_event_nmi_handler() delta_ns: 3084642 handled: 1 +-0 [000] d.h3 506.334346: nmi_handler: perf_event_nmi_handler() delta_ns: 3080351 handled: 1 + -- cgit v1.1