| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Minor spelling fixes in:
tools, share, bluetooth, pmcstat, etc
Many of these have user-visible strings.
| |
Fix hwpmc "stalled" behavior
Currently, there is a single pm_stalled flag that tracks whether a
performance monitor was "stalled" due to insufficient ring buffer
space for samples. However, because the same performance monitor
can run on multiple processes or threads at the same time, a single
pm_stalled flag that impacts them all seems insufficient.
In particular, you can hit corner cases where the code fails to stop
performance monitors during a context switch out, because it thinks
the performance monitor is already stopped. However, in reality,
it may be that only the monitor running on a different CPU was stalled.
This patch attempts to fix that behavior by tracking on a per-CPU basis
whether a PM desires to run and whether it is "stalled". This lets the
code make better decisions about when to stop PMs and when to try to
restart them. Ideally, we should avoid the case where the code fails
to stop a PM during a context switch out.
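The per-CPU tracking described above can be sketched roughly as follows. This is an illustration only, not the hwpmc code: the struct, field, and function names (pm_cpu_state, wants_run, stalled, pm_mark_stalled, pm_switch_out) and MAX_CPUS are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 4	/* hypothetical; kept small for illustration */

/* Hypothetical per-CPU state for one performance monitor (PM). */
struct pm_cpu_state {
	bool wants_run;	/* a thread using this PM is scheduled on this CPU */
	bool stalled;	/* sampling paused on this CPU: ring buffer was full */
};

struct pm {
	struct pm_cpu_state cpu[MAX_CPUS];
};

/* The ring buffer on `cpu` filled up: pause the PM on that CPU only. */
void
pm_mark_stalled(struct pm *pm, int cpu)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	pm->cpu[cpu].stalled = true;
}

/*
 * Context switch out on `cpu`. Returns true when the hardware counter on
 * this CPU is still running and must be stopped. A stall recorded on some
 * other CPU no longer influences the answer.
 */
bool
pm_switch_out(struct pm *pm, int cpu)
{
	bool must_stop;

	assert(cpu >= 0 && cpu < MAX_CPUS);
	must_stop = pm->cpu[cpu].wants_run && !pm->cpu[cpu].stalled;
	pm->cpu[cpu].wants_run = false;
	pm->cpu[cpu].stalled = false;
	return (must_stop);
}
```

With a single shared flag, a stall on CPU 1 would make the switch-out path on CPU 0 believe the PM was already stopped. With one flag per CPU, CPU 0 still stops its own counter.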
MFC r290813:
Optimizations to the way hwpmc gathers user callchains
Changes to the code to gather user stacks:
* Delay setting pmc_cpumask until we actually have the stack.
* When recording user stack traces, only walk the portion of the ring
that should have samples for us.
MFC r290929:
Change the driver stats to what they really are: unsigned values.
When pmcstat exits after some samples were dropped, give the user an
idea of how many were lost. (Granted, these are global numbers, but
they may still help quantify the scope of the loss.)
MFC r290930:
Improve accuracy of PMC sampling frequency
The code tracks a counter which is the number of events until the next
sample. On context switch in, it loads the saved counter. On context
switch out, it tries to calculate a new saved counter.
Problems:
1. The saved counter was shared by all threads in a process. As a result,
all threads would initially be loaded with the same saved counter, which
could result in sampling more often than once every X number of events.
2. The calculation to determine a new saved counter was backwards. It
added when it should have subtracted, and subtracted when it should have
added. Assume a single-threaded process with a reload count of 1000
events. Assuming the counter on context switch in was 100 and the counter
on context switch out was 50 (meaning the thread has "consumed" 50 more
events), the code would calculate a new saved counter of 150 (instead of
the proper 50).
Fix:
1. As soon as the saved counter is used to initialize a monitor for a
thread on context switch in, set the saved counter to the reload count.
That way, subsequent threads to use the saved counter will get the full
reload count, assuring we sample at least once every X number of events
(across all threads).
2. Change the calculation of the saved counter. Due to the change to the
saved counter in #1, we simply need to add (modulo the reload count) the
remaining counter time we retrieve from the CPU when a thread is context
switched out.
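The arithmetic of the fix can be worked through with a small sketch. It is not the hwpmc implementation: struct sample_state, switch_in(), and switch_out() are hypothetical names, and "add modulo the reload count" is read here as a wrap that keeps the saved counter in the range 1..reload.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-process sampling state; not the hwpmc structures. */
struct sample_state {
	uint64_t reload;	/* events between samples (the "X" above) */
	uint64_t saved;		/* events left until the next sample */
};

/*
 * Context switch in: hand the saved counter to this thread, then reset
 * the saved counter to the reload count (fix 1), so the next thread to
 * be switched in starts from a full interval.
 */
uint64_t
switch_in(struct sample_state *s)
{
	uint64_t start = s->saved;

	s->saved = s->reload;
	return (start);
}

/*
 * Context switch out: add the count remaining on the CPU back into the
 * saved counter, wrapping at the reload count (fix 2).
 */
void
switch_out(struct sample_state *s, uint64_t remaining)
{
	assert(s->reload > 0 && remaining <= s->reload);
	s->saved += remaining;
	if (s->saved > s->reload)
		s->saved -= s->reload;
}
```

Using the numbers from the example above (reload count 1000, counter 100 at switch in, 50 at switch out): switch_in() returns 100 and leaves saved at 1000, then switch_out() with 50 remaining computes 1050 - 1000 = 50, the expected value.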
MFC r291016:
Support a wider history counter in pmcstat(8) gmon output
pmcstat(8) contains an option to output sampling data in a gmon format
compatible with gprof(1). Currently, it uses the default histcounter,
which is an (unsigned short). With large sets of sampling data, it
is possible to overflow the maximum value provided by an (unsigned
short).
This change adds the -e argument to pmcstat. If -e and -g are both
specified, pmcstat will use a histcounter type of uint64_t.
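The overflow is easy to reproduce in isolation. The sketch below assumes a 16-bit unsigned short (modelled explicitly with uint16_t); the function names count_u16() and count_u64() are hypothetical and only illustrate the two bucket widths.

```c
#include <stdint.h>

/* Count `hits` samples into a 16-bit bucket, one increment at a time. */
uint16_t
count_u16(uint32_t hits)
{
	uint16_t bucket = 0;
	uint32_t i;

	for (i = 0; i < hits; i++)
		bucket++;	/* wraps to 0 after 65535 */
	return (bucket);
}

/* The same loop with a 64-bit bucket, as selected by -e together with -g. */
uint64_t
count_u64(uint32_t hits)
{
	uint64_t bucket = 0;
	uint32_t i;

	for (i = 0; i < hits; i++)
		bucket++;
	return (bucket);
}
```

A bucket that receives 70000 samples reads back as 4464 (70000 - 65536) in the 16-bit case and as 70000 in the 64-bit case.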
MFC r291017:
Fix the date on the pmcstat(8) man page from r291016.
| |
pmcstat.8: fix -a flag description; improve -m flag to match
The -a flag reads a file saved by -O, not -o.
The -m flag requires the -R flag. Copy that paragraph from -a.
Sponsored by: Dell Inc.
| |
Use the cpuset API more consistently:
- Fetch the root set from cpuset_getaffinity() instead of assuming all CPUs
from 0 to hw.ncpu are the root set.
- Use CPU_SETSIZE and CPU_FFS.
- The original notion of halted CPUs that the manpage and code refer to is
gone. Use the term "available" instead.
| |
Fix pmcstat symbol resolution for userland processes.
When examining existing processes pmcstat fails to
correctly determine the locations of executable sections
of the process due to a miscalculated virtual load address.
This does not affect newly launched processes, as the
same value is passed as a "start address" to pmcstat_image_link(),
thus nullifying its effect. The issue manifests itself
in processes not being reported in the pmcstat(8) output and
"dubious frames" being reported.
Fix it for now by ignoring all the sections except the executable
one. This won't fix the issue for objects with multiple
executable sections but helps in the majority of real-world use cases.
The real solution would be to modify the MAP-IN event to include
the appropriate load address so pmcstat(8) won't have to manually
parse object files to try to determine it.
PR: 198147, 198148
Submitted by: stas
| |
Use the kern.bootfile sysctl to set the default kernel path rather than
hardcoding /boot/kernel. This allows pmcstat(8) to work without -k when
using nextboot -k or 'boot foo' at the loader to boot alternate kernels.
Sponsored by: Norse Corp, Inc.
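One way to turn a kern.bootfile value into a default kernel directory is to drop the last path component. The sketch below is an assumption about the general approach, not a copy of the pmcstat code: kernel_dir_from_bootfile() is a hypothetical helper, and the bootfile string is passed in by the caller rather than read from the sysctl.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the directory part of `bootfile` into `out`.
 * Returns 0 on success, -1 if there is no directory component to keep
 * or the result does not fit in `outlen` bytes.
 */
int
kernel_dir_from_bootfile(const char *bootfile, char *out, size_t outlen)
{
	const char *slash = strrchr(bootfile, '/');
	size_t len;

	if (slash == NULL || slash == bootfile)
		return (-1);
	len = (size_t)(slash - bootfile);
	if (len >= outlen)
		return (-1);
	memcpy(out, bootfile, len);
	out[len] = '\0';
	return (0);
}
```

For a bootfile of /boot/foo/kernel (as after 'boot foo' at the loader) this yields /boot/foo, so symbol lookup follows the kernel that is actually running.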
| |
Clarify the documentation of pmcstat:
the -d argument should be passed before -p, -s, -P or -S to be taken into account
Differential Revision: https://reviews.freebsd.org/D1011
Reviewed by: adrian, gnn
| |
Add a command line argument (-l) to end event collection after some
number of seconds. The number of seconds may be a fraction.
Submitted by: Julien Charbon <jcharbon@versign.com>
Relnotes: yes
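A fractional number of seconds has to be split into whole seconds and microseconds before it can arm a timer. The following is a minimal sketch of that conversion, assuming a struct timeval target; parse_duration() is a hypothetical name and the range check is an arbitrary choice, neither is taken from pmcstat.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/time.h>

/*
 * Parse a positive, possibly fractional number of seconds ("2.5") into
 * a struct timeval. Returns 0 on success, -1 on malformed or
 * out-of-range input.
 */
int
parse_duration(const char *arg, struct timeval *tv)
{
	char *end;
	double secs;

	errno = 0;
	secs = strtod(arg, &end);
	if (errno != 0 || end == arg || *end != '\0')
		return (-1);
	if (!(secs > 0.0) || secs > 1e9)	/* also rejects NaN */
		return (-1);
	tv->tv_sec = (time_t)secs;		/* whole seconds */
	tv->tv_usec = (suseconds_t)((secs - (double)tv->tv_sec) * 1e6);
	return (0);
}
```

An argument of 2.5 becomes 2 seconds plus 500000 microseconds; input such as "abc" or "-1" is rejected.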
| |
In one case generating callgraph output from a 24MB system-wide sampling
data file took 17.4 seconds on average. Profiling showed pmcstat
spending a lot of time in strcmp, due to hash collisions.
Replacing the XOR-only hash with FNV-1a reduces the run time for my
test by 40%.
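For reference, a 32-bit FNV-1a string hash next to a plain XOR hash. The constants are the standard FNV-1a offset basis and prime; xor_hash() is only a guess at what an "XOR-only" hash looks like, and neither function is copied from pmcstat.

```c
#include <stdint.h>

/* XOR of all bytes: order-insensitive and limited to 256 distinct values. */
uint32_t
xor_hash(const char *s)
{
	uint32_t h = 0;

	while (*s != '\0')
		h ^= (unsigned char)*s++;
	return (h);
}

/* 32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime. */
uint32_t
fnv1a_32(const char *s)
{
	uint32_t h = 2166136261u;	/* FNV offset basis */

	while (*s != '\0') {
		h ^= (unsigned char)*s++;
		h *= 16777619u;		/* FNV prime */
	}
	return (h);
}
```

The XOR hash maps "ab" and "ba" to the same value, so strings built from the same characters land in the same bucket and must be told apart with strcmp. FNV-1a mixes in byte order and separates them.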
| |
Add the -a option to pmcstat. This produces a full stack trace on the
sampled points. See the man page for details on how this works.
Obtained from: Netflix, Inc.
| |
In addition to adding `static' where possible:
- bin/date: Move `retval' into extern.h to make it visible to date.c.
- bin/ed: Move globally used variables into ed.h.
- sbin/camcontrol: Move `verbose' into camcontrol.h and fix shadow warnings.
- usr.bin/calendar: Remove unneeded variables.
- usr.bin/chat: Make `line' local instead of global.
- usr.bin/elfdump: Comment out unneeded function.
- usr.bin/rlogin: Use _Noreturn instead of __dead2.
- usr.bin/tset: Pull `Ospeed' into extern.h.
- usr.sbin/mfiutil: Put global variables in mfiutil.h.
- usr.sbin/pkg: Remove unused `os_corres'.
- usr.sbin/quotaon, usr.sbin/repquota: Remove unused `qfname'.
|
| |
message.
Sponsored by: Intel
MFC after: 3 days
| |
MFC after: 3 days
| |
PR: bin/167361
Submitted by: Slawa Olhovchenkov <slw zxy.spb.ru>
Silence from: jkoshy
| |
New kernel events can be added at various locations for sampling or counting.
This allows, for example, easy system profiling on any processor
with known tools like pmcstat(8).
Simultaneous usage of software PMC and hardware PMC is possible, for example
looking at lock acquire failures and page faults while sampling on
instructions.
Sponsored by: NETASQ
MFC after: 1 month
| |
In case of multiple levels of inlining all the locations are flattened.
Requires recent binutils/addr2line (head works or binutils from ports
with the right $PATH order).
- Multiple fixes in the calltree output (recursion case, ...)
- Fix the calltree top view that previously hid some shared nodes.
Tested with Kcachegrind(kdesdk4)/qcachegrind(head).
Sponsored by: NETASQ
| |
Reviewed by: brueffer
| |
mandatory for ELF binaries so we'll use the segment with offset less than
alignment and align it appropriately (which covers pt_offset == 0 case)
| |
error: variable 'current_cpu' set but not used
Approved by: dim, cperciva (mentor, blanket for pre-mentorship already-approved commits)
MFC after: 3 days
| |
Submitted by: eadler
Approved by: simon
MFC after: 3 days
| |
- Do not close stdout or stderr when redirecting to file.
- Correctly handle error code to detect when no buffer available.
MFC after: 1 month
| |
As the underlying block is 4KB, if the PMC throughput is low the measurement
will be reported on the next tick. pmcstat(8) uses the modified flush API to
reclaim the current buffer before displaying the next top.
MFC after: 1 month
| |
ints. That fixes a first bug where pmcstat wasn't using the old
cpumask_t interface and now also brings the full support for more
than 32 cpus.
While here, make the functions pmcstat_clone_event_descriptor() and
pmcstat_get_cpumask() private to pmcstat.
The problem of assuming cpu dense masks still persists and should be
eventually fixed, as reported by avg.
Tested by: pluknet
Reviewed by: gnn
Approved by: re (kib)
| |
Reported by: Pan Tsu <inyaoo@gmail.com>
Reviewed by: attilio
No objections: gnn
| |
will be spread as small values and then filtered by the threshold.
As a first step solution display the number of events that cannot
be resolved as a valid function location.
MFC after: 1 week
| |
- Revert the fix on rtld path that is not necessary.
MFC after: 1 week
| |
This will allow top monitoring using socket/ssh tunnelling
of a system without local symbols.
client:
pmcstat -R <ip>:<port> -T -r <symbolspath>
monitored device:
pmcstat -Sinstructions -O <ip>:<port>
- Move the file read in the event loop
- Initialize and clean log in all cases
- Preserve global stats value during top refresh
- Fix the rtld/line resolver that ignored the '-r' prefix
- Support socket for '-R' (server mode)
- Display the statistics when exiting top mode
|
| |
MFC after: 1 week
| |
Found with: Coverity Prevent(tm)
MFC after: 1 month
| |
The percentage shown is the sum of the cost for the codepath.
MFC after: 1 week
| |
MFC after: 3 days
| |
MFC after: 3 days
| |
Fix exit from top mode when checking if PMC is available.
MFC after: 3 days
| |
Although groff_mdoc(7) gives another impression, this is the ordering
most widely used and also required by mdocml/mandoc.
Reviewed by: ru
Approved by: philip, ed (mentors)
| |
- Display samples received per PMC (or merged PMCs).
- Display percentage vs all samples
| |
This will solve an abort in case of low throughput PMCs.
MFC after: 3 days
| |
MFC after: 3 days
| |
pmc_flush_logfile is now non-blocking and just asks the kernel
to shut down the file. From that point, no more data is
accepted by the log thread and when the last buffer is flushed
the file is closed.
This will remove a deadlock between pmcstat asking for
flush while it cannot flush the pipe itself.
MFC after: 3 days
| |
- no display on serial terminal in top mode.
- display alignment for continuation string.
- correct invalid value used for display limit.
MFC after: 3 days
| |
- Kcachegrind (calltree) support with assembly/source
code mapping and call count estimator (-F).
- Top mode for calltree and callgraph plugin (-T).
MFC after: 1 month
| |
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.
PR: 137213
Submitted by: Eygene Ryabinkin (initial version)
MFC after: 1 month