summaryrefslogtreecommitdiffstats
path: root/share/doc/psd/18.gprof/present.me
diff options
context:
space:
mode:
Diffstat (limited to 'share/doc/psd/18.gprof/present.me')
-rw-r--r--share/doc/psd/18.gprof/present.me306
1 files changed, 306 insertions, 0 deletions
diff --git a/share/doc/psd/18.gprof/present.me b/share/doc/psd/18.gprof/present.me
new file mode 100644
index 0000000..1dd7f62
--- /dev/null
+++ b/share/doc/psd/18.gprof/present.me
@@ -0,0 +1,306 @@
+.\" Copyright (c) 1982, 1993
+.\" The Regents of the University of California. All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\" 3. All advertising materials mentioning features or use of this software
+.\" must display the following acknowledgement:
+.\" This product includes software developed by the University of
+.\" California, Berkeley and its contributors.
+.\" 4. Neither the name of the University nor the names of its contributors
+.\" may be used to endorse or promote products derived from this software
+.\" without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" @(#)present.me 8.1 (Berkeley) 6/8/93
+.\"
+.sh 1 "Data Presentation"
+.pp
+The data is presented to the user in two different formats.
+The first presentation simply lists the routines
+without regard to the amount of time their descendants use.
+The second presentation incorporates the call graph of the
+program.
+.sh 2 "The Flat Profile
+.pp
+The flat profile consists of a list of all the routines
+that are called during execution of the program,
+with the count of the number of times they are called
+and the number of seconds of execution time for which they
+are themselves accountable.
+The routines are listed in decreasing order of execution time.
+A list of the routines that are never called during execution of
+the program is also available
+to verify that nothing important is omitted by
+this execution.
+The flat profile gives a quick overview of the routines that are used,
+and shows the routines that are themselves responsible
+for large fractions of the execution time.
+In practice,
+this profile usually shows that no single function
+is overwhelmingly responsible for
+the total time of the program.
+Notice that for this profile,
+the individual times sum to the total execution time.
+.sh 2 "The Call Graph Profile"
+.sz 10
+.(z
+.TS
+box center;
+c c c c c l l
+c c c c c l l
+c c c c c l l
+l n n n c l l.
+ called/total \ \ parents
+index %time self descendants called+self name index
+ called/total \ \ children
+_
+ 0.20 1.20 4/10 \ \ \s-1CALLER1\s+1 [7]
+ 0.30 1.80 6/10 \ \ \s-1CALLER2\s+1 [1]
+[2] 41.5 0.50 3.00 10+4 \s-1EXAMPLE\s+1 [2]
+ 1.50 1.00 20/40 \ \ \s-1SUB1\s+1 <cycle1> [4]
+ 0.00 0.50 1/5 \ \ \s-1SUB2\s+1 [9]
+ 0.00 0.00 0/5 \ \ \s-1SUB3\s+1 [11]
+.TE
+.ce 2
+Profile entry for \s-1EXAMPLE\s+1.
+Figure 4.
+.)z
+.pp
+Ideally, we would like to print the call graph of the program,
+but we are limited by the two-dimensional nature of our output
+devices.
+We cannot assume that a call graph is planar,
+and even if it is, that we can print a planar version of it.
+Instead, we choose to list each routine,
+together with information about
+the routines that are its direct parents and children.
+This listing presents a window into the call graph.
+Based on our experience,
+both parent information and child information
+is important,
+and should be available without searching
+through the output.
+.pp
+The major entries of the call graph profile are the entries from the
+flat profile, augmented by the time propagated to each
+routine from its descendants.
+This profile is sorted by the sum of the time for the routine
+itself plus the time inherited from its descendants.
+The profile shows which of the higher level routines
+spend large portions of the total execution time
+in the routines that they call.
+For each routine, we show the amount of time passed by each child
+to the routine, which includes time for the child itself
+and for the descendants of the child
+(and thus the descendants of the routine).
+We also show the percentage these times represent of the total time
+accounted to the child.
+Similarly, the parents of each routine are listed,
+along with time,
+and percentage of total routine time,
+propagated to each one.
+.pp
+Cycles are handled as single entities.
+The cycle as a whole is shown as though it were a single routine,
+except that members of the cycle are listed in place of the children.
+Although the number of calls of each member
+from within the cycle are shown,
+they do not affect time propagation.
+When a child is a member of a cycle,
+the time shown is the appropriate fraction of the time
+for the whole cycle.
+Self-recursive routines have their calls broken
+down into calls from the outside and self-recursive calls.
+Only the outside calls affect the propagation of time.
+.pp
+The following example is a typical fragment of a call graph.
+.(b
+.so pres1.pic
+.)b
+The entry in the call graph profile listing for this example is
+shown in Figure 4.
+.pp
+The entry is for routine \s-1EXAMPLE\s+1, which has
+the Caller routines as its parents,
+and the Sub routines as its children.
+The reader should keep in mind that all information
+is given \fIwith respect to \s-1EXAMPLE\s+1\fP.
+The index in the first column shows that \s-1EXAMPLE\s+1
+is the second entry in the profile listing.
+The \s-1EXAMPLE\s+1 routine is called ten times, four times by \s-1CALLER1\s+1,
+and six times by \s-1CALLER2\s+1.
+Consequently 40% of \s-1EXAMPLE\s+1's time is propagated to \s-1CALLER1\s+1,
+and 60% of \s-1EXAMPLE\s+1's time is propagated to \s-1CALLER2\s+1.
+The self and descendant fields of the parents
+show the amount of self and descendant time \s-1EXAMPLE\s+1
+propagates to them (but not the time used by
+the parents directly).
+Note that \s-1EXAMPLE\s+1 calls itself recursively four times.
+The routine \s-1EXAMPLE\s+1 calls routine \s-1SUB1\s+1 twenty times, \s-1SUB2\s+1 once,
+and never calls \s-1SUB3\s+1.
+Since \s-1SUB2\s+1 is called a total of five times,
+20% of its self and descendant time is propagated to \s-1EXAMPLE\s+1's
+descendant time field.
+Because \s-1SUB1\s+1 is a member of \fIcycle 1\fR,
+the self and descendant times
+and call count fraction
+are those for the cycle as a whole.
+Since cycle 1 is called a total of forty times
+(not counting calls among members of the cycle),
+it propagates 50% of the cycle's self and descendant
+time to \s-1EXAMPLE\s+1's descendant time field.
+Finally each name is followed by an index that shows
+where on the listing to find the entry for that routine.
+.sh 1 "Using the Profiles"
+.pp
+The profiler is a useful tool for improving
+a set of routines that implement an abstraction.
+It can be helpful in identifying poorly coded routines,
+and in evaluating the new algorithms and code that replace them.
+Taking full advantage of the profiler
+requires a careful examination of the call graph profile,
+and a thorough knowledge of the abstractions underlying
+the program.
+.pp
+The easiest optimization that can be performed
+is a small change
+to a control construct or data structure that improves the
+running time of the program.
+An obvious starting point
+is a routine that is called many times.
+For example, suppose an output
+routine is the only parent
+of a routine that formats the data.
+If this format routine is expanded inline in the
+output routine, the overhead of a function call and
+return can be saved for each datum that needs to be formatted.
+.pp
+The drawback to inline expansion is that the data abstractions
+in the program may become less parameterized,
+hence less clearly defined.
+The profiling will also become less useful since the loss of
+routines will make its output more granular.
+For example,
+if the symbol table functions ``lookup'', ``insert'', and ``delete''
+are all merged into a single parameterized routine,
+it will be impossible to determine the costs
+of any one of these individual functions from the profile.
+.pp
+Further potential for optimization lies in routines that
+implement data abstractions whose total execution
+time is long.
+For example, a lookup routine might be called only a few
+times, but use an inefficient linear search algorithm,
+that might be replaced with a binary search.
+Alternately, the discovery that a rehashing function is being
+called excessively, can lead to a different
+hash function or a larger hash table.
+If the data abstraction function cannot easily be speeded up,
+it may be advantageous to cache its results,
+and eliminate the need to rerun
+it for identical inputs.
+These and other ideas for program improvement are discussed in
+[Bentley81].
+.pp
+This tool is best used in an iterative approach:
+profiling the program,
+eliminating one bottleneck,
+then finding some other part of the program
+that begins to dominate execution time.
+For instance, we have used \fBgprof\fR on itself;
+eliminating, rewriting, and inline expanding routines,
+until reading
+data files (hardly a target for optimization!)
+represents the dominating factor in its execution time.
+.pp
+Certain types of programs are not easily analyzed by \fBgprof\fR.
+They are typified by programs that exhibit a large degree of
+recursion, such as recursive descent compilers.
+The problem is that most of the major routines are grouped
+into a single monolithic cycle.
+As in the symbol table abstraction that is placed
+in one routine,
+it is impossible to distinguish which members of the cycle are
+responsible for the execution time.
+Unfortunately there are no easy modifications to these programs that
+make them amenable to analysis.
+.pp
+A completely different use of the profiler is to analyze the control
+flow of an unfamiliar program.
+If you receive a program from another user that you need to modify
+in some small way,
+it is often unclear where the changes need to be made.
+By running the program on an example and then using \fBgprof\fR,
+you can get a view of the structure of the program.
+.pp
+Consider an example in which you need to change the output format
+of the program.
+For purposes of this example suppose that the call graph
+of the output portion of the program has the following structure:
+.(b
+.so pres2.pic
+.)b
+Initially you look through the \fBgprof\fR
+output for the system call ``\s-1WRITE\s+1''.
+The format routine you will need to change is probably
+among the parents of the ``\s-1WRITE\s+1'' procedure.
+The next step is to look at the profile entry for each
+of parents of ``\s-1WRITE\s+1'',
+in this example either ``\s-1FORMAT1\s+1'' or ``\s-1FORMAT2\s+1'',
+to determine which one to change.
+Each format routine will have one or more parents,
+in this example ``\s-1CALC1\s+1'', ``\s-1CALC2\s+1'', and ``\s-1CALC3\s+1''.
+By inspecting the source code for each of these routines
+you can determine which format routine generates the output that
+you wish to modify.
+Since the \fBgprof\fR entry shows all the
+potential calls to the format routine you intend to change,
+you can determine if your modifications will affect output that
+should be left alone.
+If you desire to change the output of ``\s-1CALC2\s+1'', but not ``\s-1CALC3\s+1'',
+then formatting routine ``\s-1FORMAT2\s+1'' needs to be split
+into two separate routines,
+one of which implements the new format.
+You can then retarget just the call by ``\s-1CALC2\s+1''
+that needs the new format.
+It should be noted that the static call information is particularly
+useful here since the test case you run probably will not
+exercise the entire program.
+.sh 1 "Conclusions"
+.pp
+We have created a profiler that aids in the evaluation
+of modular programs.
+For each routine in the program,
+the profile shows the extent to which that routine
+helps support various abstractions,
+and how that routine uses other abstractions.
+The profile accurately assesses the cost of routines
+at all levels of the program decomposition.
+The profiler is easily used,
+and can be compiled into the program without any prior planning by
+the programmer.
+It adds only five to thirty percent execution overhead to the program
+being profiled,
+produces no additional output until after the program finishes,
+and allows the program to be measured in its actual environment.
+Finally, the profiler runs on a time-sharing system
+using only the normal services provided by the operating system
+and compilers.
OpenPOWER on IntegriCloud