Understanding Variations in Dhrystone Performance


          By Reinhold P. Weicker, Siemens AG, AUT E 51, Erlangen


                                April 1989


                      This article has appeared in:


        Microprocessor Report, May 1989 (Editor: M. Slater), pp. 16-17


Microprocessor manufacturers tend to credit all the performance measured by
benchmarks to the speed of their processors; they often don't even mention the
programming language and compiler used. Their detailed documents, usually
called "performance brief" or "performance report," do give more
details. However, these details are often lost in the press releases and other
marketing  statements.  For serious performance evaluation, it is necessary to
study the code generated by the various compilers.

Dhrystone was originally published in Ada (Communications  of  the  ACM,  Oct.
1984).  However, since good Ada compilers were rare at that time and, together
with UNIX, C became more and more popular, the C version of Dhrystone  is  the
one  now  mainly  used in industry. There are "official" versions 2.1 for Ada,
Pascal, and C,  which  are  as  close  together  as  the  languages'  semantic
differences permit.

Dhrystone contains two statements  where  the  programming  language  and  its
translation play a major part in the execution time measured by the benchmark:

  o   String assignment (in procedure Proc_0 / main)
  o   String comparison (in function Func_2)

In Ada and Pascal, strings are arrays of characters where the  length  of  the
string  is  part  of the type information known at compile time. In C, strings
are also arrays of characters, but there  are  no  operators  defined  in  the
language  for  assignment  and  comparison  of  strings.   Instead,  functions
"strcpy" and "strcmp" are used. These functions are  defined  for  strings  of
arbitrary  length, and make use of the fact that strings in C have to end with
a terminating null byte. For general-purpose calls  to  these  functions,  the
implementor  can  assume  nothing  about  the  length and the alignment of the
strings involved.
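
As an illustration of what "general-purpose" means here, the following is a
minimal sketch of byte-at-a-time versions of the two functions. The names
naive_strcpy and naive_strcmp are hypothetical and chosen for this example
only; real library versions are usually more elaborate. The point is that the
only thing the code may rely on is the terminating null byte.

```c
/* Byte-at-a-time copy: nothing is assumed about length or alignment.
   The loop ends after the terminating null byte has been copied. */
char *naive_strcpy(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Byte-at-a-time comparison. Characters are compared as unsigned char;
   the result is negative, zero, or positive. */
int naive_strcmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }
    return (int)*p - (int)*q;
}
```

Each iteration handles one byte and performs one test for the end of the
string, which is why the cost grows with the string length.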

The C version of Dhrystone spends a relatively large amount of time  in  these
two  functions.  Some  time  ago, I made measurements on a VAX 11/785 with the
Berkeley UNIX (4.2) compilers (often-used compilers,  but  certainly  not  the
most  advanced).  In  the  C  version, 23% of the time was spent in the string
functions; in the Pascal version, only 10%. On good RISC machines (where  less
time is spent in the procedure calling sequence than on a VAX) and with better
optimizing compilers, the percentage is higher; MIPS has reported 34%  for  an
R3000.   Because  of this effect, Pascal and Ada Dhrystone results are usually
better than C results (except when the optimization quality of the C  compiler
is considerably better than that of the other compilers).

Several people have noted that the string operations are  over-represented  in
Dhrystone,  mainly  because the strings occurring in Dhrystone are longer than
average strings. I admit that this is true, and have said  so  in  my  SIGPLAN
Notices  paper  (Aug.  1988);  however, I didn't want to generate confusion by
changing the string lengths from version 1 to version 2.

Even if they are somewhat over-represented in Dhrystone, string operations are
frequent  enough  that  it makes sense to implement them in the most efficient
way possible, not only for benchmarking purposes.  This means  that  they  can
and should be written in assembly language code. ANSI C also explicitly allows
the string functions to be implemented as macros, i.e. by inline code.
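
A small sketch of what the macro permission means for a C program, assuming
an ANSI C implementation: an ordinary call may be expanded by a macro from
<string.h>, while a parenthesized function name is never treated as a
function-like macro. The two wrapper names below are hypothetical.

```c
#include <string.h>

/* Ordinary call: if <string.h> also defines strcpy as a macro, the macro
   (possibly inline code) is used here. */
char *copy_maybe_inline(char *dst, const char *src)
{
    return strcpy(dst, src);
}

/* Parenthesized name: a function-like macro is not expanded, so this
   refers to the library function itself. */
char *copy_via_function(char *dst, const char *src)
{
    return (strcpy)(dst, src);
}
```

Both forms must behave identically; they may differ only in the code that is
generated, which is exactly what matters for a benchmark comparison.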

There is also a third way to speed up the "strcpy" statement in Dhrystone: For
this  particular  "strcpy" statement, the source of the assignment is a string
constant. Therefore, in contrast to calls to "strcpy" in the general case, the
compiler  knows  the  length  and alignment of the strings involved at compile
time and can generate code in the same efficient  way  as  a  Pascal  compiler
(word instructions instead of byte instructions).
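
The difference can be sketched in C itself. In the example below, the string
literal and the names MSG, copy_general and copy_constant are assumptions
made for illustration and are not taken from the Dhrystone source. When the
source is a constant, its size (including the null byte) is a compile-time
constant, so the copy can be expressed as a fixed-size block move.

```c
#include <string.h>

#define MSG "SOME STRING CONSTANT, 30 CHARS"   /* illustrative literal */

/* General case: the length is found at run time by scanning for the
   terminating null byte. */
void copy_general(char *dst, const char *src)
{
    strcpy(dst, src);
}

/* Constant source: sizeof(MSG) is known at compile time (30 characters
   plus the null byte), so no scan for the null byte is needed. */
void copy_constant(char *dst)
{
    memcpy(dst, MSG, sizeof(MSG));
}
```

A compiler that recognizes the constant source may turn the first form into
the second on its own; whether it does so is one of the things to check in
the object code.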

This is not allowed in the case of the "strcmp" call: Here, the addresses  are
formal  procedure  parameters, and no assumptions can be made about the length
or alignment of the strings.  Any such assumptions would indicate an incorrect
implementation.  They  might work for Dhrystone, where the strings are in fact
word-aligned  with  typical  compilers,  but  other  programs  would   deliver
incorrect results.

So, for an apples-to-apples comparison between processors, and not between
several  possible  (legal  or  illegal)  degrees of compiler optimization, one
should check that the systems are comparable with  respect  to  the  following
three points:

  (1) String functions in assembly language vs. in C

      Frequently used functions such as the string functions can and should be
      written  in  assembly language, and all serious C language systems known
      to me do this. (I list this point  for  completeness  only.)  Note  that
      processors  with an instruction that checks a word for a null byte (such
      as AMD's 29000 and Intel's 80960) have an advantage here. (This
      advantage shrinks in relative terms if optimization (3) is applied.)
      Due to the length of the strings involved in Dhrystone, this advantage
      may be considered disproportionately large, but it is certainly legal
      to use such instructions - after all, these situations are what they
      were invented for.

  (2) String function code inline vs. as library functions

      ANSI  C  has  created  a  new  situation,  compared   with   the   older
      Kernighan/Ritchie C. In the original C, the definition of the string
      functions was not part of the language. Now it is, and inlining is
      explicitly  allowed.  I  probably  should have stated more clearly in my
      SIGPLAN  Notices  paper  that  the  rule  "No  procedure  inlining   for
      Dhrystone"  referred  to  the  user level procedures only and not to the
      library routines.

  (3) Fixed-length and alignment assumptions for the strings

      Compilers should be allowed to optimize in these cases if (and only  if)
      it  is safe to do so. For Dhrystone, this is the "strcpy" statement, but
      not the  "strcmp"  statement  (unless,  of  course,  the  "strcmp"  code
      explicitly   checks   the  alignment  at  execution  time  and  branches
      accordingly).  A "Dhrystone switch" for the  compiler  that  causes  the
      generation  of  code  that  may  not work under certain circumstances is
      certainly inappropriate for comparisons. It has been reported on Usenet
      that some C compilers provide such a compiler option; since I don't have
      access to all C compilers involved, I cannot verify this.

      If the fixed-length and word-alignment assumption can be  used,  a  wide
      bus  that permits fast multi-word load instructions certainly does help;
      however, this fact by itself should not make a really big difference.
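
Points (1) and (3) can be illustrated together by a hedged sketch of a
"strcmp" that checks the alignment at execution time and branches
accordingly. Everything here is an assumption made for the example: the
function names are hypothetical, a word is taken to be 4 bytes, and the test
for a null byte is done with a well-known bit trick in place of a dedicated
machine instruction. Note also that the word loop may read up to three bytes
beyond the terminating null byte, inside the same aligned word. That is
harmless on typical hardware but outside what strict ANSI C guarantees,
which is one more reason such routines are normally written in assembly
language.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes of w is zero. */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Nonzero if p lies on a 4-byte boundary (assumed word size). */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

int checked_strcmp(const char *a, const char *b)
{
    const unsigned char *p, *q;

    if (is_word_aligned(a) && is_word_aligned(b)) {
        /* Fast path: skip whole words while they are equal and contain
           no null byte. May read past the terminator within a word. */
        for (;;) {
            uint32_t wa, wb;
            memcpy(&wa, a, sizeof wa);
            memcpy(&wb, b, sizeof wb);
            if (wa != wb || has_zero_byte(wa))
                break;
            a += 4;
            b += 4;
        }
    }

    /* Safe path: finish (or do everything) byte by byte. */
    p = (const unsigned char *)a;
    q = (const unsigned char *)b;
    while (*p != '\0' && *p == *q) {
        p++;
        q++;
    }
    return (int)*p - (int)*q;
}
```

Because the alignment is tested at execution time, the function stays
correct for unaligned arguments; it merely runs at byte speed for them. A
version that omits the test and always uses the word loop is the kind of
unsafe assumption described under (3).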

A check of  these  points  -  something  that  is  necessary  for  a  thorough
evaluation  and  comparison  of  the  Dhrystone  performance claims - requires
object code listings as well as listings for  the  string  functions  (strcpy,
strcmp) that are possibly called by the program.

I don't pretend that Dhrystone is  a  perfect  tool  to  measure  the  integer
performance  of microprocessors. The more it is used and discussed, the more I
myself learn about aspects that I hadn't noticed yet when I wrote the program.
And  of  course,  the  very success of a benchmark program is a danger in that
people may tune their compilers and/or hardware to it, and  with  this  action
make it less useful.

Whetstone and Linpack have their critical points also:  The  Whetstone  rating
depends  heavily on the speed of the mathematical functions (sine, sqrt, ...),
and Linpack is sensitive to data alignment for some cache configurations.

Introduction of a standard set of public domain benchmark software  (something
the  SPEC  effort attempts) is certainly a worthwhile thing.  In the meantime,
people will continue to use whatever is available and widely distributed,  and
Dhrystone ratings are probably still better than MIPS ratings if these are,
as is often the case in industry, not based on any reproducible derivation.
However, any
serious  performance  evaluation  requires  more than just a comparison of raw
numbers; one has to make sure  that  the  numbers  have  been  obtained  in  a
comparable way.