From f2b214b0dff95d6bb79cbb5b6ff5ba9d90f655c9 Mon Sep 17 00:00:00 2001 From: oharboe Date: Wed, 2 Jan 2008 21:52:27 +0000 Subject: Initial import from www.ecosforge.net --- zpu/roadshow/roadshow/dhrystone/VARIATIONS | 157 +++++++++++++++++++++++++++++ 1 file changed, 157 insertions(+) create mode 100644 zpu/roadshow/roadshow/dhrystone/VARIATIONS (limited to 'zpu/roadshow/roadshow/dhrystone/VARIATIONS') diff --git a/zpu/roadshow/roadshow/dhrystone/VARIATIONS b/zpu/roadshow/roadshow/dhrystone/VARIATIONS new file mode 100644 index 0000000..3046cbd --- /dev/null +++ b/zpu/roadshow/roadshow/dhrystone/VARIATIONS @@ -0,0 +1,157 @@ + + Understanding Variations in Dhrystone Performance + + + + By Reinhold P. Weicker, Siemens AG, AUT E 51, Erlangen + + + + April 1989 + + + This article has appeared in: + + + Microprocessor Report, May 1989 (Editor: M. Slater), pp. 16-17 + + + + +Microprocessor manufacturers tend to credit all the performance measured by +benchmarks to the speed of their processors, they often don't even mention the +programming language and compiler used. In their detailed documents, usually +called "performance brief" or "performance report," they usually do give more +details. However, these details are often lost in the press releases and other +marketing statements. For serious performance evaluation, it is necessary to +study the code generated by the various compilers. + +Dhrystone was originally published in Ada (Communications of the ACM, Oct. +1984). However, since good Ada compilers were rare at this time and, together +with UNIX, C became more and more popular, the C version of Dhrystone is the +one now mainly used in industry. There are "official" versions 2.1 for Ada, +Pascal, and C, which are as close together as the languages' semantic +differences permit. + +Dhrystone contains two statements where the programming language and its +translation play a major part in the execution time measured by the benchmark: + + o String assignment (in procedure Proc_0 / main) + o String comparison (in function Func_2) + +In Ada and Pascal, strings are arrays of characters where the length of the +string is part of the type information known at compile time. In C, strings +are also arrays of characters, but there are no operators defined in the +language for assignment and comparison of strings. Instead, functions +"strcpy" and "strcmp" are used. These functions are defined for strings of +arbitrary length, and make use of the fact that strings in C have to end with +a terminating null byte. For general-purpose calls to these functions, the +implementor can assume nothing about the length and the alignment of the +strings involved. + +The C version of Dhrystone spends a relatively large amount of time in these +two functions. Some time ago, I made measurements on a VAX 11/785 with the +Berkeley UNIX (4.2) compilers (often-used compilers, but certainly not the +most advanced). In the C version, 23% of the time was spent in the string +functions; in the Pascal version, only 10%. On good RISC machines (where less +time is spent in the procedure calling sequence than on a VAX) and with better +optimizing compilers, the percentage is higher; MIPS has reported 34% for an +R3000. Because of this effect, Pascal and Ada Dhrystone results are usually +better than C results (except when the optimization quality of the C compiler +is considerably better than that of the other compilers). + +Several people have noted that the string operations are over-represented in +Dhrystone, mainly because the strings occurring in Dhrystone are longer than +average strings. I admit that this is true, and have said so in my SIGPLAN +Notices paper (Aug. 1988); however, I didn't want to generate confusion by +changing the string lengths from version 1 to version 2. + +Even if they are somewhat over-represented in Dhrystone, string operations are +frequent enough that it makes sense to implement them in the most efficient +way possible, not only for benchmarking purposes. This means that they can +and should be written in assembly language code. ANSI C also explicitly allows +the strings functions to be implemented as macros, i.e. by inline code. + +There is also a third way to speed up the "strcpy" statement in Dhrystone: For +this particular "strcpy" statement, the source of the assignment is a string +constant. Therefore, in contrast to calls to "strcpy" in the general case, the +compiler knows the length and alignment of the strings involved at compile +time and can generate code in the same efficient way as a Pascal compiler +(word instructions instead of byte instructions). + +This is not allowed in the case of the "strcmp" call: Here, the addresses are +formal procedure parameters, and no assumptions can be made about the length +or alignment of the strings. Any such assumptions would indicate an incorrect +implementation. They might work for Dhrystone, where the strings are in fact +word-aligned with typical compilers, but other programs would deliver +incorrect results. + +So, for an apple-to-apple comparison between processors, and not between +several possible (legal or illegal) degrees of compiler optimization, one +should check that the systems are comparable with respect to the following +three points: + + (1) String functions in assembly language vs. in C + + Frequently used functions such as the string functions can and should be + written in assembly language, and all serious C language systems known + to me do this. (I list this point for completeness only.) Note that + processors with an instruction that checks a word for a null byte (such + as AMD's 29000 and Intel's 80960) have an advantage here. (This + advantage decreases relatively if optimization (3) is applied.) Due to + the length of the strings involved in Dhrystone, this advantage may be + considered too high in perspective, but it is certainly legal to use + such instructions - after all, these situations are what they were + invented for. + + (2) String function code inline vs. as library functions. + + ANSI C has created a new situation, compared with the older + Kernighan/Ritchie C. In the original C, the definition of the string + function was not part of the language. Now it is, and inlining is + explicitly allowed. I probably should have stated more clearly in my + SIGPLAN Notices paper that the rule "No procedure inlining for + Dhrystone" referred to the user level procedures only and not to the + library routines. + + (3) Fixed-length and alignment assumptions for the strings + + Compilers should be allowed to optimize in these cases if (and only if) + it is safe to do so. For Dhrystone, this is the "strcpy" statement, but + not the "strcmp" statement (unless, of course, the "strcmp" code + explicitly checks the alignment at execution time and branches + accordingly). A "Dhrystone switch" for the compiler that causes the + generation of code that may not work under certain circumstances is + certainly inappropriate for comparisons. It has been reported in Usenet + that some C compilers provide such a compiler option; since I don't have + access to all C compilers involved, I cannot verify this. + + If the fixed-length and word-alignment assumption can be used, a wide + bus that permits fast multi-word load instructions certainly does help; + however, this fact by itself should not make a really big difference. + +A check of these points - something that is necessary for a thorough +evaluation and comparison of the Dhrystone performance claims - requires +object code listings as well as listings for the string functions (strcpy, +strcmp) that are possibly called by the program. + +I don't pretend that Dhrystone is a perfect tool to measure the integer +performance of microprocessors. The more it is used and discussed, the more I +myself learn about aspects that I hadn't noticed yet when I wrote the program. +And of course, the very success of a benchmark program is a danger in that +people may tune their compilers and/or hardware to it, and with this action +make it less useful. + +Whetstone and Linpack have their critical points also: The Whetstone rating +depends heavily on the speed of the mathematical functions (sine, sqrt, ...), +and Linpack is sensitive to data alignment for some cache configurations. + +Introduction of a standard set of public domain benchmark software (something +the SPEC effort attempts) is certainly a worthwhile thing. In the meantime, +people will continue to use whatever is available and widely distributed, and +Dhrystone ratings are probably still better than MIPS ratings if these are - +as often in industry - based on no reproducible derivation. However, any +serious performance evaluation requires more than just a comparison of raw +numbers; one has to make sure that the numbers have been obtained in a +comparable way. + -- cgit v1.1