==============================================================================
Cycle counts and throughput for low-level routines in GNU MP as currently
implemented.

A range means that the timing is data-dependent.  The slower end of such a
range is usually the best performance estimate.

The throughput value, measured in Gb/s (gigabits per second), is meaningful
only for comparison between CPUs.
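
The throughput figures can be reproduced from the clock rate and the cycle
counts.  The sketch below is an inference from the tabulated numbers, not a
formula given in this file: shift/add/sub entries match clock * limb_bits /
cycles, while the multiply entries match clock * limb_bits^2 / cycles (that
is, bit products per second).  The function name and its parameters are
hypothetical.

```python
def throughput_gbps(clock_mhz, cycles_per_limb, limb_bits, multiply=False):
    """Return throughput in Gb/s, taking 1 Gb as 10**9 bits.

    Assumption inferred from the table: shift/add/sub process limb_bits
    bits per limb, while the multiply routines are credited with
    limb_bits**2 bit products per limb.
    """
    work_per_limb = limb_bits * limb_bits if multiply else limb_bits
    return clock_mhz * 1e6 * work_per_limb / cycles_per_limb / 1e9


# EV4 at 200MHz, mpn_lshift at 4.75 cycles/64b (table lists 2.7 Gb/s)
print(round(throughput_gbps(200, 4.75, 64), 2))
# EV4 at 200MHz, mpn_mul_1 at 42 cycles/64b (table lists 20 Gb/s)
print(round(throughput_gbps(200, 42, 64, multiply=True), 1))
```

For a data-dependent range such as 13-27 cycles, evaluating the function at
both ends gives the corresponding throughput range.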

A star before a line means that all values on that line are estimates.  A
star before a number means that that number is an estimate.  A `p' before a
number means that the code is not complete, but the timing is believed to be
accurate.

	    |	mpn_lshift	mpn_add_n	mpn_mul_1	mpn_addmul_1
	    |	mpn_rshift	mpn_sub_n			mpn_submul_1
------------+-----------------------------------------------------------------
DEC/Alpha   |
EV4	    |	4.75 cycles/64b	7.75 cycles/64b	42 cycles/64b	42 cycles/64b
  200MHz    |	2.7 Gb/s	1.65 Gb/s	20 Gb/s		20 Gb/s
EV5 old code|	4.0 cycles/64b	5.5 cycles/64b	18 cycles/64b	18 cycles/64b
  267MHz    |	4.27 Gb/s	3.10 Gb/s	61 Gb/s		61 Gb/s
  417MHz    |	6.67 Gb/s	4.85 Gb/s	95 Gb/s		95 Gb/s
EV5 tuned   |	3.25 cycles/64b	4.75 cycles/64b
  267MHz    |	5.25 Gb/s	3.59 Gb/s		as above
  417MHz    |	8.21 Gb/s	5.61 Gb/s
------------+-----------------------------------------------------------------
Sun/SPARC   |
SPARC v7    |	14.0 cycles/32b	8.5 cycles/32b	37-54 cycl/32b	37-54 cycl/32b
SuperSPARC  |	3 cycles/32b	2.5 cycles/32b	8.2 cycles/32b	10.8 cycles/32b
  50MHz	    |	0.53 Gb/s	0.64 Gb/s	6.2 Gb/s	4.7 Gb/s
**SuperSPARC|		tuned addmul and submul will take:	9.25 cycles/32b
MicroSPARC2 |	?		6.65 cycles/32b	30 cycles/32b	31.5 cycles/32b
  110MHz    |	?		0.53 Gb/s	3.75 Gb/s	3.58 Gb/s
SuperSPARC2 |	?		?		?		?
Ultra/32 (4)|	2.5 cycles/32b	6.5 cycles/32b	13-27 cyc/32b	16-30 cyc/32b
  182MHz    |	2.33 Gb/s	0.896 Gb/s	14.3-6.9 Gb/s
Ultra/64 (5)|	2.5 cycles/64b	10 cycles/64b	40-70 cyc/64b	46-76 cyc/64b
  182MHz    |	4.66 Gb/s	1.16 Gb/s	18.6-11 Gb/s
HalSPARC64  |	?		?		?		?
------------+-----------------------------------------------------------------
SGI/MIPS    |
R3000	    |	6 cycles/32b	9.25 cycles/32b	16 cycles/32b	16 cycles/32b
  40MHz     |	0.21 Gb/s	0.14 Gb/s	2.56 Gb/s	2.56 Gb/s
R4400/32    |	8.6 cycles/32b	10 cycles/32b	16-18		19-21
  200MHz    |	0.74 Gb/s	0.64 Gb/s	13-11 Gb/s	11-9.6 Gb/s
*R4400/64   |	8.6 cycles/64b	10 cycles/64b	22 cycles/64b	22 cycles/64b
  *200MHz   |	1.48 Gb/s	1.28 Gb/s	37 Gb/s		37 Gb/s
R4600/32    |	6 cycles/32b	9.25 cycles/32b	15 cycles/32b	19 cycles/32b
  134MHz    |	0.71 Gb/s	0.46 Gb/s	9.1 Gb/s	7.2 Gb/s
R4600/64    |	6 cycles/64b	9.25 cycles/64b	?		?
  134MHz    |	1.4 Gb/s	0.93 Gb/s	?		?
R8000/64    |	3 cycles/64b	4.6 cycles/64b	8 cycles/64b	8 cycles/64b
  75MHz	    |	1.6 Gb/s	1.0 Gb/s	38 Gb/s		38 Gb/s
*R10000/64  |	2 cycles/64b	3 cycles/64b	11 cycles/64b	11 cycles/64b
  *200MHz   |	6.4 Gb/s	4.27 Gb/s	74 Gb/s		74 Gb/s
  *250MHz   |	8.0 Gb/s	5.33 Gb/s	93 Gb/s		93 Gb/s
------------+-----------------------------------------------------------------
Motorola    |
MC68020     |	?		24 cycles/32b	62 cycles/32b	70 cycles/32b
MC68040     |	?		6 cycles/32b	24 cycles/32b	25 cycles/32b
MC88100	    |	>5 cycles/32b	4.6 cycles/32b	16/21 cyc/32b	p 18/23 cyc/32b
MC88110  wt |	?		3.75 cycles/32b	6 cycles/32b	8.5 cyc/32b
*MC88110 wb |	?		2.25 cycles/32b	4 cycles/32b	5 cycles/32b
------------+-----------------------------------------------------------------
HP/PA-RISC  |
PA7000	    |	4 cycles/32b	5 cycles/32b	9 cycles/32b	11 cycles/32b
  67MHz	    |	0.53 Gb/s	0.43 Gb/s	7.6 Gb/s	6.2 Gb/s
PA7100	    |	3.25 cycles/32b	4.25 cycles/32b	7 cycles/32b	8 cycles/32b
  99MHz	    |	0.97 Gb/s	0.75 Gb/s	14 Gb/s		12.8 Gb/s
PA7100LC    |	?		?		?		?
PA7200  (3) |	3 cycles/32b	4 cycles/32b	7 cycles/32b	6.5 cycles/32b
  100MHz    |	1.07 Gb/s	0.80 Gb/s	14 Gb/s		15.8 Gb/s
PA7300LC    |	?		?		?		?
*PA8000	    |	3 cycles/64b	4 cycles/64b	7 cycles/64b	6.5 cycles/64b
  180MHz    |	3.84 Gb/s	2.88 Gb/s	105 Gb/s	113 Gb/s
------------+-----------------------------------------------------------------
Intel/x86   |
386DX	    |	20 cycles/32b	17 cycles/32b	41-70 cycl/32b	50-79 cycl/32b
  16.7MHz   |	0.027 Gb/s	0.031 Gb/s	0.42-0.24 Gb/s	0.34-0.22 Gb/s
486DX	    |	?		?		?		?
486DX4	    |	9.5 cycles/32b	9.25 cycles/32b	17-23 cycl/32b	20-26 cycl/32b
  100MHz    |	0.34 Gb/s	0.35 Gb/s	6.0-4.5 Gb/s	5.1-3.9 Gb/s
Pentium     |	2/6 cycles/32b	2.5 cycles/32b	13 cycles/32b	14 cycles/32b
  167MHz    |	2.7/0.89 Gb/s	2.1 Gb/s	13.1 Gb/s	12.2 Gb/s
Pentium Pro |	2.5 cycles/32b	3.5 cycles/32b	6 cycles/32b	9 cycles/32b
  200MHz    |	2.6 Gb/s	1.8 Gb/s	34 Gb/s		23 Gb/s
------------+-----------------------------------------------------------------
IBM/POWER   |
RIOS 1	    |	3 cycles/32b	4 cycles/32b	11.5-12.5 c/32b	14.5/15.5 c/32b
RIOS 2	    |	2 cycles/32b	2 cycles/32b	7 cycles/32b	8.5 cycles/32b
------------+-----------------------------------------------------------------
PowerPC	    |
PPC601  (1) |	3 cycles/32b	6 cycles/32b	11-16 cycl/32b	14-19 cycl/32b
PPC601  (2) |	5 cycles/32b	6 cycles/32b	13-22 cycl/32b	16-25 cycl/32b
  67MHz (2) |	0.43 Gb/s	0.36 Gb/s	5.3-3.0 Gb/s	4.3-2.7 Gb/s
PPC603	    |	?		?		?		?
*PPC604	    |	2		3		2		3
  *167MHz   |							57 Gb/s
PPC620	    |	?		?		?		?
------------+-----------------------------------------------------------------
Tege	    |
Model 1	    |	2 cycles/64b	3 cycles/64b	2 cycles/64b	3 cycles/64b
  250MHz    |	8 Gb/s		5.3 Gb/s	500 Gb/s	340 Gb/s
  500MHz    |	16 Gb/s		11 Gb/s		1000 Gb/s	680 Gb/s
____________|_________________________________________________________________
(1) Using POWER and PowerPC instructions
(2) Using only PowerPC instructions
(3) Actual timing for shift/add/sub depends on code alignment.  PA7000 code
    is smaller and therefore often faster on this CPU.
(4) Multiplication routines modified for bogus UltraSPARC early-out
    optimization.  Smaller operand is put in rs1, not rs2 as it should
    according to the SPARC architecture manuals.
(5) Preliminary timings, since there is no stable 64-bit environment.
(6) Use mulu.d at least for mpn_lshift.  With mak/extu/or, we can only get
    to 2 cycles/32b.

=============================================================================
Estimated theoretical asymptotic cycle counts for low-level routines:

	    |	mpn_lshift	mpn_add_n	mpn_mul_1	mpn_addmul_1
	    |	mpn_rshift	mpn_sub_n			mpn_submul_1
------------+-----------------------------------------------------------------
DEC/Alpha   |
EV4	    |	3 cycles/64b	5 cycles/64b	42 cycles/64b	42 cycles/64b
EV5	    |	3 cycles/64b	4 cycles/64b	18 cycles/64b	18 cycles/64b
------------+-----------------------------------------------------------------
Sun/SPARC   |
SuperSPARC  |	2.5 cycles/32b	2 cycles/32b	8 cycles/32b	9 cycles/32b
------------+-----------------------------------------------------------------
SGI/MIPS    |
R4400/32    |	5 cycles/32b	8 cycles/32b	16 cycles/32b	16 cycles/32b
R4400/64    |	5 cycles/64b	8 cycles/64b	22 cycles/64b	22 cycles/64b
R4600	    |
------------+-----------------------------------------------------------------
HP/PA-RISC  |
PA7100	    |	3 cycles/32b	4 cycles/32b	6.5 cycles/32b	7.5 cycles/32b
PA7100LC    |
------------+-----------------------------------------------------------------
Motorola    |
MC88110	    |	1.5 cyc/32b (6)	1.5 cycles/32b	1.5 cycles/32b	2.25 cycles/32b
------------+-----------------------------------------------------------------
Intel/x86   |
486DX4	    |
Pentium P5x |	5 cycles/32b	2 cycles/32b	11.5 cycles/32b	13 cycles/32b
Pentium Pro |	2 cycles/32b	3 cycles/32b	4 cycles/32b	6 cycles/32b
------------+-----------------------------------------------------------------
IBM/POWER   |
RIOS 1	    |	3 cycles/32b	4 cycles/32b
RIOS 2	    |	1.5 cycles/32b	2 cycles/32b	4.5 cycles/32b	5.5 cycles/32b
------------+-----------------------------------------------------------------
PowerPC	    |
PPC601  (1) |	3 cycles/32b	?4 cycles/32b
PPC601  (2) |	4 cycles/32b	?4 cycles/32b
____________|_________________________________________________________________
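
One way to read the two tables together is to divide each asymptotic estimate
by the corresponding measured count, which gives the fraction of the
estimated peak that the current code reaches.  The following is a minimal
sketch for two CPUs that appear in both tables; the dictionary and function
names are hypothetical, and the cycle counts are copied from the tables
above.

```python
ROUTINES = ("lshift", "add_n", "mul_1", "addmul_1")

# Cycles per 32-bit limb as currently implemented (first table).
MEASURED = {
    "SuperSPARC": (3, 2.5, 8.2, 10.8),
    "Pentium Pro": (2.5, 3.5, 6, 9),
}

# Estimated theoretical asymptotic cycles per 32-bit limb (second table).
ASYMPTOTIC = {
    "SuperSPARC": (2.5, 2, 8, 9),
    "Pentium Pro": (2, 3, 4, 6),
}


def efficiency(cpu):
    """Map each routine to asymptotic/measured cycles (1.0 means at the estimate)."""
    return {
        name: bound / measured
        for name, measured, bound in zip(ROUTINES, MEASURED[cpu], ASYMPTOTIC[cpu])
    }


for cpu in MEASURED:
    print(cpu, {name: round(ratio, 2) for name, ratio in efficiency(cpu).items()})
```

A ratio well below 1.0, such as mpn_mul_1 on the Pentium Pro (4 against 6
cycles), suggests the most room for tuning under these estimates.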