summaryrefslogtreecommitdiffstats
path: root/README.txt
blob: 7189a27eace51b22bcd2573889a949e9c9c3e04b (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167

This is the README file for my program, "bandwidth".

Bandwidth is a benchmark that attempts to measure
memory bandwidth. In December 2010 (and as of
release 0.24), I extended 'bandwidth' to measure 
network bandwidth as well.

Bandwidth is useful because both memory bandwidth
and network bandwidth need to be measured to
give you a clear idea of what your computer(s) can do.
Merely relying on specs does not give a full picture
and indeed specs can be misleading.

--------------------------------------------------
MEMORY BANDWIDTH 

My program bandwidth performs sequential and random
reads and writes of varying sizes. This permits 
you to infer from the graph how each type of memory 
is performing. So for instance when bandwidth
writes a 256-byte chunk, you know that because
caches are normally write-back, this chunk
will reside entirely in the L1 cache. Whereas
a 512 kB chunk will mainly reside in L2.

You could run a non-artificial benchmark and 
observe that a general performance number is lower 
on one machine or higher on anotehr, but that may
conceal the cause. 

So the purpose of this program is to help you 
pinpoint the cause of a performance problem,
or to affirm a general impression about a memory-
intensive program. 

It also tells you the best-case scenario e.g.
the maximum bandwidth achieved using sequential,
128-bit memory accesses.

Release 1.1:
	- Added larger font.
Release 1.0:
	- Moved graphing into BMPGraphing module.
	- Finally added LODS benchmarking, which
	  proves how badly lodsb/lodsw/lodsd/lodsq
	  perform.
	- Added switches --faster and --fastest.
Release 0.32:
	- Improved AVX support.
Release 0.31:
	- Adds cache detection for Intel 32-bit CPUs
	- Adds a little AVX support.
	- Fixes vector-to/from-main transfer bugs.
Release 0.30 adds cache detection for Intel 64-bit CPUs.
Release 0.29 improved graph granularity with more
	128-byte tests and removes ARM support.
Release 0.28 added a proper test of CPU features e.g. SSE 4.1.
Release 0.27 added finer-granularity 128-byte tests.
Release 0.26 fixed an issue with AMD processors.
Release 0.25 maked network bandwidth bidirectional.
Release 0.24 added network bandwidth testing.

Release 0.23 added:
	- Mac OS/X 64-bit support.
	- Vector-to-vector register transfer test.
	- Main register to/from vector register transfer test.
	- Main register byte/word/dword/qword to/from 
	  vector register test (pinsr*, pextr* instructions).
	- Memory copy test using SSE2.
	- Automatic checks under Linux for SSE2 & SSE4.

Release 0.22 added:
	- Register-to-register transfer test.
	- Register-to/from-stack transfer tests.

Release 0.21 added:
	- Standardized memory chunks to always be
	  a multiple of 256-byte mini-chunks.
	- Random memory accesses, in which each 
	  256-byte mini-chunk accessed is accessed 
	  in a random order, but also, inside each 
	  mini-chunk the 32/64/128 data are accessed
	  pseudo-randomly as well. 
	- Now 'bandwidth' includes chunk sizes that 
	  are not powers of 2, which increases 
	  data points around the key chunk sizes 
	  corresponding to common L1 and L2 cache 
	  sizes.
	- Command-line options:
		--fast for 0.25 seconds per test.
		--slow for 20 seconds per test.
		--title for adding a graph title.

Release 0.20 added graphing, with the graph
stored in a BMP image file. It also adds the
--slow option for more precise runs.

Release 0.19 added a second 128-bit SSE writer
routine that bypasses the caches, in addition
to the one that doesn't.

Release 0.18 was my Grand Unified bandwidth
benchmark that brought together support for
four operating systems:
	- Linux
	- Windows Mobile
	- 32-bit Windows
	- Mac OS/X 64-bit
and two processor architectures:
	- x86
	- Intel64
I've written custom assembly routines for
each architecture.

Total run time for the default speed, which
has 5 seconds per test, is about 35 minutes.

--------------------------------------------------
NETWORK BANDWIDTH (beginning with release 0.24)

In mid-December 2010, I extended bandwidth to measure
network bandwidth, which is useful for testing
your home or workplace network setup, and in theory
could be used to test machines across the Internet.

Release 0.25 adds:
	- Bidirectional network bandwidth testing.
	- Specifiable port# (default is 49000).

In the graph:
	- Sent data appears as a solid line.
	- Received data appears as a dashed line.

The network test is pretty simple. It sends chunks
of data of varying sizes to whatever computers
(nodes) that you specify. Each of those must be
running 'bandwidth' in transponder mode.

The chunks of data range of 32 kB up to 32 MB.
These are actually send as a stream of 1 or more
32 kB sub-chunks.

Sample output:
	output/Network-Linux2.6-Celeron-2.8GHz-32bit-loopback.bmp
	output/Network-MacOSX32-Corei5-2.4GHz-64bit-loopback.bmp
	output/Network-Mac64-Linux32.bmp

How to start a transponder:
	./bandwidth-mac64 --transponder

Example invocation of the test leader:
	./bandwidth64 --network 192.168.1.104

I've tested network mode on:
	Linux 32-bit
	Mac OS/X 32- and 64-bit
	Win/Cygwin 32-bit.

--------------------------------------------------
This program is provided without any warranty
and AS-IS. See the file COPYING for details.

Zack Smith
1@zsmith.co
March 2013

OpenPOWER on IntegriCloud