This is the README file for my program, "bandwidth".

Bandwidth is a benchmark that attempts to measure
memory bandwidth. In December 2010 (and as of
release 0.24), I extended 'bandwidth' to measure
network bandwidth as well.

Bandwidth is useful because both memory bandwidth
and network bandwidth need to be measured to
give you a clear idea of what your computer(s) can do.
Merely relying on specs does not give a full picture,
and indeed specs can be misleading.

--------------------------------------------------
MEMORY BANDWIDTH

My program, bandwidth, performs sequential and random
reads and writes of varying chunk sizes. This permits
you to infer from the graph how each level of the
memory hierarchy is performing. For instance, when
bandwidth writes a 256-byte chunk, you know that
because caches are normally write-back, this chunk
will reside entirely in the L1 cache, whereas
a 512 kB chunk will mainly reside in L2.
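
The chunk-size effect described above can be made visible with a small C
sketch that times repeated sequential reads over a buffer of a given size.
It is only an illustration under stated assumptions and is not the
program's own code (bandwidth uses custom assembly routines). The function
name sequential_read_mbps is hypothetical. The sketch reads 64 bits at a
time through a volatile pointer so the compiler cannot skip the loads, and
it takes its timings from the C11 timespec_get clock.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Seconds from the C11 wall clock (timespec_get with TIME_UTC). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: read a chunk of chunk_bytes sequentially, 64 bits
   at a time, `loops` times, and return the observed read rate in MB/s
   (1 MB = 2^20 bytes). Returns -1.0 on bad arguments, allocation
   failure, or a zero elapsed time. */
double sequential_read_mbps(size_t chunk_bytes, long loops)
{
    size_t n = chunk_bytes / sizeof(uint64_t);
    if (n == 0 || loops <= 0)
        return -1.0;

    uint64_t *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1.0;
    for (size_t i = 0; i < n; i++)
        buf[i] = i;                 /* touch every page before timing */

    /* Volatile reads force every load to happen on every pass. */
    const volatile uint64_t *p = buf;
    uint64_t sum = 0;
    double start = now_seconds();
    for (long l = 0; l < loops; l++)
        for (size_t i = 0; i < n; i++)
            sum += p[i];
    double elapsed = now_seconds() - start;
    free(buf);
    (void)sum;

    if (elapsed <= 0.0)
        return -1.0;
    double bytes = (double)n * sizeof(uint64_t) * (double)loops;
    return bytes / elapsed / (1024.0 * 1024.0);
}
```

Try calling it with a 256-byte chunk and then with much larger chunks, for
example 512 kB and 64 MB. The rate would typically fall as the working set
spills out of L1, then out of L2, and into main memory. The exact numbers
depend on the machine. The volatile loads also make this sketch slower than
a tuned assembly loop.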

You could run a real-world benchmark and
observe that its overall performance number is lower
on one machine or higher on another, but that single
number may conceal the cause.

So the purpose of this program is to help you
pinpoint the cause of a performance problem,
or to confirm a general impression about a
memory-intensive program.

It also tells you the best-case scenario, e.g.
the maximum bandwidth achieved using sequential
128-bit memory accesses.

Release 1.1:
 - Added a larger font.
Release 1.0:
 - Moved graphing into the BMPGraphing module.
 - Finally added LODS benchmarking, which
   proves how badly lodsb/lodsw/lodsd/lodsq
   perform.
 - Added switches --faster and --fastest.
Release 0.32:
 - Improved AVX support.
Release 0.31:
 - Added cache detection for Intel 32-bit CPUs.
 - Added a little AVX support.
 - Fixed vector-to/from-main transfer bugs.
Release 0.30 added cache detection for Intel 64-bit CPUs.
Release 0.29 improved graph granularity with more
 128-byte tests and removed ARM support.
Release 0.28 added a proper test of CPU features, e.g. SSE 4.1.
Release 0.27 added finer-granularity 128-byte tests.
Release 0.26 fixed an issue with AMD processors.
Release 0.25 made network bandwidth bidirectional.
Release 0.24 added network bandwidth testing.

Release 0.23 added:
 - Mac OS/X 64-bit support.
 - Vector-to-vector register transfer test.
 - Main register to/from vector register transfer test.
 - Main register byte/word/dword/qword to/from
   vector register test (pinsr*, pextr* instructions).
 - Memory copy test using SSE2.
 - Automatic checks under Linux for SSE2 & SSE4.

Release 0.22 added:
 - Register-to-register transfer test.
 - Register-to/from-stack transfer tests.

Release 0.21 added:
 - Standardized memory chunks to always be
   a multiple of 256-byte mini-chunks.
 - Random memory accesses, in which the
   256-byte mini-chunks are accessed in a
   random order and, inside each mini-chunk,
   the 32/64/128-bit data are also accessed
   pseudo-randomly.
 - Now 'bandwidth' includes chunk sizes that
   are not powers of 2, which adds data points
   around the key chunk sizes corresponding to
   common L1 and L2 cache sizes.
 - Command-line options:
    --fast for 0.25 seconds per test.
    --slow for 20 seconds per test.
    --title for adding a graph title.
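
The random access pattern described for release 0.21 can be illustrated
with a C sketch. It is not the program's own routine. Several parts are
assumptions made here only to show the idea:
 - the names random_read_sum and MINI_CHUNK,
 - the xorshift32 generator,
 - the use of a Fisher-Yates shuffle,
 - 64-bit reads only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MINI_CHUNK 256                                  /* bytes */
#define WORDS_PER_MINI (MINI_CHUNK / sizeof(uint64_t))  /* 32 words */

/* xorshift32: a small deterministic PRNG; state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Fisher-Yates shuffle of idx[0..n-1] (slight modulo bias is accepted). */
static void shuffle(size_t *idx, size_t n, uint32_t *state)
{
    for (size_t i = n; i > 1; i--) {
        size_t j = xorshift32(state) % i;   /* 0 <= j < i */
        size_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}

/* Hypothetical helper: read every 64-bit word of buf exactly once,
   visiting mini-chunks in random order and the words inside each
   mini-chunk in random order. chunk_bytes must be a multiple of
   MINI_CHUNK. Returns the sum of the words read (0 on allocation
   failure). */
uint64_t random_read_sum(const uint64_t *buf, size_t chunk_bytes,
                         uint32_t seed)
{
    size_t n_mini = chunk_bytes / MINI_CHUNK;
    size_t *order = malloc(n_mini * sizeof *order);
    size_t word_order[WORDS_PER_MINI];
    uint32_t state = seed ? seed : 1;   /* xorshift must not start at 0 */
    uint64_t sum = 0;

    if (order == NULL)
        return 0;
    for (size_t i = 0; i < n_mini; i++)
        order[i] = i;
    shuffle(order, n_mini, &state);

    for (size_t c = 0; c < n_mini; c++) {
        const uint64_t *mini = buf + order[c] * WORDS_PER_MINI;
        for (size_t w = 0; w < WORDS_PER_MINI; w++)
            word_order[w] = w;
        shuffle(word_order, WORDS_PER_MINI, &state);
        for (size_t w = 0; w < WORDS_PER_MINI; w++)
            sum += mini[word_order[w]];
    }
    free(order);
    return sum;
}
```

A timing loop built on this would compute the shuffled orders before
starting the clock. That way the cost of generating random numbers is not
counted as memory time.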

Release 0.20 added graphing, with the graph
stored in a BMP image file. It also added the
--slow option for more precise runs.

Release 0.19 added a second 128-bit SSE writer
routine that bypasses the caches, in addition
to the one that doesn't.
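
A cache-bypassing writer like the one added in release 0.19 is typically
built on SSE2 non-temporal stores (the MOVNTDQ instruction). The C sketch
below uses the _mm_stream_si128 intrinsic to show the idea. It is not the
program's assembly routine, and the function name fill_bypassing_cache is
hypothetical. On targets without SSE2 it falls back to an ordinary memset,
which does go through the caches.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: fill len bytes at dst with byte value v.
   With SSE2, _mm_stream_si128 issues 128-bit non-temporal stores that
   bypass the caches; otherwise fall back to a normal cached memset.
   dst must be 16-byte aligned and len a multiple of 16. */
void fill_bypassing_cache(void *dst, unsigned char v, size_t len)
{
#if defined(__SSE2__)
    __m128i pattern = _mm_set1_epi8((char)v);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < len / 16; i++)
        _mm_stream_si128(p + i, pattern);
    _mm_sfence();   /* order the weakly-ordered streaming stores */
#else
    memset(dst, v, len);
#endif
}
```

Non-temporal stores are weakly ordered. The _mm_sfence() call therefore
makes them globally visible before any later stores.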

Release 0.18 was my Grand Unified bandwidth
benchmark that brought together support for
four operating systems:
 - Linux
 - Windows Mobile
 - 32-bit Windows
 - Mac OS/X 64-bit
and two processor architectures:
 - x86
 - Intel64
I've written custom assembly routines for
each architecture.

Total run time at the default speed of
5 seconds per test is about 35 minutes.

--------------------------------------------------
NETWORK BANDWIDTH (beginning with release 0.24)

In mid-December 2010, I extended bandwidth to measure
network bandwidth. This is useful for testing
your home or workplace network setup, and in theory
it could be used to test machines across the Internet.

Release 0.25 added:
 - Bidirectional network bandwidth testing.
 - A specifiable port number (default 49000).

In the graph:
 - Sent data appears as a solid line.
 - Received data appears as a dashed line.

The network test is pretty simple. It sends chunks
of data of varying sizes to whichever computers
(nodes) you specify. Each of those must be
running 'bandwidth' in transponder mode.

The chunks of data range from 32 kB up to 32 MB.
These are actually sent as a stream of one or more
32 kB sub-chunks.
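
The splitting into sub-chunks can be sketched in C as follows. The helper
names sub_chunk_count, send_chunk and write_all are hypothetical. The
sketch assumes that 1 kB = 1024 bytes and that the data go out through
POSIX write() on a connected socket's file descriptor; bandwidth's actual
network code and wire format may differ. Under these assumptions a 32 MB
chunk is sent as 1024 writes of 32 kB each.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#define SUB_CHUNK ((size_t)32 * 1024)   /* 32 kB sub-chunk size */

/* Number of 32 kB sub-chunks needed for a chunk of chunk_bytes. */
size_t sub_chunk_count(size_t chunk_bytes)
{
    return (chunk_bytes + SUB_CHUNK - 1) / SUB_CHUNK;
}

/* Write exactly len bytes to fd, retrying after partial writes and
   EINTR. Returns 0 on success, -1 on error. */
static int write_all(int fd, const unsigned char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send one chunk to fd (e.g. a connected TCP
   socket) as a stream of 32 kB sub-chunks; the last one is shorter if
   chunk_bytes is not a multiple of 32 kB. Returns 0 on success. */
int send_chunk(int fd, const unsigned char *chunk, size_t chunk_bytes)
{
    for (size_t off = 0; off < chunk_bytes; off += SUB_CHUNK) {
        size_t len = chunk_bytes - off;
        if (len > SUB_CHUNK)
            len = SUB_CHUNK;
        if (write_all(fd, chunk + off, len) != 0)
            return -1;
    }
    return 0;
}
```

write_all loops because write() on a socket may accept fewer bytes than
requested.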

Sample output:
 output/Network-Linux2.6-Celeron-2.8GHz-32bit-loopback.bmp
 output/Network-MacOSX32-Corei5-2.4GHz-64bit-loopback.bmp
 output/Network-Mac64-Linux32.bmp

How to start a transponder:
 ./bandwidth-mac64 --transponder

Example invocation of the test leader:
 ./bandwidth64 --network 192.168.1.104

I've tested network mode on:
 - Linux 32-bit
 - Mac OS/X 32- and 64-bit
 - Win/Cygwin 32-bit

--------------------------------------------------
This program is provided AS-IS, without any
warranty. See the file COPYING for details.

Zack Smith
1@zsmith.co
March 2013