Diffstat (limited to 'README.txt')
-rwxr-xr-x | README.txt | 167 |
1 files changed, 167 insertions, 0 deletions
diff --git a/README.txt b/README.txt
new file mode 100755
index 0000000..7189a27
--- /dev/null
+++ b/README.txt
@@ -0,0 +1,167 @@
+
+This is the README file for my program, "bandwidth".
+
+Bandwidth is a benchmark that attempts to measure
+memory bandwidth. In December 2010 (and as of
+release 0.24), I extended 'bandwidth' to measure
+network bandwidth as well.
+
+Bandwidth is useful because both memory bandwidth
+and network bandwidth need to be measured to
+give you a clear idea of what your computer(s) can do.
+Merely relying on specs does not give a full picture
+and indeed specs can be misleading.
+
+--------------------------------------------------
+MEMORY BANDWIDTH
+
+My program bandwidth performs sequential and random
+reads and writes of varying sizes. This permits
+you to infer from the graph how each type of memory
+is performing. So for instance, when bandwidth
+writes a 256-byte chunk, you know that because
+caches are normally write-back, this chunk
+will reside entirely in the L1 cache, whereas
+a 512 kB chunk will mainly reside in L2.
+
+You could run a non-artificial benchmark and
+observe that a general performance number is lower
+on one machine or higher on another, but that may
+conceal the cause.
+
+So the purpose of this program is to help you
+pinpoint the cause of a performance problem,
+or to affirm a general impression about a
+memory-intensive program.
+
+It also tells you the best-case scenario, e.g.
+the maximum bandwidth achieved using sequential,
+128-bit memory accesses.
+
+Release 1.1:
+ - Added larger font.
+Release 1.0:
+ - Moved graphing into BMPGraphing module.
+ - Finally added LODS benchmarking, which
+   proves how badly lodsb/lodsw/lodsd/lodsq
+   perform.
+ - Added switches --faster and --fastest.
+Release 0.32:
+ - Improved AVX support.
+Release 0.31:
+ - Adds cache detection for Intel 32-bit CPUs.
+ - Adds a little AVX support.
+ - Fixes vector-to/from-main transfer bugs.
+Release 0.30 adds cache detection for Intel 64-bit CPUs.
+Release 0.29 improved graph granularity with more
+   128-byte tests and removed ARM support.
+Release 0.28 added a proper test of CPU features, e.g. SSE 4.1.
+Release 0.27 added finer-granularity 128-byte tests.
+Release 0.26 fixed an issue with AMD processors.
+Release 0.25 made network bandwidth bidirectional.
+Release 0.24 added network bandwidth testing.
+
+Release 0.23 added:
+ - Mac OS/X 64-bit support.
+ - Vector-to-vector register transfer test.
+ - Main register to/from vector register transfer test.
+ - Main register byte/word/dword/qword to/from
+   vector register test (pinsr*, pextr* instructions).
+ - Memory copy test using SSE2.
+ - Automatic checks under Linux for SSE2 & SSE4.
+
+Release 0.22 added:
+ - Register-to-register transfer test.
+ - Register-to/from-stack transfer tests.
+
+Release 0.21 added:
+ - Standardized memory chunks to always be
+   a multiple of 256-byte mini-chunks.
+ - Random memory accesses, in which the
+   256-byte mini-chunks are accessed
+   in a random order and, inside each
+   mini-chunk, the 32/64/128-bit data are
+   accessed pseudo-randomly as well.
+ - Now 'bandwidth' includes chunk sizes that
+   are not powers of 2, which increases
+   data points around the key chunk sizes
+   corresponding to common L1 and L2 cache
+   sizes.
+ - Command-line options:
+     --fast for 0.25 seconds per test.
+     --slow for 20 seconds per test.
+     --title for adding a graph title.
+
+Release 0.20 added graphing, with the graph
+stored in a BMP image file. It also added the
+--slow option for more precise runs.
+
+Release 0.19 added a second 128-bit SSE writer
+routine that bypasses the caches, in addition
+to the one that doesn't.
+
+Release 0.18 was my Grand Unified bandwidth
+benchmark that brought together support for
+four operating systems:
+ - Linux
+ - Windows Mobile
+ - 32-bit Windows
+ - Mac OS/X 64-bit
+and two processor architectures:
+ - x86
+ - Intel64
+I've written custom assembly routines for
+each architecture.
+
+Total run time for the default speed, which
+has 5 seconds per test, is about 35 minutes.
+
+--------------------------------------------------
+NETWORK BANDWIDTH (beginning with release 0.24)
+
+In mid-December 2010, I extended bandwidth to measure
+network bandwidth, which is useful for testing
+your home or workplace network setup, and in theory
+could be used to test machines across the Internet.
+
+Release 0.25 adds:
+ - Bidirectional network bandwidth testing.
+ - Specifiable port# (default is 49000).
+
+In the graph:
+ - Sent data appears as a solid line.
+ - Received data appears as a dashed line.
+
+The network test is pretty simple. It sends chunks
+of data of varying sizes to whatever computers
+(nodes) you specify. Each of those must be
+running 'bandwidth' in transponder mode.
+
+The chunks of data range from 32 kB up to 32 MB.
+These are actually sent as a stream of 1 or more
+32 kB sub-chunks.
+
+Sample output:
+   output/Network-Linux2.6-Celeron-2.8GHz-32bit-loopback.bmp
+   output/Network-MacOSX32-Corei5-2.4GHz-64bit-loopback.bmp
+   output/Network-Mac64-Linux32.bmp
+
+How to start a transponder:
+   ./bandwidth-mac64 --transponder
+
+Example invocation of the test leader:
+   ./bandwidth64 --network 192.168.1.104
+
+I've tested network mode on:
+   Linux 32-bit
+   Mac OS/X 32- and 64-bit
+   Win/Cygwin 32-bit.
+
+--------------------------------------------------
+This program is provided without any warranty
+and AS-IS. See the file COPYING for details.
+
+Zack Smith
+1@zsmith.co
+March 2013
+
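
The sub-chunk streaming described under NETWORK BANDWIDTH can be illustrated with a small sketch. It is not taken from the bandwidth sources: it is a Python approximation that assumes a plain stream socket, uses a local socket pair in place of a real transponder, and introduces the hypothetical names `send_chunk`, `drain`, and `demo`. Only the 32 kB sub-chunk size and the 32 kB to 32 MB chunk range come from the README.

```python
import socket
import threading

SUB_CHUNK = 32 * 1024  # 32 kB sub-chunk size, as stated in the README


def send_chunk(sock, chunk_size, sub_chunk=SUB_CHUNK):
    """Send chunk_size bytes as a stream of pieces of at most sub_chunk bytes.

    Returns the number of pieces sent. (Hypothetical helper, not the
    program's actual routine.)
    """
    payload = bytes(sub_chunk)  # zero-filled buffer reused for every piece
    sent = 0
    pieces = 0
    while sent < chunk_size:
        n = min(sub_chunk, chunk_size - sent)
        sock.sendall(payload[:n])
        sent += n
        pieces += 1
    return pieces


def drain(sock, expected):
    """Read until `expected` bytes have arrived; return the byte count."""
    received = 0
    while received < expected:
        data = sock.recv(65536)
        if not data:  # peer closed early
            break
        received += len(data)
    return received


def demo(chunk_size):
    """Push one chunk through a local socket pair standing in for the network."""
    leader, transponder = socket.socketpair()
    result = {}

    def reader():
        result["received"] = drain(transponder, chunk_size)

    t = threading.Thread(target=reader)
    t.start()  # read concurrently so sendall() cannot stall on a full buffer
    pieces = send_chunk(leader, chunk_size)
    t.join()
    leader.close()
    transponder.close()
    return pieces, result["received"]
```

Under these assumptions, `demo(32 * 1024 * 1024)` would send the largest chunk as 1024 sub-chunks. Timing the call and dividing the byte count by the elapsed seconds gives a rough loopback throughput figure, which is the kind of number the network graph plots.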