- Emulating LOADSP/STORESP/ADDSP/PUSHPC & OR: 444 => 428 LUT.
  A pitiful saving in return for destroying performance.
- Reducing the datapath to 8 bits (which is useless) => 197 LUT.

Bare bones version of ZPU3:

- remove the NOP, PUSHPC, STORESP, LOADSP, ADDSP and OR instructions. This
  requires modifying the GCC toolchain and will result in a fairly significant
  increase in code size. We should still do better than ARM though.
- reduce datapath to 16 bits. This will reduce stack usage, which is good.
- 4kBytes of RAM.

     [exec] =========================================================================
     [exec] Device utilization summary:
     [exec] ---------------------------
     [exec] Selected Device : 3s400ft256-4
     [exec] Number of Slices:                     167  out of   3584     4%
     [exec] Number of Slice Flip Flops:           126  out of   7168     1%
     [exec] Number of 4 input LUTs:               288  out of   7168     4%
     [exec] Number of bonded IOBs:                 49  out of    173    28%
     [exec] Number of BRAMs:                        1  out of     16     6%
     [exec] Number of GCLKs:                        1  out of      8    12%
     [exec] =========================================================================
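
The percentage column in the summary above appears to be truncated rather than
rounded (167 of 3584 slices is about 4.66%, reported as 4%). A minimal Python
sketch that recomputes the column from the reported counts under that
assumption. The names `utilization` and `percent_used` are illustrative only
and are not part of the synthesis toolchain.

```python
# Counts copied from the device utilization summary: (used, available).
utilization = {
    "Slices": (167, 3584),
    "Slice Flip Flops": (126, 7168),
    "4 input LUTs": (288, 7168),
    "bonded IOBs": (49, 173),
    "BRAMs": (1, 16),
    "GCLKs": (1, 8),
}


def percent_used(used, available):
    """Whole-number percentage, truncated (assumed to match the report)."""
    return used * 100 // available


for name, (used, available) in utilization.items():
    pct = percent_used(used, available)
    print(f"{name:<20}{used:>6} out of {available:>6} {pct:>5}%")
```

Integer division is used so that no floating point rounding can push a value
up to the next percent.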




Measurements:

- Removing PUSHPC (which is possible) reduces usage by 2 LUT's.
- I tried to introduce the instructions as separate states at the top level,
  but did not succeed in reducing LUT count. This might be an avenue to
  pursue if asynchronous(?) ROM's could replace logic.
- 550 LUT @ 76MHz. 32 bit datapath & 8 bit instructions. Added separate decode
  stage.
- Tried to move memAControl into decoded opcode. Usage went up to 594 from 550.

- using 16 bit opcodes to encode signals directly. 466 LUT's.
- w/2kBytes 32 RAM & 32 bit opcodes. 415 LUT's.
- 16 bit opcode, 16 bit datapath and 1kbyte RAM. 292 LUT's.

- 725 LUT's @ 63MHz
	 Minimum period:  15.909ns{1}  (Maximum frequency:  62.858MHz)
- removed addsp, loadsp & storesp.  => 670 LUT's.
- removed all pushes & pops to sp. => 638 LUT's.
- removed OR instruction. => 672 LUT's.
- on the second cycle an ADD is done regardless => 713 LUT's.
- using others => 'x' for e.g. pushsp. 713 => 703.
- switching from lots of prioritized if() for decoding instruction to a case
  statement. 713 => 631.
- Using ZPU1's memory scheme instead of inferred memory. 713 => 715, i.e. no
  difference.
- Removing AddSP. 715 => 704 LUT's.
- Add COMPARE. 715 => 743 LUT's.
- Slight reorganization of binary operand & NOP 715 => 704. 
- STORE only pops 1 (which can be fixed in the assembler). 704 => 701.
- Remove NOP. NOP is only used to clear idim_flag. Use NOT instead.
- Removing FLIP. 681 => 646. Using a different way to generate the FLIP,
  681 => 679.
- Add a separate memory system for code?
- Use IDIM_FLAG to cache value before IM and make add single cycle.
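
The maximum frequency quoted with the 725 LUT measurement is the reciprocal of
the reported minimum period. A small Python sketch of that conversion, in case
other timing figures need to be compared on the same scale. The function name
`max_frequency_mhz` is made up for this note.

```python
def max_frequency_mhz(min_period_ns):
    """Convert a minimum clock period in nanoseconds to a frequency in MHz."""
    # 1 / (period in ns) gives GHz, so scale by 1000 to get MHz.
    return 1000.0 / min_period_ns


# Minimum period taken from the timing line in the notes above.
print(f"{max_frequency_mhz(15.909):.3f} MHz")
```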

- by expanding the opcode to 32 bits, encoding everything in the opcode &
  using case statements. 713 => 433 LUT.
- 32 bit opcode w/encoded state & 16 bit datapath. => 325 LUT
- by using 512 byte RAM, 16 bit datapath and 32 bit instructions => 285.
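
To compare the steps above on an equal footing, it can help to express each
"before => after" pair as an absolute and a relative change. A rough Python
sketch using a few of the pairs from these notes. The list `measurements` and
the helper `lut_delta` are hypothetical names, and the selection of entries is
arbitrary.

```python
# (change, LUTs before, LUTs after), copied from the notes above.
measurements = [
    ("prioritized if() => case statement", 713, 631),
    ("32 bit opcode, everything encoded", 713, 433),
    ("removing FLIP", 681, 646),
    ("adding COMPARE", 715, 743),
]


def lut_delta(before, after):
    """Return (absolute change, percent change relative to before)."""
    change = after - before
    return change, 100.0 * change / before


for label, before, after in measurements:
    change, percent = lut_delta(before, after)
    print(f"{label:<40}{before} => {after}  {change:+d} LUT ({percent:+.1f}%)")
```

Negative values are savings. Positive values (such as adding COMPARE) are the
cost of a feature.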
