<html>
<body>
<h1>Latest version of this document</h1>
This is a snapshot of the zpu_arch.html document in CVS. Check out the
latest version from CVS for the most up-to-date copy.
<p>
$id$
<h1>Index</h1>
<ul>
<li> <a href="#introduction">Introduction</a>
<li> <a href="#download">Download</a>
<li> <a href="#patch">Creating a patch</a>
<li> <a href="#mailinglist">Getting help - mailing list</a>
<li> <a href="#fpgastarted">Getting started - FPGA</a>
<li> <a href="#swstarted">Getting started - software</a>
<li> <a href="#introduction">Architecture introduction</a>
<li> <a href="#instructionset">Instruction set</a>
<li> <a href="#startup">Custom startup code (aka crt0.s)</a>
<li> <a href="#implementing">Implementing your own ZPU</a>
<li> <a href="#vectors">Jump vectors</a>
<li> <a href="#memorymap">Memory map</a>
<li> <a href="#interrupts">Interrupts</a>
<li> <a href="#performance">Speeding up the ZPU</a>
<li> <a href="#debuguart">Debug channel / UART</a>
<li> <a href="#wishbone">Wishbone</a>
<li> <a href="#hwdebugger">JTAG/hardware debugger for GDB</a>
<li> <a href="#zpu_core_small.vhd">About zpu_core_small.vhd</a>
<li> <a href="#zpu_core.vhd">About zpu_core.vhd</a>
<li> <a href="#zealot">Zealot: Implementing in FPGAs</a>
<li> <a href="#codesize">Optimizing for code size</a>
<li> <a href="#ecos">Installing eCos build tools</a>
<li> <a href="#spicontroller">SPI flash controller</a>


<li> <a href="#nextgen">Next generation ZPU</a>

</ul>

<a name="introduction"/>
<P><FONT SIZE=4><B>The world's smallest 32-bit CPU with GCC toolchain</B></FONT>
</P>
<P>This CPU is finding a new home at www.opencores.org. Please
contact me if you are willing and able to help shape up the
www.opencores.org pages. 
</P>
<P>The HDL, GCC toolchain and eCos HAL are done. What I mainly
need help with is writing docs, web pages, examples and bug
reports.</P>
<P>The ZPU has a BSD license for the HDL and the GPL for the rest (the source
files are sadly out of date here; patches gladly accepted!). This
allows deployments to implement any version of the ZPU they want
without running into commercial problems, but if improvements are
made to the architecture as such, then they need to be contributed
back. 
</P>
<P>One strength of the ZPU is that it is tiny and therefore easy to
implement from scratch to suit specialized needs and optimizations.</P>
<P>There are currently some pages at <A HREF="http://www.zylin.com/zpu.htm">http://www.zylin.com/zpu.htm</A>
that explain the ZPU. According to OpenCores policy, this
information should be moved to www.opencores.org. Patches gratefully
accepted to do so!</P>
<P>As of January 1, 2008, Zylin holds the copyright for the ZPU, i.e. Zylin
is free to decide that the ZPU shall have a BSD license for the HDL and the GPL
for the rest.</P>
<P>Sincerely,</P>
<P>&Oslash;yvind Harboe <BR>Zylin AS 
</P>
<P><FONT SIZE=4><B>Features</B></FONT> 
</P>
<UL>
	<LI><P STYLE="margin-bottom: 0in">Small size: 442 LUTs @ 95 MHz after
	P&amp;R with a 32-bit datapath on a Xilinx XC3S400 
	</P>
	<LI><P STYLE="margin-bottom: 0in">Wishbone 
	</P>
	<LI><P STYLE="margin-bottom: 0in">Code size 80% of ARM Thumb 
	</P>
	<LI><P STYLE="margin-bottom: 0in">GCC toolchain (GDB, newlib,
	libstdc++) 
	</P>
	<LI><P>eCos embedded operating system support</P>
</UL>
<P><FONT SIZE=4><B>Survey</B></FONT> 
</P>
<P>Please take the time to fill in this short survey so we can gather
information about where the ZPU can be the most useful:</P>
<P><A HREF="http://www.zylin.com/zpusurvey.html">http://www.zylin.com/zpusurvey.html</A></P>
<P><FONT SIZE=4><B>Status</B></FONT> 
</P>
<UL>
	<LI><P STYLE="margin-bottom: 0in">HDL works 
	</P>
	<LI><P STYLE="margin-bottom: 0in">GCC toolchain works 
	</P>
	<LI><P STYLE="margin-bottom: 0in">eCos HAL works, but could be less
	RAM hungry 
	</P>
	<LI><P STYLE="margin-bottom: 0in">The main problem at this point is
	not usage of the CPU, but that the documentation/CVS layout needs
	attention 
	</P>
	<LI><P STYLE="margin-bottom: 0in">Needs GDB stub support in eCos 
	</P>
	<LI><P>Could do with a Verilog implementation (ca. 600 lines to
	translate)</P>
</UL>
<P><FONT SIZE=4><B>Simulator</B></FONT> 
</P>
<P>The ZPU simulator is integrated into the Zylin Embedded CDT plugin
to ease debugging of ZPU applications:</P>
<P><A HREF="http://www.zylin.com/embeddedcdt.html">http://www.zylin.com/embeddedcdt.html</A></P>
<P>The ZPU simulator has many features besides debugging an
application:</P>
<UL>
	<LI><P STYLE="margin-bottom: 0in">taking output from an HDL simulation (e.g.
	ModelSim) and matching it against the Java simulator, which makes
	it much easier to debug HDL implementations and also yields
	real-world timing information 
	</P>
	<LI><P STYLE="margin-bottom: 0in">can generate gprof output 
	</P>
	<LI><P>generate various statistics 
	</P>
</UL>
<P>The plugin is still pretty rough around the edges, and needs to
get GUI support for enabling the ModelSim trace input feature.</P>
<P ALIGN=CENTER><IMG SRC="images/compile.PNG" NAME="graphics7" ALIGN=BOTTOM WIDTH=669 HEIGHT=302 BORDER=0><BR><I>Compiling
ZPU application</I></P>
<P ALIGN=CENTER><IMG SRC="images/simulator.PNG" NAME="graphics9" ALIGN=BOTTOM WIDTH=722 HEIGHT=583 BORDER=0><BR><I>Setting
up the simulator</I></P>
<P ALIGN=CENTER><IMG SRC="images/simulator2.PNG" NAME="graphics11" ALIGN=BOTTOM WIDTH=722 HEIGHT=583 BORDER=0><BR><I>Choosing
ZPU executable</I></P>
<P ALIGN=CENTER STYLE="margin-bottom: 0in"><IMG SRC="images/simulator3.PNG" NAME="graphics13" ALIGN=BOTTOM WIDTH=1100 HEIGHT=720 BORDER=0><BR><I>Debug
session</I></P>
<P STYLE="margin-bottom: 0in"><BR>
</P>

<a name="fpgastarted"/>
<h1>Getting started - FPGA </h1>
The simplest version of the ZPU uses BRAM. When getting accustomed to the ZPU, a BRAM ZPU with a UART
is a good place to start.
<p>
Working simulation scripts are in hdl/example/simzpu_small.do and hdl/example_medium/simzpu_medium.do; they
simulate the small (zpu_core_small.vhd) and medium-sized (zpu_core.vhd) ZPU respectively. hdl/example/simzpu_interrupt.do
shows how to use interrupts.
<p>
When implementing the ZPU, copy the following files and modify them to suit your needs:
<ol>
    <li>hdl/example/zpu_config.vhd - set up RAM size here
    <li>hdl/example/helloworld.vhd - dual port BRAM implementation. 
</ol>
You must also connect the ZPU to the rest of your IO subsystem. IO is memory mapped (read/write) on the ZPU.
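<p>
To illustrate what "memory mapped" means here, the Java sketch below routes 32-bit loads and stores either to RAM or to an IO register purely on the basis of the address. The layout (4 KiB of RAM at address 0 and a transmit register at the hypothetical address 0x8000) and the class <code>MemoryMappedBus</code> are assumptions for illustration only, not the ZPU's actual memory map. Like the ZPU LOAD and STORE instructions, it ignores bits 0 and 1 of the address.
<p>

```java
// Hypothetical memory-mapped bus, for illustration only.
// Assumed layout: RAM at 0x0000..0x0FFF, one write-only transmit register at 0x8000.
final class MemoryMappedBus {
    static final int RAM_BYTES = 4096;   // assumed RAM size
    static final int UART_TX = 0x8000;   // hypothetical IO register address

    private final int[] ram = new int[RAM_BYTES / 4];
    private final StringBuilder uartOut = new StringBuilder();

    // 32-bit store; address bits 0 and 1 are ignored, as for the ZPU STORE instruction.
    void store(int address, int value) {
        int aligned = address & ~3;
        if (aligned == UART_TX) {
            uartOut.append((char) (value & 0xff));   // IO side effect instead of a RAM write
        } else if (aligned >= 0 && aligned < RAM_BYTES) {
            ram[aligned >>> 2] = value;
        } else {
            throw new IllegalArgumentException("unmapped address 0x" + Integer.toHexString(address));
        }
    }

    // 32-bit load; reading the transmit register simply returns 0 in this sketch.
    int load(int address) {
        int aligned = address & ~3;
        if (aligned == UART_TX) {
            return 0;
        } else if (aligned >= 0 && aligned < RAM_BYTES) {
            return ram[aligned >>> 2];
        }
        throw new IllegalArgumentException("unmapped address 0x" + Integer.toHexString(address));
    }

    // Characters written to the transmit register so far.
    String uartOutput() {
        return uartOut.toString();
    }
}
```

<p>
In an FPGA design the same decision is made by the address decoding between the ZPU memory bus and your peripherals; software then drives a peripheral with ordinary loads and stores.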
<h2>Generating VHDL BRAM initialization </h2>

<code>
../install/bin/zpu-elf-objcopy -O binary hello.elf hello.bin<br>
java -classpath ../simulator/zpusim.jar com.zylin.zpu.simulator.tools.MakeRam hello.bin &gt;hello.bram<br>

</code>
<h2>Build another test application for example simulation</h2>
Here is how to build a ROM image for an application using the
zpu/hdl/example simulation files.
<p>
<code>
cd zpu/roadshow/roadshow/dhrystone<br>
sh build.sh<br>
cd zpu/hdl/example<br>
gcc zpuromgen.c<br>
./a<br>
Usage: ./a binary_file<br>
./a ../../roadshow/roadshow/dhrystone/dhrystone.bin &gt;app.txt<br>
</code>
<p>
Copy and paste app.txt into helloworld.vhd.
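<p>
For readers who want to see what such a conversion involves, here is a hedged Java sketch of turning a raw binary image into VHDL array elements. It assumes big-endian 32-bit words, zero padding of a partial final word and a hypothetical output format of one <code>N =&gt; x"XXXXXXXX",</code> line per word. The class <code>BramInitSketch</code> is not part of the ZPU tools, so compare its output with what zpuromgen.c or MakeRam actually produce before relying on it.
<p>

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the ZPU distribution.
final class BramInitSketch {
    // Packs raw bytes into 32-bit big-endian words, zero-padding the last word.
    static int[] toWords(byte[] image) {
        int[] words = new int[(image.length + 3) / 4];
        for (int i = 0; i < image.length; i++) {
            int shift = 24 - 8 * (i % 4);              // byte 0 of each word is the most significant
            words[i / 4] |= (image[i] & 0xff) << shift;
        }
        return words;
    }

    // Emits one VHDL aggregate element per word, e.g. 0 => x"0b0b0b0b",
    static String toVhdl(int[] words) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            sb.append(i).append(" => x\"")
              .append(String.format("%08x", words[i]))  // %x prints negative ints as unsigned hex
              .append("\",\n");
        }
        return sb.toString();
    }

    // Reads a binary file (e.g. produced by zpu-elf-objcopy -O binary) and converts it.
    static String convertFile(String path) throws IOException {
        return toVhdl(toWords(Files.readAllBytes(Paths.get(path))));
    }
}
```

<p>
Under these assumptions, <code>BramInitSketch.convertFile("dhrystone.bin")</code> would return text in the same spirit as app.txt.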

<h2>Running example simulation</h2>
The hdl/example directory has a simulation written for Xilinx WebPack ModelSim. From the ModelSim command prompt:
<ol>
<li>cd c:/&lt;installfolder&gt;/hdl/example
<li>do simzpu_small.do
</ol>
<p>
After running the hello world simulation (see simzpu_small.do), two files are written to the hdl/example directory:
<ol>
<li>log.txt - contains the "Hello world!" text written to the debug channel/simplified UART.
<li>trace.txt - a trace file for the CPU. The instruction set simulator can take
this file as input to verify that the HDL implementation matches the instruction set simulator. 
When a mismatch is found, the GDB debugger will break. This is very handy for debugging custom ZPU implementations. 
</ol>  
<h2>HDL Directories & files </h2>
<ul>
<li>example - contains example files & working ZPU. Start here.
<li>wishbone - contains wishbone interface for the ZPU
<li>zpu3 - if you are interested in developing ZPU cores and not only using them, then this directory contains various stuff of more or less historical interest.
<li>zpu4 -  if you are interested in developing ZPU cores and not only using them, then this is the active development version. You'll also want to copy out the
files you need from this folder to your own project.
</ul>

The HDL files need a bit of spit and polish!

<a name="swstarted"/>
<h1>Getting started - software</h1>
The ZPU comes with a standard GCC toolchain and an instruction set simulator. This makes it possible to compile, run &amp; debug simple test programs. The simulator has 
some very basic peripherals defined: a counter, a timer interrupt and a debug output port. 
<h2>Installing</h2>
<ol>
<li>Install Cygwin (http://www.cygwin.com).
<li>Install Java
<li>Start Cygwin bash
<li>cd zpu/sw
<li>sh setup.sh
<li>/tmp/zpu/install/bin now has the .exe files for the GCC toolchain & GDB
<li>Optionally, add /tmp/zpu/install/bin to your PATH by running:<br>
source env.sh
</ol>
<h1>Hello world example</h1>
The ZPU toolchain comes with newlib & libstdc++ support which means that many C/C++ programs can be compiled without modification.
<p> 
<code>
cd zpu/sw/helloworld<br>
../install/bin/zpu-elf-gcc -phi hello.c -o hello.elf <br>
</code>
<h2>Running the hello world example in GDB</h2>
<ol>
<li>cd zpu/sw/helloworld
<li>Launch the simulator from a separate bash shell:<p>
java -classpath ../simulator/zpusim.jar -Xmx512m com.zylin.zpu.simulator.Phi 4444
<p>
<img src="images/zpusim.PNG" border=0> 
<li>Launch GDB:<p>
../install/bin/zpu-elf-gdb hello.elf
<li>Connect to target, load and run application:<p>
<code>
(gdb) target remote localhost:4444<br>
(gdb) load<br>
(gdb) continue<br>
</code>
<p>
<img src="images/gccgdb.PNG">

</ol>


<a name="introduction"/>
<h1>Architecture introduction</h1>
The ZPU is a zero-operand, or stack-based, CPU. The opcodes have a fixed width of 8 bits. 
<p>
Example:
<p>
<div style="white-space:pre;background-color:#dddddd;">
	<code style="white-space:pre;background-color:#dddddd;">
		IM 5                ; push 5 onto the stack
		LOADSP 20           ; push value at memory location SP+20
		ADD                 ; pop 2 values on the stack and push the result
	</code>
</div>
As can be seen, a lot of information is packed into the 8 bits, e.g. the IM instruction pushes a 7 bit signed integer onto the stack. 
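<p>
A toy Java model of the three instructions above may make the stack behaviour concrete. It is a sketch, not the Zylin simulator: the class <code>TinyZpuStack</code> is hypothetical, memory is an array of 32-bit words, SP is a byte address that decreases on each push, the LOADSP offset is taken in bytes as written in the assembly listing, and IM is modelled only for the case where the IDIM flag is clear.
<p>

```java
// Toy model of IM, LOADSP and ADD, for illustration only (not the Zylin simulator).
final class TinyZpuStack {
    final int[] mem;   // word-addressed memory; the stack grows toward lower addresses
    int sp;            // byte address of the word on top of the stack

    TinyZpuStack(int memBytes, int initialSp) {
        mem = new int[memBytes / 4];
        sp = initialSp;
    }

    void push(int value) {
        sp -= 4;
        mem[sp >>> 2] = value;
    }

    int pop() {
        int value = mem[sp >>> 2];
        sp += 4;
        return value;
    }

    // IM with IDIM clear: sign-extend the 7-bit immediate and push it.
    void im(int imm7) {
        push((imm7 << 25) >> 25);
    }

    // LOADSP: push the word at byte address SP+offset (offset as written in assembly).
    void loadsp(int offset) {
        push(mem[(sp + offset) >>> 2]);
    }

    // ADD: pop two values and push their sum.
    void add() {
        push(pop() + pop());
    }
}
```

<p>
Note that LOADSP is relative to the SP value after IM has pushed 5, so "SP+20" refers to a different word than it would have before the IM.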
<p>
The choice of opcodes is intimately tied to the GCC toolchain capabilities.
<p>
<div style="white-space:pre;background-color:#dddddd;">
	<code style="white-space:pre;background-color:#dddddd;">
	/* simple program showing some interesting qualities of the ZPU toolchain */
	void bar(int);
	int j;
	void foo(int a, int b, int c)
	{
	  a++;
	  b+=a;
	  j=c;
	  bar(b);
	}

foo:
 loadsp 4	; a is at memory location SP+4
 im 1
 add
 loadsp 12	; b is now at memory location SP+12
 add
 loadsp 16	; c is now at memory location SP+16
 im 24		; «j» is at absolute memory location 24. 
; Notice how the ZPU toolchain is using link-time relaxation
; to squeeze the address into a single opcode
 store
 im 22		; the fn bar is at address 22
 call
 im 12
 return	; 12 bytes of arguments + return from fn
</code>
</div>

<a name="instructionset"/>
<h1>Instruction set</h1>
Only the base instructions are implemented in the architecture. More advanced instructions, like ASHIFTLEFT, are emulated in the illegal instruction vector.

All operations are 32 bits wide.
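<p>
The decoding rules in the table below can be sketched in Java as a classifier over the 8-bit opcode. The class <code>ZpuOpcodeDecoder</code> is hypothetical; it uses only the bit patterns and the EMULATE vector formula (0x0+xxxxx*32) listed in the table, and reports opcodes in the 0000 xxxx range that the table does not list as UNLISTED. It also assumes that the decimal numbers given for the emulated instructions (e.g. SUB = 49) are full 8-bit opcode values; all of them fall in the 001x xxxx range (32 to 63).
<p>

```java
// Hypothetical opcode classifier based on the bit patterns in the instruction table.
final class ZpuOpcodeDecoder {
    static String mnemonic(int opcode) {
        int op = opcode & 0xff;
        if ((op & 0x80) != 0) return "IM";            // 1xxx xxxx
        if ((op & 0xe0) == 0x40) return "STORESP";    // 010x xxxx
        if ((op & 0xe0) == 0x60) return "LOADSP";     // 011x xxxx
        if ((op & 0xe0) == 0x20) return "EMULATE";    // 001x xxxx
        if ((op & 0xf0) == 0x10) return "ADDSP";      // 0001 xxxx
        switch (op) {                                 // 0000 xxxx: base instructions
            case 0x00: return "BREAKPOINT";
            case 0x02: return "PUSHSP";
            case 0x04: return "POPPC";
            case 0x05: return "ADD";
            case 0x06: return "AND";
            case 0x07: return "OR";
            case 0x08: return "LOAD";
            case 0x09: return "NOT";
            case 0x0a: return "FLIP";
            case 0x0b: return "NOP";
            case 0x0c: return "STORE";
            case 0x0d: return "POPSP";
            default:   return "UNLISTED";             // not listed in the table
        }
    }

    // EMULATE pushes PC and jumps to 0x0 + xxxxx*32, xxxxx being the low 5 opcode bits.
    static int emulateVector(int opcode) {
        if ((opcode & 0xe0) != 0x20) {
            throw new IllegalArgumentException("not an EMULATE opcode: " + opcode);
        }
        return (opcode & 0x1f) * 32;
    }
}
```

<p>
Under that assumption, SUB (49 = 0011 0001) decodes as EMULATE and jumps to 17*32 = 0x220, where the code emulating SUB would be placed.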
<table border="1">
	<tr><td>Name</td><td>Opcode</td><td>Description</td><td>Definition</td></tr>
	<tr>
		<td>
			BREAKPOINT
		</td>
		<td>
			00000000
		</td>
		<td>
			The debugger sets a memory location to this value to set a breakpoint. Once a JTAG-like 
			debugger interface is added, it will be convenient to be able to distinguish 
			between a breakpoint and an illegal(possibly emulated) instruction.
		</td>
		<td>
			No effect on registers
		</td>
	</tr>
	<tr>
		<td>
			IM
		</td>
		<td>
			1xxx xxxx
		</td>
		<td>
			Pushes a 7-bit sign-extended integer and sets the «instruction decode interrupt mask» flag (IDIM).
			<p> 
			If the IDIM flag is already set, this instruction shifts the value on the stack left by 7 bits and stores the 7 bit immediate value into the lower 7 bits.
			<p> 
			Unless an instruction is listed as treating the IDIM flag specially, it should be assumed to clear the IDIM flag.
			<p> 
			To push a 14-bit integer onto the stack, use two consecutive IM instructions. 
			<p> 
			If multiple immediate integers are to be pushed onto the stack, they must be interleaved with another instruction, typically NOP.
		</td>
		<td>
			<code style="white-space:pre;">
pc <= pc + 1 <br>
idim <= 1 <br>
if (idim=0) then <br>
	sp <= sp - 1; <br>
	for i in wordSize-1 downto 7 loop <br>
		mem(sp)(i) <= opcode(6) <br>
	end loop <br>
	mem(sp)(6 downto 0) <= opcode(6 downto 0) <br>
else <br>
	mem(sp)(wordSize-1 downto 7) <= mem(sp)(wordSize-8 downto 0) <br>
	mem(sp)(6 downto 0) <= opcode(6 downto 0) <br>
end if
			</code>

		</td>
	</tr>
	<tr>
		<td>
			STORESP
		</td>
		<td>
			010x xxxx
		</td>
		<td>
			Pop value off stack and store it in the SP+xxxxx*4 memory location, where xxxxx is a positive integer.
		</td>
		<td>
		</td>
	</tr>
	<tr>
		<td>
			LOADSP
		</td>
		<td>
			011x xxxx
		</td>
		<td>
			Push value of memory location SP+xxxxx*4, where xxxxx is a positive integer, onto stack.
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			ADDSP
		</td>
		<td>
			0001 xxxx
		</td>
		<td>
			Add value of memory location SP+xxxx*4 to value on top of stack.
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			EMULATE
		</td>
		<td>
			001x xxxx
		</td>
		<td>
			Push PC to stack and set PC to 0x0+xxxxx*32. This is used to emulate opcodes. See 
			zpupkg.vhd for the list of EMULATE opcode values used. zpu_core.vhd contains 
			reference implementations of these instructions, rather than letting the ZPU execute the EMULATE instruction.
			<p>
			One way to improve the performance of the ZPU is to implement some of
			the EMULATE instructions. 
			
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			PUSHPC
		</td>
		<td>
			emulated
		</td>
		<td>
			Pushes program counter onto the stack.
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			POPPC
		</td>
		<td>
			0000 0100
		</td>
		<td>
			Pops an address off the stack and sets PC to it.
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			LOAD
		</td>
		<td>
			0000 1000
		</td>
		<td>
			Pops an address off the stack and pushes the value stored at that address.
			<p>
			Bits 0 and 1 of the address are always treated as 0 (i.e. ignored) by
			the HDL implementations, and the programming model guarantees that C code
			never uses a 32-bit LOAD on an address that is not 32-bit aligned (i.e.
			if a program does this, then it has a bug). 
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			STORE
		</td>
		<td>
			0000 1100
		</td>
		<td>
			Pops address, then value from stack and stores the value into the memory location of the address.
			<p>
			Bits 0 and 1 of the address are always treated as 0.
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			PUSHSP
		</td>
		<td>
			0000 0010
		</td>
		<td>
			Pushes stack pointer. 
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			POPSP
		</td>
		<td>
			0000 1101
		</td>
		<td>
			Pops value off top of stack and sets SP to that value. Used to allocate/deallocate space on stack for variables or when changing threads. 
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			ADD
		</td>
		<td>
			0000 0101
		</td>
		<td>
			Pops two values off the stack, adds them and pushes the result.
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			AND
		</td>
		<td>
			0000 0110
		</td>
		<td>
			Pops two values off the stack, does a bitwise AND and pushes the result onto the stack.
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			OR
		</td>
		<td>
			0000 0111
		</td>
		<td>
			Pops two integers, does a bitwise OR and pushes the result.
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			NOT
		</td>
		<td>
			0000 1001
		</td>
		<td>
			Bitwise inverse of the value on top of the stack.

		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			FLIP
		</td>
		<td>
			0000 1010
		</td>
		<td>
			Reverses the bit order of the value on the stack, i.e. abc->cba, 100->001, 110->011, etc.
			<p>
			The raison d'etre for this instruction is mainly to emulate other instructions. 
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			NOP
		</td>
		<td>
			0000 1011
		</td>
		<td>
			No operation. Clears the IDIM flag as a side effect, i.e. it is used between two
			consecutive IM instructions to push two values onto the stack.
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			PUSHSPADD
		</td>
		<td>
			61
		</td>
		<td>
            a=sp; <br>
            b=popIntStack()*4;<br>
            pushIntStack(a+b);<br>
		</td>
		<td>
			
		</td>
	</tr>
	
	<tr>
		<td>
			POPPCREL
		</td>
		<td>
			57
		</td>
		<td>
			setPc(popIntStack()+getPc());
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			SUB
		</td>
		<td>
			49
		</td>
		<td>
			int a=popIntStack();<br>
                            int b=popIntStack();<br>
                            pushIntStack(b-a);<br>
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			XOR
		</td>
		<td>
			50
		</td>
		<td>
pushIntStack(popIntStack() ^ popIntStack());
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			LOADB
		</td>
		<td>
			51
		</td>
		<td>
				8-bit load instruction. Really only here for compatibility with the
		C programming model. Also, it has a big impact on the DMIPS test.
		<p>
			pushIntStack(cpuReadByte(popIntStack())&0xff);
		</td>
		<td>
			
		</td>
	</tr>
 	<tr>
		<td>
			STOREB
		</td>
		<td>
			52
		</td>
		<td>
		8-bit store instruction. Really only here for compatibility with the
		C programming model. Also, it has a big impact on the DMIPS test.
		<p>
			addr = popIntStack();<br>
                            val = popIntStack();<br>
                            cpuWriteByte(addr, val);
</td>
		<td>
			
		</td>
	</tr>
 	<tr>
		<td>
			LOADH
		</td>
		<td>
			34
		</td>
		<td>
		
				16-bit load instruction. Really only here for compatibility with the
		C programming model.
		<p>
		
			pushIntStack(cpuReadWord(popIntStack()));
		</td>
		<td>
			
		</td>
	</tr>
 	<tr>
		<td>
			STOREH
		</td>
		<td>
			35
		</td>
		<td>
		16-bit store instruction. Really only here for compatibility with the
		C programming model.
		<p>
addr = popIntStack();<br>
                            val = popIntStack();<br>
                            cpuWriteWord(addr, val);
		</td>
		<td>
			
		</td>
	</tr>
 	<tr>
		<td>
			LESSTHAN
		</td>
		<td>
			36
		</td>
		<td>
		Signed comparison<br>
                            a = popIntStack();<br>
                            b = popIntStack();<br>
                            pushIntStack((a < b) ? 1 : 0);<br>
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			LESSTHANOREQUAL
		</td>
		<td>
			37
		</td>
		<td>
		Signed comparison<br>
 a = popIntStack();<br>
                            b = popIntStack();<br>
                            pushIntStack((a <= b) ? 1 : 0);
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			ULESSTHAN
		</td>
		<td>
			38
		</td>
		<td>
		Unsigned comparison<br>
                            long a;//long is here 64 bit signed integer<br>
                            long b;<br>
                            a = ((long) popIntStack()) & INTMASK; // INTMASK is unsigned 0x00000000ffffffff<br>
                            b = ((long) popIntStack()) & INTMASK;<br>
                            pushIntStack((a < b) ? 1 : 0);
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			ULESSTHANOREQUAL
		</td>
		<td>
			39
		</td>
		<td>
		Unsigned comparison<br>
                            long a;//long is here 64 bit signed integer<br>
                            long b;<br>
                            a = ((long) popIntStack()) & INTMASK; // INTMASK is unsigned 0x00000000ffffffff<br>
                            b = ((long) popIntStack()) & INTMASK;<br>
                            pushIntStack((a <= b) ? 1 : 0);
		</td>
		<td>
			
		</td>
	</tr>
	<tr>
		<td>
			EQBRANCH
		</td>
		<td>
			55
		</td>
		<td>
                            int compare;<br>
                            int target;<br>
                            target = popIntStack() + pc;<br>
                            compare = popIntStack();<br>
                            if (compare == 0)<br>
                            {<br>
                                setPc(target);<br>
                            } else<br>
                            {<br>
                                setPc(pc + 1);<br>
                            }
		</td>
		<td>
			
		</td>
	</tr>
 	<tr>
		<td>
			 NEQBRANCH
		</td>
		<td>
			56
		</td>
		<td>
                            int compare;<br>
                            int target;<br>
                            target = popIntStack() + pc;<br>
                            compare = popIntStack();<br>
                            if (compare != 0)<br>
                            {<br>
                                setPc(target);<br>
                            } else<br>
                            {<br>
                                setPc(pc + 1);<br>
                            }<br>
		</td>
		<td>
			
		</td>
	</tr>
  	<tr>
		<td>
			 MULT
		</td>
		<td>
			41
		</td>
		<td>
			Signed 32 bit multiply <br>
   			pushIntStack(popIntStack() * popIntStack());
   		</td>
		<td>
			
		</td>
	</tr>
  	<tr>
		<td>
			 DIV
		</td>
		<td>
			53
		</td>
		<td>
		Signed 32 bit integer divide.<br>
                            a = popIntStack();<br>
                            b = popIntStack();<br>
                            if (b == 0)<br>
                            {<br>
                            	// undefined<br> 
                            }<br>
                            pushIntStack(a / b);<br>
   		</td>
		<td>
			
		</td>
	</tr>
  	<tr>
		<td>
			 MOD
		</td>
		<td>
			54
		</td>
		<td>
		Signed 32 bit integer modulo.<br>
                            a = popIntStack(); <br>
                            b = popIntStack();<br>
                            if (b == 0)<br>
                            {<br>
                            	// undefined <br> 
                            }<br>
                            pushIntStack(a % b); <br>
   		</td>
		<td>
			
		</td>
	</tr>
  	<tr>
		<td>
			LSHIFTRIGHT	
		</td>
		<td>
			42
		</td>
		<td>
			unsigned shift right.<br>
	        long shift;<br>
	        long valX;<br>
	        int t;<br>
	        shift = ((long) popIntStack()) & INTMASK;<br>
	        valX = ((long) popIntStack()) & INTMASK;<br>
	        t = (int) (valX >> (shift & 0x3f));<br>
	        pushIntStack(t);<br>
   		</td>
		<td>
			
		</td>
	</tr>
  	<tr>
		<td>
			ASHIFTLEFT	
		</td>
		<td>
			43
		</td>
		<td>
			Arithmetic (signed) shift left.<br>
			
			 long shift;<br>
                            long valX;<br>
                            shift = ((long) popIntStack()) & INTMASK;<br>
                            valX = ((long) popIntStack()) & INTMASK;<br>
                            int t = (int) (valX << (shift & 0x3f));<br>
                            pushIntStack(t);<br>
   		</td>
		<td>
			
		</td>
	</tr>
  	<tr>
		<td>
			ASHIFTRIGHT	
		</td>
		<td>
			44
		</td>
		<td>
		Arithmetic (signed) shift right.<br>
                           long shift;<br>
                            int valX;<br>
                            shift = ((long) popIntStack()) & INTMASK;<br>
                            valX = popIntStack();<br>
                            int t = valX >> (shift & 0x3f);<br>
                            pushIntStack(t);<br>
 
   		</td>
		<td>
			
		</td>
	</tr>
    
   	<tr>
		<td>
			CALL
		</td>
		<td>
			45
		</td>
		<td>
			call procedure.<br>
			<br>
				int address = pop();<br>
                            push(pc + 1);<br>
                            setPc(address); <br>
   		</td>
		<td>
			
		</td>
	</tr>
  	<tr>
		<td>
			 CALLPCREL
		</td>
		<td>
			63
		</td>
		<td>
			Call procedure, PC relative.<br>
			<br>
			int address = pop();<br>
                            push(pc + 1);<br>
                            setPc(address + pc);<br>
   		</td>
		<td>
			
		</td>
	</tr>
    
    
  	<tr>
		<td>
			 EQ
		</td>
		<td>
			46
		</td>
		<td>
			Pushes 1 if the two topmost stack values are equal, otherwise 0.<br>
			pushIntStack((popIntStack() == popIntStack()) ? 1 : 0);
		</td>
		<td>
			
		</td>
	</tr>
  	<tr>
		<td>
			 NEQ
		</td>
		<td>
			48
		</td>
		<td>
			Pushes 1 if the two topmost stack values differ, otherwise 0.<br>
			pushIntStack((popIntStack() != popIntStack()) ? 1 : 0);
		</td>
		<td>
			
		</td>
	</tr>
 	<tr>
		<td>
			 NEG
		</td>
		<td>
			47
		</td>
		<td>
			Two's complement negation of the top of stack.<br>
			pushIntStack(-popIntStack());
		</td>
		<td>
			
		</td>
	</tr>
    
	
</table>
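<p>
The shift and PC-relative call pseudocode above is written in the Java-like style of the ZPU simulator. As a cross-check, here is a minimal C sketch of the same semantics. It assumes a hypothetical <code>zpu_model</code> struct with a small array-based operand stack (the real ZPU keeps its stack in memory), and the helper names are illustrative only. Stack words are kept as <code>uint32_t</code> so that every shift is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical operand-stack model, for illustration only. The real ZPU
   keeps its stack in RAM and grows it downwards. */
typedef struct {
    uint32_t stack[16];
    int sp;            /* number of words currently on the stack */
    uint32_t pc;
} zpu_model;

static void zpu_push(zpu_model *z, uint32_t v) { z->stack[z->sp++] = v; }
static uint32_t zpu_pop(zpu_model *z) { return z->stack[--z->sp]; }

/* LSHIFTRIGHT (42): the shift count is on top, the value below it.
   The reference uses 64-bit arithmetic, so counts 32..63 give 0. */
static void op_lshiftright(zpu_model *z) {
    uint32_t shift = zpu_pop(z) & 0x3f;
    uint32_t val = zpu_pop(z);
    zpu_push(z, shift >= 32 ? 0 : val >> shift);
}

/* ASHIFTLEFT (43): same operand order; bits shifted past bit 31 are lost. */
static void op_ashiftleft(zpu_model *z) {
    uint32_t shift = zpu_pop(z) & 0x3f;
    uint32_t val = zpu_pop(z);
    zpu_push(z, shift >= 32 ? 0 : val << shift);
}

/* ASHIFTRIGHT: sign-extending shift. Java's int >> only uses the low
   5 bits of the count, so the count is masked with 0x1f here. */
static void op_ashiftright(zpu_model *z) {
    uint32_t shift = zpu_pop(z) & 0x1f;
    uint32_t val = zpu_pop(z);
    /* Complementing around an unsigned shift copies the sign bit in
       without relying on implementation-defined signed >> in C. */
    zpu_push(z, (val & 0x80000000u) ? ~(~val >> shift) : val >> shift);
}

/* CALLPCREL (63): pop a PC-relative offset, push the return address. */
static void op_callpcrel(zpu_model *z) {
    uint32_t offset = zpu_pop(z);
    zpu_push(z, z->pc + 1);
    z->pc = z->pc + offset;
}
```

<p>
Note the operand order: the value is pushed first and the shift count second, which matches the pop order in the pseudocode.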
<a name="startup"/>
<h1>Custom startup code (aka crt0.s)</h1>
To minimize the size of an application, one important trick is to
strip down the startup code. The startup code contains emulation
of instructions that may never be used by a particular application.
<p>
The startup code is found in the GCC source code under gcc/libgloss/zpu,
but to make it easier to find, it has been duplicated
into <a href="../sw/startup">zpu/sw/startup</a>.
<p>
To minimize the size of the startup code, see the <a href="../roadshow/roadshow/codesize/index.html">codesize</a>
demo. This is fairly standard GCC practice and simple enough once you have
been through it a couple of times.

<a name="implementing"/>
<h1>Implementing your own ZPU</h1>
One of the neat things about the ZPU is that the instruction set and architecture
are very small, so it is easy to implement a ZPU from scratch or to modify the
existing ZPU implementations.
<p>
Implementing a ZPU does not require detailed knowledge of the toolchain:
HDL skills plus a rudimentary understanding of standard GCC/GDB usage
are sufficient.
<p>
A few tips:
<ul>
<li>Run zpu_core.vhd or zpu_core_small.vhd and generate an instruction trace
from ModelSim or similar. To check that your own implementation is correct,
verify that the instruction traces of the new and old
ZPU implementations match. This gives you a simple way to do regression
tests as you develop your ZPU.
<li>To improve performance, you can add more instructions. The EMULATE instructions
are optional in HDL since they will be emulated in software if they are not
implemented in HDL. This allows you to run the ZPU executables unmodified
regardless of which EMULATE instructions you implement.
<li>Run the DMIPS test to measure your overall performance.
<li>Run the histogram.perl script on the instruction trace to generate
histograms of the instructions. Profiling is essential to making
the right choices w.r.t. optimisation for your application. 
</ul>
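<p>
The trace-comparison tip can be automated. The sketch below assumes a hypothetical trace record holding PC, opcode, SP and top of stack for each executed instruction (the actual trace format depends on your simulation setup) and reports the first instruction at which the reference trace and the trace of your new core diverge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instruction trace record; adapt it to whatever your
   simulation actually logs. */
typedef struct {
    uint32_t pc;
    uint8_t  opcode;
    uint32_t sp;
    uint32_t tos;   /* top of stack after the instruction */
} trace_entry;

/* Returns the index of the first mismatching entry, or -1 if both traces
   are identical. If one trace is a strict prefix of the other, the index
   just past the shorter trace is returned. */
static long first_divergence(const trace_entry *ref, size_t ref_len,
                             const trace_entry *dut, size_t dut_len) {
    size_t n = ref_len < dut_len ? ref_len : dut_len;
    for (size_t i = 0; i < n; i++) {
        if (ref[i].pc != dut[i].pc || ref[i].opcode != dut[i].opcode ||
            ref[i].sp != dut[i].sp || ref[i].tos != dut[i].tos)
            return (long)i;
    }
    return ref_len == dut_len ? -1 : (long)n;
}
```

<p>
Reporting the first divergence rather than a total mismatch count keeps the focus on the instruction that actually went wrong; everything after it usually differs as a consequence.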


<a name="vectors"/>
<h1>Vectors</h1>
<table border="1">
	<tr><td>Address</td><td>Name</td><td>Description</td></tr>
	<tr>
		<td>0x000</td>
		<td>Reset</td>
		<td>
			1. When the ZPU boots, this is the first instruction to be executed.
			<p>
			2. The stack pointer is initialised to the maximum RAM address.
			</td>
	</tr>
	<tr>
		<td>0x020</td>
		<td>Interrupt</td>
		<td>
			This is the entry point for interrupts.
		</td>
	</tr>
	<tr>
		<td>0x040-</td>
		<td>Emulated instructions</td>
		<td>
			Emulated opcodes, one 32-byte slot per opcode, starting with opcode 34 at 0x040. Note that opcodes 32 and 33 are not normally used to emulate instructions, as their addresses are already used by the boot vector, the GCC registers and the interrupt vector.
		</td>
	</tr>
</table>
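<p>
The vector layout follows from how EMULATE dispatches: opcodes 32-63 are encoded as 001xxxxx, and the ZPU pushes the return PC and jumps to xxxxx * 32, giving every emulated opcode a 32-byte slot. Opcode 32 thus lands on the reset vector, opcode 33 on the interrupt vector and opcode 34 on 0x040. A small C helper to compute the slot address (the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Address of the software-emulation slot for an EMULATE opcode (32..63).
   EMULATE is encoded as 001xxxxx; the low five bits select a 32-byte slot. */
static uint32_t emulate_slot_address(unsigned opcode) {
    assert(opcode >= 32 && opcode <= 63);
    return (uint32_t)(opcode & 0x1f) * 32u;
}
```

<p>
For example, DIV (opcode 53) is emulated by the code in the slot at 21 * 32 = 0x2A0.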

<a name="memorymap"/>
<h1>Phi memory map</h1>
The ZPU architecture does not define a memory map as such, but the GCC + libgloss + eCos HAL libraries use the
memory map below. "Phi" is just a three-letter name for this particular memory layout, which came about
while developing the ZPU.
<p>
	<TABLE WIDTH=604 BORDER=1 BORDERCOLOR="#000000" CELLPADDING=7 CELLSPACING=0 STYLE="page-break-after: avoid">
		<COL WIDTH=85>
		<COL WIDTH=42>
		<COL WIDTH=136>
		<COL WIDTH=283>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2><B>Address</B></FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2><B>Type</B></FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2><B>Name</B></FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2><B>Description</B></FONT></FONT></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0000</FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">ZPU
				enable</FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[31:1] Not used</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[0]	Enable ZPU operations</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	ZPU
				is held in Idle mode</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	ZPU
				running</FONT></FONT></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A000C</FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">ZPU
				Debug channel / UART to ARM7 TX</FONT></FONT></P>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"><B>NOTE!
				ZPU side</B></FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[31:9] Not used</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[8]	TX buffer ready (valid on read)</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	TX
				buffer not ready (full)</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	TX
				buffer ready</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[7:0]	TX byte (valid on write)</FONT></FONT></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0010</FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read</FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">ZPU
				Debug channel / UART to ARM7 RX</FONT></FONT></P>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"><B>NOTE!
				ZPU side</B></FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[31:9] Not used</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[8]	RX buffer data valid</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	RX
				buffer not valid</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	RX
				buffer valid</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[7:0]	RX byte (when valid)</FONT></FONT></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0014</FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Counter(1)</FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[0]	Reset counter (valid for write)</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	N/A</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	Reset
				counter</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[1]	Sample counter (valid for write)</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	N/A</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	Sample
				counter</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[31:0]		Counter bits 31:0 (read)</FONT></FONT></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0018</FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read</FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Counter(2)</FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[31:0]		Counter bit 63:32</FONT></FONT></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0020</FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read
				/ Write</FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Global_Interrupt_mask</FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[31:1]		Not used</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[0]		Global interrupt mask</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	Interrupts
				enabled</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	Interrupts
				disabled</FONT></FONT></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0024</FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">UART_INTERRUPT_ENABLE</FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[31:1]		Not used</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[0]		Debug channel / UART RX interrupt enable</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	Interrupt
				disable</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	Interrupt
				enable</FONT></FONT></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0028</FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">UART_interrupt</FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[31:1]		Not used</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[0]		Debug channel / UART RX interrupt pending (Read)</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	No
				interrupt pending</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	Interrupt
				pending</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[0]		Clear UART interrupt (Write)</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	N/A</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	Interrupt
				cleared</FONT></FONT></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A002C</FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Timer_Interrupt_enable</FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[31:1]		Not used</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[0]		Timer interrupt  enable</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	Interrupt
				disable</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	Interrupt
				enable</FONT></FONT></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0030</FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read
				/</FONT></FONT></P>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Timer_interrupt</FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[31:2]		Not used</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[0]		Timer interrupt pending (Read)</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	No
				interrupt pending</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	Interrupt
				pending</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[1]		Reset Timer counter (Write)</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	N/A</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	Timer
				counter reset</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[0]		Clear Timer interrupt (Write)</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	0	N/A</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">	1	Interrupt
				cleared</FONT></FONT></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0034</FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Timer_Period</FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[31:0]		Interrupt period (write)</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">		Number
				of clock cycles</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">		between
				timer interrupts</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"><B>NOTE!
				</B>The timer starts at the Timer_Period value, counts <B>down</B>
				to zero and then generates an interrupt.</FONT></FONT></P>
			</TD>
		</TR>
		<TR VALIGN=TOP>
			<TD WIDTH=85>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0038</FONT></FONT></P>
			</TD>
			<TD WIDTH=42>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read</FONT></FONT></P>
			</TD>
			<TD WIDTH=136>
				<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Timer_Counter</FONT></FONT></P>
			</TD>
			<TD WIDTH=283>
				<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
				[31:0]		Timer counter (read)</FONT></FONT></P>
				<P LANG="en-US" CLASS="western"><BR>
				</P>
			</TD>
		</TR>
	</TABLE>
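<p>
A minimal C sketch of driving the counter and timer registers from the table above. To keep it testable off-target, the register block is passed in as a pointer; on a Phi-map system it would point at 0x080A0000. Word offsets are derived from the table addresses, the <code>phi_*</code> names are hypothetical, and the assumption that writing the sample bit latches both counter halves is an interpretation of the table rather than a documented guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Word indices into the Phi I/O block at 0x080A0000 (byte offset / 4). */
enum {
    PHI_COUNTER_LO      = 0x14 / 4,
    PHI_COUNTER_HI      = 0x18 / 4,
    PHI_GLOBAL_INT_MASK = 0x20 / 4,
    PHI_TIMER_INT_EN    = 0x2C / 4,
    PHI_TIMER_INT       = 0x30 / 4,
    PHI_TIMER_PERIOD    = 0x34 / 4
};

/* On target: volatile uint32_t *regs = (volatile uint32_t *)0x080A0000; */

/* Sample the 64-bit counter (bit 1 = sample), then read both halves.
   Assumes the sample also latches bits 63:32. */
static uint64_t phi_read_counter(volatile uint32_t *regs) {
    regs[PHI_COUNTER_LO] = 1u << 1;
    uint64_t lo = regs[PHI_COUNTER_LO];
    uint64_t hi = regs[PHI_COUNTER_HI];
    return (hi << 32) | lo;
}

/* Program a periodic timer interrupt every `cycles` clock cycles. */
static void phi_start_timer(volatile uint32_t *regs, uint32_t cycles) {
    regs[PHI_TIMER_PERIOD]    = cycles;   /* counts down from this value */
    regs[PHI_TIMER_INT]       = 1u << 1;  /* bit 1: reset timer counter */
    regs[PHI_TIMER_INT_EN]    = 1;        /* bit 0: timer interrupt enable */
    regs[PHI_GLOBAL_INT_MASK] = 0;        /* bit 0 = 0: interrupts enabled */
}

/* Call from the interrupt handler: returns 1 and clears the timer
   interrupt if it was pending, otherwise returns 0. */
static int phi_ack_timer(volatile uint32_t *regs) {
    if ((regs[PHI_TIMER_INT] & 1u) == 0)
        return 0;
    regs[PHI_TIMER_INT] = 1u;   /* bit 0: clear timer interrupt */
    return 1;
}
```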
<a name="wishbone"/>
<h1>Wishbone</h1>
In <a href="../hdl/wishbone" target="_blank">hdl/wishbone</a> there is an implementation
of a wishbone bridge.
<p>
However, this wishbone bridge was used together with the <a href="../hdl/zy2000" target="_blank">hdl/zy2000</a> implementation
of the ZPU, which differs slightly from <a href="../hdl/zpu4/core" target="_blank">hdl/zpu4/core</a>.
<p>
The ZY2000 is a complete implementation of the ZPU including DRAM, a soft MAC, wishbone bridges, a GPIO subsystem,
etc. It also included an eCos HAL with TCP/IP support.

<a name="hwdebugger"/>
<h1>JTAG/hardware debugger for GDB</h1>
The Zylin <a href="http://www.zylin.com/zy1000.html">ZY1000</a> JTAG debugger supports
the ZPU. Contact <a href="http://www.zylin.com">Zylin</a> for pricing and details.
<p>
There are two debug modes in which the ZY1000 can operate:
<ul>
<li>Classic. Here the ZY1000 controls the CPU and examines its state. The ZY1000 has a built-in
GDB server that GDB talks to.
<li>Small footprint. If there isn't enough space on the device for both the ZPU *and* the JTAG
controller, then the ZY1000 can run the ZPU externally. The JTAG communication channel is
then used to peek/poke peripherals: instead of the ZPU, the FPGA contains
a JTAG controller that peeks and pokes the peripherals of the ZPU. This approach has advantages
and disadvantages: it may be unfamiliar to embedded developers, and
the timing differs from the "real" ZPU (interrupts are delayed, execution speed
differs, etc.). On the other hand, some things
are simpler: much more RAM can be available for the ZPU during development,
debug consoles are better (faster), and additional peripherals (timers, etc.) are available. This
approach is somewhat unique to the ZPU, as the ZPU is simple enough to be
implemented efficiently in this manner.
</ul>

<a name="interrupts"/>
<h1>Interrupts</h1>
The ZPU supports interrupts.
<p>
To trigger an interrupt, the interrupt signal must be asserted. The ZPU does
not define any interrupt disabling mechanism; this must be implemented by the
interrupt controller and controlled via memory-mapped IO.
<p>
Interrupts are masked while the IDIM flag is set, i.e. during a sequence of
consecutive IM instructions.
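<p>
Masking interrupts between consecutive IM instructions matters because a multi-byte constant is built up across several IM opcodes (each encoded as 1xxxxxxx): the first IM (IDIM clear) pushes its 7-bit immediate sign-extended, and each following IM (IDIM set) shifts the top of stack left by 7 bits and ORs in its immediate. An interrupt in the middle would break that chain. The C sketch below models only this accumulation; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value left on top of the stack by a run of consecutive IM instructions. */
static uint32_t im_sequence_value(const uint8_t *ops, size_t n) {
    uint32_t tos = 0;
    int idim = 0;   /* models the IDIM flag: set after the first IM */
    for (size_t i = 0; i < n; i++) {
        uint32_t imm = ops[i] & 0x7f;
        if (!idim) {
            /* first IM: sign-extend the 7-bit immediate */
            tos = (imm & 0x40) ? (imm | 0xffffff80u) : imm;
            idim = 1;
        } else {
            /* consecutive IM: shift in 7 more bits */
            tos = (tos << 7) | imm;
        }
    }
    return tos;
}
```

<p>
For example, the two-byte sequence 0xA4 0xB4 leaves 0x1234 on the stack: 0x24 is pushed first, then shifted left by 7 bits and combined with 0x34.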
<p>
The ZPU has an edge-triggered interrupt. When the ZPU notices that the interrupt
is asserted, it executes the interrupt instruction. The interrupt signal
must stay asserted until the ZPU acknowledges it.
<p>
When the interrupt instruction is executed, the PC will be pushed onto the
stack and the PC will be set to the interrupt vector address (0x20).
<p>
Note that the GCC compiler requires four registers, r0, r1, r2 and r3, for some
rather uncommon operations. These 32-bit registers are mapped to memory locations 0x0,
0x4, 0x8 and 0xc. The default interrupt vector at address 0x20 loads the
values of these memory locations onto the stack, calls _zpu_interrupt and
restores them afterwards.
<p>
See zpu/hdl/zpu4/test/interrupt/ for C code and zpu/hdl/example/simzpu_interrupt.do
for a simulation example.
<a name="zpu_core_small.vhd"/>
<h1>About zpu_core_small.vhd</h1>
The small ZPU implements the minimum instruction set. It is optimized for size and simplicity,
serving as a reference in both regards.
<p>
It uses a BRAM (dual-port RAM with read/write access on both ports) as data and code storage and
is implemented as a simple state machine.
<p>
Essentially it has four states:
<ol>
<li>Fetch - starts fetch of next instruction
<li>FetchNext - sets up operands for execute cycle
<li>Decode - decodes instruction
<li>Execute - well.. executes instruction
</ol>
The tricky bit is that the states interleave slightly, since the BRAM
takes a cycle to perform a fetch/store. The above are the
normal states the ZPU cycles through unless memory fetches, jumps, etc. take
place.
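<p>
Ignoring the BRAM interleaving and the extra states for memory access and jumps, the normal cycle can be modelled as a simple next-state function. This C sketch is illustrative only; the enum names are hypothetical and are not taken from the VHDL source.

```c
#include <assert.h>

/* The four normal states listed above (simplified: no memory wait
   states, jumps or interrupts). */
typedef enum {
    ST_FETCH,       /* start fetch of next instruction */
    ST_FETCH_NEXT,  /* set up operands for the execute cycle */
    ST_DECODE,      /* decode instruction */
    ST_EXECUTE      /* execute instruction */
} core_state;

static core_state next_state(core_state s) {
    switch (s) {
    case ST_FETCH:      return ST_FETCH_NEXT;
    case ST_FETCH_NEXT: return ST_DECODE;
    case ST_DECODE:     return ST_EXECUTE;
    case ST_EXECUTE:    return ST_FETCH;
    }
    return ST_FETCH;    /* unreachable for valid states */
}
```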
<a name="performance"/>
<h1>Speeding up the ZPU</h1>
There are two aspects of speeding up the ZPU: making it perform better
for a particular application and toying around with the ZPU architecture.
<h2>Performance tips</h2>
<ol>
<li>Profile. Create a small sample and run it in a simulator that is as close
to the real deployment as possible. zpu4/core/histogram.perl is a script
that tells you which instructions take the most time.
<li>Using the profile output, decide which emulated instructions
it makes sense to implement in HDL for your particular application. Modifying
zpu_core_small.vhd is not particularly hard: most instructions can be
transliterated from zpu_core.vhd into zpu_core_small.vhd without much
trouble.
<li>The memory subsystem may well turn out to be where you should concentrate
your efforts.
</ol>
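<p>
histogram.perl itself works on the simulator's instruction trace. As a rough illustration of the same idea, this C sketch counts how often each emulated opcode occurs in a list of executed instruction bytes (bytes 0x20-0x3F are the EMULATE opcodes 32-63) and returns the most frequent one, i.e. the first candidate for an HDL implementation. The input format and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the most frequently executed emulated opcode (32..63) in
   `ops`, or -1 if none occurs. Ties go to the lower opcode number. */
static int hottest_emulated_opcode(const unsigned char *ops, size_t n) {
    size_t count[32] = {0};
    for (size_t i = 0; i < n; i++)
        if (ops[i] >= 32 && ops[i] <= 63)
            count[ops[i] - 32]++;
    int best = -1;
    size_t best_count = 0;
    for (int k = 0; k < 32; k++) {
        if (count[k] > best_count) {
            best_count = count[k];
            best = k + 32;
        }
    }
    return best;
}
```

<p>
Raw execution counts are only a first approximation: an emulated instruction with a long software routine may cost more total cycles than a more frequent but cheaper one.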
<h2>Toying around with the architecture</h2>
Again: profile 90% of the time and spend the remaining 10% tinkering 
with the architecture.
<ul>
<li>There is a DMIPS program you can use to measure the performance of
the ZPU in lieu of profiling a real application. The latter is obviously
a superior solution.
<li>Again: use histogram.perl to figure out which instructions you should add
in HDL.
<li>Tinker a bit with Fmax to find the maximum speed rating for your design.
<li>zpu_core_small.vhd should be ca. 1 DMIPS and zpu_core.vhd should yield
about 5-10 DMIPS before adding instructions runs out of steam.
</ul>
If you need to get ca. 20-50 DMIPS out of the ZPU, you will have to
write a heavily pipelined architecture with caches (if you are running
against DRAM). This is *tricky*, but some proof-of-concept work was
done that showed 20 DMIPS with the ZPU (the actual result was discarded since
it was not complete and contained fatal flaws).
<p>
Achieving above 50-100 DMIPS with the current ZPU architecture is probably
a non-starter and a more conventional RISC design makes more sense here.
<p>
The unique advantage of the ZPU is its small size, both in terms of HDL and code size.
<a name="debuguart"/>
<h1>Debug channel / UART</h1>
All self-respecting embedded projects should have a debug channel
to print to. Typically this is a standard RS232 port or UART, but
it can also be something more exotic like a DCC JTAG channel.
<p>
The point is that characters (bytes) are sent to/from the ZPU
via some terminal.
<p>
The ZPU defines in the memory map a UART / debug channel. This
should be implemented by some suitable debug channel for
the device in which the ZPU is implemented.
<p>
www.opencores.org has several UART implementations. This is one
of the simpler ones:

<a href="http://www.opencores.org/projects.cgi/web/uart/overview">
http://www.opencores.org/projects.cgi/web/uart/overview</a>
<h2>Implementing your own UART / debug channel</h2>
The first thing you need to do is to choose a debug channel for your
hardware. This could be a UART, but it doesn't have to be.
<p>
Secondly, write a small HDL module that connects the debug channel registers
in the ZPU memory map to the UART. This should
    be relatively simple, as all you need to do is let the ZPU
    query the busy flags of the in/out FIFOs and read/write
    data to the UART via the memory map.
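<p>
From the software side, the Phi memory map exposes the debug channel as two registers: TX at 0x080A000C (bit 8 = buffer ready, bits 7:0 = byte to send) and RX at 0x080A0010 (bit 8 = data valid, bits 7:0 = received byte). A minimal polled driver might look like this. The function names are hypothetical, and the register block is passed as a pointer so the code can be exercised off-target; whether reading RX also pops the byte from the FIFO depends on your HDL.

```c
#include <assert.h>
#include <stdint.h>

#define DBG_TX   (0x0C / 4)   /* word index of 0x080A000C */
#define DBG_RX   (0x10 / 4)   /* word index of 0x080A0010 */
#define DBG_FLAG (1u << 8)    /* TX: buffer ready, RX: data valid */

/* Busy-wait until the TX buffer can accept a byte, then write it. */
static void dbg_putc(volatile uint32_t *regs, unsigned char c) {
    while ((regs[DBG_TX] & DBG_FLAG) == 0)
        ;   /* TX buffer full */
    regs[DBG_TX] = c;
}

/* Non-blocking read: returns the received byte, or -1 if none is valid.
   RX is read exactly once so status and data come from the same access. */
static int dbg_getc(volatile uint32_t *regs) {
    uint32_t v = regs[DBG_RX];
    return (v & DBG_FLAG) ? (int)(v & 0xff) : -1;
}
```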
<a name="zpu_core.vhd"/>
<h1>About zpu_core.vhd</h1>
The zpu_core.vhd has a single port memory interface. All data, code and IO is
accessed through this memory interface.
<p>
It performs better (despite having less memory bandwidth than zpu_core_small.vhd)
since it implements many more instructions.
<h1>Compiling hello world program with the ZPU GCC toolchain</h1>
The ZPU comes with a standard GCC toolchain and an instruction set simulator. This allows compiling, running and debugging simple test programs. The simulator has
some very basic peripherals defined: a counter, a timer interrupt and a debug output port.
<h1>Installation</h1>
<ol>
<li>Install Cygwin. http://www.cygwin.com 
<li>Start Cygwin bash
<li>unzip zputoolchain.zip
<li>Add install/bin from zputoolchain.zip to PATH:<br>
<code>export PATH=$PATH:&lt;unzipdir&gt;/install/bin</code>
</ol>
<h1>Hello world example</h1>
The ZPU toolchain comes with newlib and libstdc++ support, which means that many C/C++ programs can be compiled without modification.
<p> 
<code>
zpu-elf-gcc -Os -zeta hello.c -o hello.elf -Wl,--relax -Wl,--gc-sections<br>
zpu-elf-size hello.elf<br>
</code>

<!-- SPI controller -->
<a name="spicontroller"/>

<h1>SPI flash controller (read-only)</h1>
This is a simple read-only SPI flash controller with the following characteristics:

<ul>
 <li>Fast-READ only implementation.</li>
 <li>32-bit accesses only.</li>
 <li>Fast sequential read access, using a low-clock approach.</li>
</ul>

<h2>Version</h2>
The current version is 1.2, which is also the first public version.

<h2>Timing overview</h2>

<p>Simple timing overview, with one non-sequential access to address 0x0 followed by a sequential access to address 0x4.
This simulation was done with Xilinx tools after place and route, using a ZPU to access the SPI controller.</p>
<div>
<img src="images/spi_timing_overview.png">
<p>Image 1: Timing overview</p>
</div>

Image 2 shows the clock almost perfectly centered on the data when we write to the SPI flash.

<div>
<img src="images/spi_readfast_timing.png">
<p>Image 2: Issuing commands to the SPI</p>
</div>

As Image 3 shows, I assume the worst-case read delay from the SPI flash (15 ns, indicated by the marker).

<div>
<img src="images/spi_read_timing.png">
<p>Image 3: Reading from the SPI</p>
</div>

<h2>Usage</h2>

Simple description of SPI controller interface:

<table border="1">
<tr>
  <th>Symbol</th>
  <th>Direction</th>
  <th>Bit width</th>
  <th>Purpose</th>
</tr>
<tr><td>adr</td><td>Input</td><td>24</td><td>SPI flash address to read from</td></tr>
<tr><td>dat_o</td><td>Output</td><td>32</td><td>Data read from SPI</td></tr>
<tr><td>clk</td><td>Input</td><td>1</td><td>Input clock, used for both the interface and the SPI</td></tr>
<tr><td>ce</td><td>Input</td><td>1</td><td>Chip Enable</td></tr>
<tr><td>rst</td><td>Input</td><td>1</td><td>Asynchronous reset</td></tr>
<tr><td>ack</td><td>Output</td><td>1</td><td>Acknowledge: dat_o holds valid data</td></tr>
<tr><td>SPI_CLK</td><td>Output</td><td>1</td><td>SPI output clock</td></tr>
<tr><td>SPI_MOSI</td><td>Output</td><td>1</td><td>SPI output data from controller to chip</td></tr>
<tr><td>SPI_MISO</td><td>Input</td><td>1</td><td>SPI input data from chip to controller</td></tr>
<tr><td>SPI_SELN</td><td>Output</td><td>1</td><td>SPI chip select (nSEL, active low)</td></tr>
</table>
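<p>The Fast-READ sequence behind this interface can be sketched in C as a host-side model. This is not code from spi_controller.v; it assumes the standard SPI flash FAST_READ command (opcode 0x0B, three address bytes MSB first, one dummy byte), that the first byte returned becomes the most significant byte of dat_o (big-endian, like the ZPU), and that a sequential access keeps SPI_SELN low and only clocks the next data bits. The helper names are hypothetical.</p>

```c
#include <stdint.h>

/* Standard SPI flash FAST_READ opcode. Header: opcode, three address
   bytes (MSB first), one dummy byte. Data then follows on MISO. */
#define SPI_CMD_FAST_READ 0x0Bu

/* Fill hdr[0..4] with the FAST_READ header for a 24-bit address. */
void spi_fast_read_header(uint32_t adr, uint8_t hdr[5])
{
    hdr[0] = (uint8_t)SPI_CMD_FAST_READ;
    hdr[1] = (uint8_t)(adr >> 16); /* address bits 23..16 */
    hdr[2] = (uint8_t)(adr >> 8);  /* address bits 15..8 */
    hdr[3] = (uint8_t)adr;         /* address bits 7..0 */
    hdr[4] = 0x00u;                /* dummy byte required by FAST_READ */
}

/* Pack four bytes, in the order the flash returns them, into one word.
   Assumption: the first byte becomes the most significant byte of dat_o. */
uint32_t spi_pack_word(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* SPI clocks for one 32-bit read. A non-sequential read sends the whole
   header (8 + 24 + 8 bits) before the 32 data bits; a sequential read is
   assumed to keep SPI_SELN low and clock only the next 32 data bits. */
unsigned spi_clocks_per_word(int sequential)
{
    const unsigned header_bits = 8u + 24u + 8u;
    const unsigned data_bits = 32u;
    return sequential ? data_bits : header_bits + data_bits;
}
```

<p>Under these assumptions a non-sequential 32-bit read costs 72 SPI clocks and a sequential one 32, which is the motivation for the fast sequential read access listed above.</p>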



<h2>License</h2>
The Verilog implementation is released under BSD license. See the file itself for more licensing details.

<h2>Download</h2>
Download the Verilog code here: <a href="/files/electronics/spi/spi_controller.v">spi_controller.v</a>

<h2>Troubleshooting</h2>
The current implementation is timed and optimized for my own setup. Your parameters might differ
from the defaults I chose, so read the code carefully. If you have any issues, let me know.




<!-- Zealot -->
<a name="zealot"/>
<h1>Zealot: Implementing in FPGAs</h1>

The Zealot version of ZPU is a ZPU medium variant ready to be used with FPGAs.
It was tested using Xilinx Spartan 3 1500 FPGAs and was contributed by
Salvador E. Tropea. The key features are:<p>

<ul>
<li>Includes a very basic <a href="#memorymap">PHI I/O</a> synthesizable core.
It implements the 64-bit clock counter (timer) and the UART. This is enough
to run the DMIPS benchmark and a hello world application. I tested the UART
at 9600 bps and at 115200 bps.</li>
<li>The ZPU can be customized using generics. It allows the use of more
than one core in the same project without problems.</li>
<li>Implements the lshiftright instruction in hardware, which gives around a
10% boost in the DMIPS benchmark (Medium version).</li>
<li>You can disable various instruction groups and leave them to software
emulation, so you can experiment with various LUTs vs. DMIPS
configurations (Medium version).</li>
<li>The medium version provides approx. 2.6 DMIPS @ 50 MHz and the small one
0.5 DMIPS @ 50 MHz.</li>
<li>Enhanced trace module: it shows each executed instruction in assembler
form and can also measure how much stack was consumed during the
execution.</li>
<li>Includes ready to use memory images for a hello world program and the
DMIPS benchmark.</li>
<li>Memory and trace blocks outside ZPU. This provides better modularity.</li>
</ul>
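<p>Software drives a memory-mapped UART like the one in the PHI I/O core by polling a busy flag and then writing the data register. The following is a minimal C sketch of that pattern; the register layout (uart_regs_t), the UART_TX_BUSY bit and the function names are hypothetical and must be replaced with the values from the actual <a href="#memorymap">PHI I/O memory map</a>.</p>

```c
#include <stdint.h>

/* Hypothetical register layout of a memory-mapped UART. */
typedef struct {
    volatile uint32_t status; /* UART_TX_BUSY set while the TX FIFO is full */
    volatile uint32_t data;   /* writing a byte here queues it for transmit */
} uart_regs_t;

#define UART_TX_BUSY 0x1u /* hypothetical bit position */

/* Busy-wait until the TX FIFO accepts a byte, then write it. */
void uart_putc(uart_regs_t *uart, char c)
{
    while (uart->status & UART_TX_BUSY) {
        /* FIFO full: keep polling the status register */
    }
    uart->data = (uint8_t)c;
}

/* Send a NUL-terminated string one byte at a time. */
void uart_puts(uart_regs_t *uart, const char *s)
{
    while (*s != '\0') {
        uart_putc(uart, *s++);
    }
}
```

<p>On the target, uart would point at the UART in the I/O space, e.g. <code>(uart_regs_t *)UART_BASE</code> with a hypothetical UART_BASE taken from the memory map. The volatile qualifiers keep the compiler from optimizing away the repeated status reads.</p>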

Simulation and implementation files are provided. You need 16 kB of BRAMs
for the "hello world" example and 32 kB for the DMIPS benchmark. The medium
version takes around 1030 slices and 3 multipliers and the small version
around 430 slices.<p>

The generics for the Zealot Medium ZPU are:<p>

<ul>
<li><b>WORD_SIZE</b> (integer:=32) Data width, only 32 bits are really
tested/supported. Adding support for 16 bits should be simple, but the
toolchain needs to support it.</li>
<li><b>ADDR_W</b> (integer:=16) Address bus width memory+I/O space. The MSB
selects the address space (1=I/O).</li>
<li><b>MEM_W</b> (integer:=15) Memory address bus width. It includes program,
data and stack sections.</li>
<li><b>D_CARE_VAL</b> (std_logic:='X') Value used to fill the unused bits.
For simulations this should be '0'; for synthesis it is a value that your
tools interpret as "don't care". Xilinx tools could benefit from using
'X', particularly when assigning default values and for unreachable
cases. Note that I didn't find it useful.</li>
<li><b>MULT_PIPE</b> (boolean:=false) Enables the multiplication pipeline.
This can allow faster clocks but will make the mult instruction slower (more
clocks consumed).</li>
<li><b>BINOP_PIPE</b> (integer range 0 to 2:=0) Enables the pipeline for
the -, =, &lt; and &lt;= operations. This can allow faster clocks but will
make these instructions slower (more clocks consumed). This value is the
amount of extra clocks added.</li>
<li><b>ENA_LEVEL0</b> (boolean:=true) Enables the hardware implementation of
eq, neqbranch, loadb and pushspadd instructions.</li>
<li><b>ENA_LEVEL1</b> (boolean:=true) Enables the hardware implementation of
lessthan, ulessthan, mult, storeb, callpcrel and sub instructions.</li>
<li><b>ENA_LEVEL2</b> (boolean:=false) Enables the hardware implementation of
lessthanorequal, ulessthanorequal, call and poppcrel instructions.</li>
<li><b>ENA_LSHR</b> (boolean:=true) Enables the hardware implementation of
lshiftright instruction.</li>
<li><b>ENA_IDLE</b> (boolean:=false) Enables use of the enable_i signal. This signal
can hold the CPU in an idle state if it remains active after reset.
When disabled, the enable_i signal isn't used and the idle state is removed.</li>
<li><b>FAST_FETCH</b> (boolean:=true) This version of the ZPU fetches 4
instructions at once (32 bits), then they are decoded (2 cycles) and finally
executed. The decoded instructions are stored in a "decode cache": the first
instruction is immediately moved to the "current instruction" register and a
"special instruction" replaces the first slot. This "special instruction"
makes the CPU go to the fetch state. When you enable this generic, the FSM
does the fetch directly instead of waiting one clock cycle to go to the fetch state.
This makes instructions run a little bit faster, but it can cost area and/or
frequency.</li>
</ul>
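<p>As a worked example of the default ADDR_W and MEM_W values, the C sketch below checks which address space an address falls in and how much memory MEM_W covers. It assumes byte addresses, as used by the ZPU instruction set; the function names are hypothetical.</p>

```c
#include <stdbool.h>
#include <stdint.h>

/* Default Zealot generics from the list above. */
#define ADDR_W 16 /* memory + I/O address bus width */
#define MEM_W  15 /* memory address bus width */

/* The MSB of the ADDR_W-bit address selects the space: 1 = I/O, 0 = memory. */
bool zealot_is_io(uint32_t addr)
{
    return ((addr >> (ADDR_W - 1)) & 1u) != 0u;
}

/* Size of the memory space in bytes, assuming byte addresses: 2^MEM_W. */
uint32_t zealot_mem_bytes(void)
{
    return UINT32_C(1) << MEM_W;
}
```

<p>With the defaults, addresses 0x8000-0xFFFF are I/O and the memory space is 2^15 = 32768 bytes, the same 32 kB that the DMIPS benchmark needs.</p>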

For more information read the 0README.txt file located inside the zealot
directory.<p>
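<p>The FAST_FETCH description relies on one 32-bit memory word holding four 8-bit ZPU opcodes. The C sketch below shows how such a word splits into the four slots of a decode cache, assuming big-endian byte order (the ZPU is big-endian, so the instruction at the lowest address sits in the most significant byte). It models only the data layout, not the Zealot FSM, and the names are hypothetical.</p>

```c
#include <stdint.h>

/* Split one fetched 32-bit word into four 8-bit opcodes, lowest address
   first. Big-endian: byte address (word_addr + 0) is the MSB. */
void split_fetch_word(uint32_t word, uint8_t opcodes[4])
{
    for (int i = 0; i < 4; i++) {
        opcodes[i] = (uint8_t)(word >> (24 - 8 * i));
    }
}

/* Slot (0..3) within the fetched word of the opcode at byte address pc. */
unsigned fetch_slot(uint32_t pc)
{
    return (unsigned)(pc & 3u);
}

/* Address of the aligned 32-bit word that contains byte address pc. */
uint32_t fetch_word_addr(uint32_t pc)
{
    return pc & ~UINT32_C(3);
}
```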
<!-- End of Zealot -->

<a name="codesize"/>
<h1>Optimizing for code size</h1>
The ZPU toolchain produces highly compact code.  
<ol>
<li>Since the ZPU GCC toolchain supports standard ANSI C, it is easy to stumble across
functionality that takes up a lot of space. For example, the standard printf() function is a beast. Some compilers drop e.g. floating point support
from printf() and thus boast a "smaller" printf() when in fact they have a non-standard printf(). newlib has a standard printf() function
and an alternative iprintf() function that works only on integers.
<li>The ZPU ships with default startup code that works across various configurations of the ZPU, so be warned that there is some overhead
(anywhere between 1 and 4 kBytes) that will not occur in the final application.
<li>Compilation and linker options matter. The ZPU benefits greatly from the "-Wl,--relax -Wl,--gc-sections" options, which are not used by
all architectures (e.g. GCC for ARM does not implement or need -Wl,--relax).
</ol> 
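<p>One way to act on the first point is to route integer-only formatting through newlib's integer variants. The sketch below uses sniprintf(), newlib's integer-only counterpart of snprintf(), when built against newlib (detected through the _NEWLIB_VERSION macro) and falls back to snprintf() elsewhere, so the same file also builds on a desktop host. format_count() and SMALL_SNPRINTF are hypothetical names.</p>

```c
#include <stddef.h>
#include <stdio.h>

/* newlib defines _NEWLIB_VERSION (visible via <stdio.h>) and provides
   sniprintf(), an integer-only snprintf(). Other libcs use snprintf(). */
#ifdef _NEWLIB_VERSION
#define SMALL_SNPRINTF sniprintf
#else
#define SMALL_SNPRINTF snprintf
#endif

/* Format an integer counter. With newlib this is intended to avoid
   linking the floating-point formatting code, so only integer
   conversions such as %d may be used here. */
int format_count(char *buf, size_t len, int count)
{
    return SMALL_SNPRINTF(buf, len, "count=%d", count);
}
```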
<h2>Small code example</h2>
<code>
zpu-elf-gcc -Os -abel smallstd.c -o small.elf -Wl,--relax -Wl,--gc-sections<br>
zpu-elf-size small.elf<br>
<br>
$ zpu-elf-size small.elf<br>
   text    data     bss     dec     hex filename<br>
   2845     952      36    3833     ef9 small.elf<br>
<br>
</code>

<h2>Even smaller code example</h2>
If the ZPU implements the optional instructions, the RAM overhead can be reduced significantly.
<p>
<code>
zpu-elf-gcc -Os -abel crt0_phi.S small.c -o small.elf -Wl,--relax -Wl,--gc-sections -nostdlib <br>
zpu-elf-size small.elf<br>
<br>
$ zpu-elf-size small.elf<br>
   text    data     bss     dec     hex filename<br>
     56       8       0      64      40 small.elf<br>
     <br>
</code>



<a name="ecos"/>
<h1>Installing eCos build tools</h1>
<code>
tar -xjvf ecossnapshot.tar.bz2<br>
tar -xjvf repository.tar.bz2<br>
tar -xjvf ecostools.tar.bz2<br>
# run this every time you open the shell<br>
export PATH=$PATH:`pwd`/ecos-install<br>
export ECOS_REPOSITORY=`pwd`/ecos/packages:`pwd`/repository<br>
</code>
<h1>Compiling eCos tests</h1>
<code>
ecosconfig new zeta default<br>
ecosconfig tree<br>
make<br>
cd kernel/current<br>
make tests<br>
</code>

<h1>Code size ZPU</h1>
<code>
$ zpu-elf-size *<br>
   text    data     bss     dec     hex filename<br>
  15761    1504   12060   29325    728d bin_sem0<br>
  16907    1512   14436   32855    8057 bin_sem1<br>
  17105    1524   30032   48661    be15 bin_sem2<br>
  17186    1512   14436   33134    816e bin_sem3<br>
  18986    1500   12036   32522    7f0a clock0<br>
  15812    1504   13236   30552    7758 clock1<br>
  25095    1972   13224   40291    9d63 clockcnv<br>
  16437    1500   13224   31161    79b9 clocktruth<br>
  15762    1504   12060   29326    728e cnt_sem0<br>
  17124    1512   14436   33072    8130 cnt_sem1<br>
  35947    1564   22512   60023    ea77 dhrystone<br>
  16428    1500   13228   31156    79b4 except1<br>
  15751    1504   12052   29307    727b flag0<br>
  19145    1512   15624   36281    8db9 flag1<br>
  20053    1516  102908  124477   1e63d fptest<br>
  15998    1496   12092   29586    7392 intr0<br>
  16080    1496   12200   29776    7450 kalarm0<br>
  15327    1496   12036   28859    70bb kcache1<br>
  15549    1496   13224   30269    763d kcache2<br>
  18291    1500   12260   32051    7d33 kclock0<br>
  16231    1500   13232   30963    78f3 kclock1<br>
  16572    1496   13228   31296    7a40 kexcept1<br>
  15618    1496   12060   29174    71f6 kflag0<br>
  19287    1500   15624   36411    8e3b kflag1<br>
  16887    1516   15628   34031    84ef kill<br>
  16186    1496   12128   29810    7472 kintr0<br>
  19724    1504   14516   35744    8ba0 klock<br>
  18283    1500   14592   34375    8647 kmbox1<br>
  15539    1496   12064   29099    71ab kmutex0<br>
  16524    1504   15664   33692    839c kmutex1<br>
  18272    1712   20348   40332    9d8c kmutex3<br>
  18682    1608   20352   40642    9ec2 kmutex4<br>
  15619    1496   14412   31527    7b27 ksched1<br>
  15567    1496   12060   29123    71c3 ksem0<br>
  17063    1500   14436   32999    80e7 ksem1<br>
  15504    1496   13228   30228    7614 kthread0<br>
  16167    1496   14412   32075    7d4b kthread1<br>
  18281    1512   14580   34373    8645 mbox1<br>
  20611    1508   14940   37059    90c3 mqueue1<br>
  15672    1504   12064   29240    7238 mutex0<br>
  16678    1516   15664   33858    8442 mutex1<br>
  17694    1508   16868   36070    8ce6 mutex2<br>
  18203    1720   20344   40267    9d4b mutex3<br>
  16352    1508   14428   32288    7e20 release<br>
  15890    1500   14412   31802    7c3a sched1<br>
  44196    1612  286332  332140   5116c stress_threads<br>
  17891    1524   16864   36279    8db7 sync2<br>
  16943    1512   15644   34099    8533 sync3<br>
  15467    1496   13064   30027    754b thread0<br>
  16134    1496   14420   32050    7d32 thread1<br>
  17560    1512   15636   34708    8794 thread2<br>
  16279    1500   24028   41807    a34f thread_gdb<br>
  17051    1504   20376   38931    9813 timeslice<br>
  17146    1504   21564   40214    9d16 timeslice2<br>
  37313    1512  422380  461205   70995 tm_basic<br>
</code>
<h2>Code size ARM (non-thumb)</h2>
Thumb code does not compile out of the box for the AT91 EB40a board that this test was made for.<p>
<code>
$ arm-elf-size *<br>
   text    data     bss     dec     hex filename<br>
  25204     692   16976   42872    a778 bin_sem0<br>
  26644     700   22096   49440    c120 bin_sem1<br>
  26996     712   55584   83292   1455c bin_sem2<br>
  27008     700   22100   49808    c290 bin_sem3<br>
  28992     688   16944   46624    b620 clock0<br>
  25456     692   19532   45680    b270 clock1<br>
  34572    1160   19520   55252    d7d4 clockcnv<br>
  26224     688   19508   46420    b554 clocktruth<br>
  25204     692   16976   42872    a778 cnt_sem0<br>
  26888     700   22108   49696    c220 cnt_sem1<br>
  44180     752   27416   72348   11a9c dhrystone<br>
  26088     688   19520   46296    b4d8 except1<br>
  25236     692   16968   42896    a790 flag0<br>
  29532     700   24668   54900    d674 flag1<br>
  29508     704  109652  139864   22258 fptest<br>
  25932     684   17016   43632    aa70 intr0<br>
  25824     684   17112   43620    aa64 kalarm0<br>
  24728     684   16956   42368    a580 kcache1<br>
  25168     684   19512   45364    b134 kcache2<br>
  28112     688   17168   45968    b390 kclock0<br>
  25976     688   19524   46188    b46c kclock1<br>
  26372     684   19512   46568    b5e8 kexcept1<br>
  25140     684   16968   42792    a728 kflag0<br>
  29824     688   24660   55172    d784 kflag1<br>
  26896     704   24656   52256    cc20 kill<br>
  26088     684   17028   43800    ab18 kintr0<br>
  30812     692   22176   53680    d1b0 klock<br>
  28504     688   22260   51452    c8fc kmbox1<br>
  24984     684   16984   42652    a69c kmutex0<br>
  26504     692   24704   51900    cabc kmutex1<br>
  28792     900   34892   64584    fc48 kmutex3<br>
  29264     796   34896   64956    fdbc kmutex4<br>
  25240     684   22084   48008    bb88 ksched1<br>
  25044     684   16968   42696    a6c8 ksem0<br>
  26988     688   22100   49776    c270 ksem1<br>
  25028     684   19512   45224    b0a8 kthread0<br>
  25996     684   22080   48760    be78 kthread1<br>
  28552     700   22252   51504    c930 mbox1<br>
  31324     696   22612   54632    d568 mqueue1<br>
  25108     692   16980   42780    a71c mutex0<br>
  26464     704   24700   51868    ca9c mutex1<br>
  27624     696   27280   55600    d930 mutex2<br>
  28596     908   34884   64388    fb84 mutex3<br>
  26156     696   22100   48952    bf38 release<br>
  25460     688   22084   48232    bc68 sched1<br>
  56356     828   45892  103076   192a4 stress_threads<br>
  27900     712   27288   55900    da5c sync2<br>
  26760     700   24692   52152    cbb8 sync3<br>
  24924     684   19356   44964    afa4 thread0<br>
  25868     684   22084   48636    bdfc thread1<br>
  27452     700   24680   52832    ce60 thread2<br>
  26136     688   42704   69528   10f98 thread_gdb<br>
  27212     692   34916   62820    f564 timeslice<br>
  52728     700  123332  176760   2b278 tm_basic<br>
</code>
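<p>The two tables can be compared row by row. In the size output, the dec column is text + data + bss, and hex is the same total in hexadecimal. The small C sketch below encodes that relation and computes the ZPU/ARM text ratio for one test; the struct and function names are hypothetical.</p>

```c
/* One row of zpu-elf-size / arm-elf-size output (sizes in bytes). */
struct size_row {
    const char *name;
    unsigned long text;
    unsigned long data;
    unsigned long bss;
};

/* The "dec" column: total of text, data and bss. */
unsigned long size_dec(const struct size_row *r)
{
    return r->text + r->data + r->bss;
}

/* ZPU code size relative to ARM code size for the same eCos test. */
double text_ratio(const struct size_row *zpu, const struct size_row *arm)
{
    return (double)zpu->text / (double)arm->text;
}
```

<p>For dhrystone, text is 35947 bytes on the ZPU vs. 44180 on ARM, a ratio of about 0.81; for bin_sem0 it is 15761 vs. 25204, about 0.63.</p>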



<a name="nextgen"/>
<h1>Next generation ZPU</h1>
Based on feedback, here is a list of a tenuous "consensus" for the next generation
of the ZPU, with some tentative ideas on implementation.
<p>
The plan is to update zpu_core.vhd and zpu_core_small.vhd as examples/reference,
and to open up for innovation in the HDL implementation.

<ol>
<li>Reduce minimum code size footprint
<ol>
<li>Add a single entry point for unknown instructions. The PC and the unsupported instruction are
pushed onto the stack before jumping to the unknown instruction vector. This makes it possible
to write denser microcode for missing instructions. For emulated opcodes that are
not in use, the microcode can more easily be disabled. Determining
that e.g. MULT is not used can be a bit tricky, but disabling it is easy.
<p>
The address of this entry will be 0x10. The reason 0x00 is not used is that
GCC needs 0x00-0x0b inclusive to store R0-R2 (memory-mapped GCC registers).
The reset vector remains 0x0, so the 0x00-0x0f addresses contain the
first few instructions executed by the ZPU. Some very early work has been
done in <a href="../sw/startup/nextgen_crt0.S">nextgen_crt0.S</a>.
<li>Single entry for *all* unknown instructions does not limit emulation to the
EMULATE instructions today, but instructions such as OR, LOADSP, STORESP, ADDSP,
etc. can also be emulated. This opens up for further reduction in logic usage.
<li>The single entry for all unknown instructions will make it easier to
write a compact custom crt0.s to fit an instruction subset. 
<li>The interrupt is basically an unknown instruction that is injected into
the execution stream.
<li>Possibly modify the java simulator to support the single entry for unknown
instructions.
</ol>
<li>Add floating-point add and multiply instructions (FADD and FMULT), with an option to generate them
from the compiler.
<li>Add GCC support for a separate code/data bus. This may be as "simple" as
writing a custom linker script for the current GCC compiler.
<li>Add some scheme to support custom instructions. Can this be combined with
single entry point for unknown instructions?
<li>Add support to Zylin Embedded CDT for downloading fully functional ZPU
toolchain. The goal is to allow new users to write and simulate simple ZPU
programs in less than an hour.
<li>Strip away unused instructions from GCC and add options to GCC for not
emitting more advanced instructions. This will e.g. convert MULT/DIV into
function calls to libgcc and thus make it easier to determine that
microcode is not needed.
</ol>
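<p>The proposed single entry for unknown instructions can be sketched as simulator-side C code. This only illustrates the idea under stated assumptions: the return address is taken to be pc + 1 (the instruction after the unknown one), it is pushed before the opcode so the opcode ends up on top of the stack, and the stack is modelled as a word array that grows downward. zpu_state_t, zpu_push() and zpu_trap_unknown() are hypothetical names, not part of the ZPU sources.</p>

```c
#include <stdint.h>

#define MEM_WORDS 1024u           /* hypothetical 4 kB memory for the model */
#define UNKNOWN_INSN_VECTOR 0x10u /* proposed single entry point */

/* Minimal ZPU state for the model: byte-addressed PC and SP, word memory.
   0x00-0x0b hold GCC's R0-R2 and 0x00-0x0f the first instructions
   after reset, hence the vector at 0x10. */
typedef struct {
    uint32_t pc;
    uint32_t sp;              /* stack grows downward */
    uint32_t mem[MEM_WORDS];
} zpu_state_t;

/* Push a word: pre-decrement SP by 4, then store at the new SP
   (the index wraps to stay inside the model's memory). */
void zpu_push(zpu_state_t *s, uint32_t value)
{
    s->sp -= 4u;
    s->mem[(s->sp / 4u) % MEM_WORDS] = value;
}

/* Proposed handling of an opcode the core does not implement. An
   interrupt would take the same path as an injected unknown instruction.
   Assumptions: return address is pc + 1, pushed before the opcode. */
void zpu_trap_unknown(zpu_state_t *s, uint8_t opcode)
{
    zpu_push(s, s->pc + 1u); /* where the handler returns to */
    zpu_push(s, opcode);     /* top of stack: which opcode to emulate */
    s->pc = UNKNOWN_INSN_VECTOR;
}
```

<p>A single handler at 0x10 would then pop the opcode, dispatch to the matching microcode and return through the saved address, which is what makes it easy to drop microcode for unused opcodes.</p>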
<h2>Next generation ZPU HDL work</h2>
<ol>
<li>Incorporate feedback on FPGA tricks to reduce memory usage: do not
use asynchronous reset? Use BRAMs in synchronous mode to reduce the
complexity of the state machine? Separate code/data bus? Reduce the
instruction set further. Goal: &lt;300 LUTs for a 32-bit ZPU.
<li>Will someone be willing to contribute a heavily pipelined ZPU?
For this to make sense, the performance must hit 20 DMIPS w/DRAM & cache.
This ZPU could run a TCP/IP stack with relevant performance to compete
with stripped down ARM7 type systems.
</ol>

<a name="download"/>
<h1>Download source code</h1> 
</P>
<P>The simplest way to get the ZPU HDL source and tools is to check
it out from CVS:</P>
<P><code>cvs -d :pserver:anonymous@cvs.opencores.org:/cvsroot/anonymous co
zpu/zpu</code></P>
<P>Start by reading zpu/zpu/hdl/index.html.</P>

<a name="patch"/>
<h1>Creating a patch</h1> 
<P><BR>Please submit changes to the <a href="#mailinglist">zylin-zpu mailing list</a> as a patch.
</P>
<ol>
<li>Merge your changes with CVS HEAD.  
<li>Update the FreeBSD or GPL copyright with your name in the case
of non-trivial changes. If in doubt, add the copyright.
<li>Add an entry to zpu/ChangeLog with date, your name, email, the
files you changed and a comment. 
<li><code>cd zpu <BR>cvs diff -upN . &gt; mypatch.txt</code>
<li>Email it to the <a href="#mailinglist">zylin-zpu mailing list</a>. Attach it
as an uncompressed .txt file.
</ol>
<a name="mailinglist"/>
<h1>Getting help - mailing list</h1> 
The place to get help is the <a href="http://www.zylin.com/mailinglist.html">zylin-zpu mailing list</a>.
</body>
</html>