From 7e91fb42c0e203a15024f9d5014e2df516fbb037 Mon Sep 17 00:00:00 2001
From: oharboe
Date: Thu, 21 Aug 2008 21:22:59 +0000
Subject: added some notes on speeding up the ZPU

---
 zpu/docs/zpu_arch.html | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/zpu/docs/zpu_arch.html b/zpu/docs/zpu_arch.html
index 84ccc1a..c7e20bc 100644
--- a/zpu/docs/zpu_arch.html
+++ b/zpu/docs/zpu_arch.html
@@ -11,6 +11,7 @@
  • Jump vectors
  • Memory map
  • Interrupts +
  • Speeding up the ZPU
  • Wishbone
  • About zpu_core_small.vhd
  • About zpu_core.vhd
@@ -1409,6 +1410,46 @@
The tricky bit is that there is a tiny bit of interleaving of states since the BRAM takes a cycle to perform a fetch/store. The above is the normal states the ZPU cycles through unless memory fetch, jumps, etc. take place.
+
+

    Speeding up the ZPU

    +There are two aspects of speeding up the ZPU: making it perform better +for a particular application and toying around with the ZPU architecture. +

    Performance tips

    +
      +
    1. Profile. Create a small sample and run it in a simulator that is as close +to the real deployment as possible. zpu4/core/histogram.perl is a script +that will tell you which instructions take the most time. +
    2. Using the profile output, decide which emulated instructions +it makes sense to implement in HDL for your particular application. Modifying +zpu_core_small.vhd is not particularly hard. Most instructions can be +transliterated into zpu_core_small.vhd from zpu_core.vhd without too much +trouble. +
    3. The memory subsystem may well turn out to be where you should concentrate +your efforts. +
    +
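    The profiling step above can be illustrated with a minimal Python sketch. It is not histogram.perl and does not read that script's input. It assumes a hypothetical trace with one executed instruction per line in the form "mnemonic cycles", and the cycle counts in the sample are invented for illustration. It sums cycles per instruction so the most expensive emulated instructions stand out.

```python
from collections import Counter


def cycle_histogram(trace_lines):
    """Sum cycles per mnemonic from lines of the form '<mnemonic> <cycles>'.

    The trace format is a hypothetical one chosen for this sketch.
    """
    totals = Counter()
    for line in trace_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        mnemonic, cycles = parts
        totals[mnemonic] += int(cycles)
    # Most expensive instructions first
    return totals.most_common()


# Invented sample trace; the cycle counts are not measured ZPU figures.
sample = ["im 1", "load 4", "mult 40", "im 1", "mult 40", "storesp 2"]

for mnemonic, cycles in cycle_histogram(sample):
    print(f"{mnemonic:10s} {cycles}")
```

    In a trace like this sample, an emulated instruction that dominates the total would be the first candidate to implement in HDL.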

    Toying around with the architecture

    +Again: profile 90% of the time and spend the remaining 10% tinkering +with the architecture. +
      +
    • There is a DMIPS program you can use to measure the performance of +the ZPU in lieu of profiling a real application. The latter is obviously +a superior solution. +
    • Again: use histogram.perl to figure out which instructions you should add +in HDL. +
    • Tinker a bit with Fmax to find the maximum speed rating for your design. +
    • zpu_core_small.vhd should be ca. 1 DMIPS and zpu_core.vhd should yield +about 5-10 DMIPS before adding instructions runs out of steam. +
    +If you need to get ca. 20-50 DMIPS out of the ZPU you will have to +write a heavily pipelined architecture with caches (if you are running +against DRAM). This is *tricky*, but some proof of concept work was +done to show 20 DMIPS with the ZPU (the actual result was discarded since +it was not complete and contained fatal flaws). +

    +Achieving above 50-100 DMIPS with the current ZPU architecture is probably +a non-starter and a more conventional RISC design makes more sense here. +
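
    For reference, the DMIPS figures quoted above are conventionally derived by dividing the measured Dhrystone rate by 1757, the Dhrystones-per-second score of the VAX 11/780 reference machine. The sketch below shows only that arithmetic. The function name and the run counts are assumptions for illustration and are not taken from the ZPU DMIPS program.

```python
# Dhrystones per second of the VAX 11/780, which defines 1 DMIPS.
VAX_DHRYSTONES_PER_SECOND = 1757


def dmips(dhrystone_runs, elapsed_seconds):
    """Convert a Dhrystone run count and wall-clock time into DMIPS."""
    return dhrystone_runs / elapsed_seconds / VAX_DHRYSTONES_PER_SECOND


# Illustrative numbers only: 17570 runs in 10 seconds is 1757 runs/s.
print(dmips(17570, 10.0))
```

    Dividing the result by the clock frequency in MHz gives DMIPS/MHz, which makes it easier to compare cores running at different Fmax.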

    +The unique advantage of the ZPU is its small size, in terms of both HDL and code.

    About zpu_core.vhd

    The zpu_core.vhd has a single port memory interface. All data, code and IO is -- cgit v1.1