Getting started

The ZPU comes with a few simulation examples.

Start with VHDL synthesis examples

Introduction

The ZPU is a zero-operand, or stack-based, CPU. The opcodes have a fixed width of 8 bits.

Example:

IM 5        ; push 5 onto the stack
LOADSP 20   ; push value at memory location SP+20
ADD         ; pop 2 values on the stack and push the result

As can be seen, a lot of information is packed into the 8 bits, e.g. the IM instruction pushes a 7 bit signed integer onto the stack.
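
To illustrate how much is packed into one opcode byte, the following C sketch classifies a byte by its leading bits, using the bit patterns from the instruction set table in this document, and extracts the 7 bit signed immediate of an IM opcode. The enum and function names are hypothetical and do not come from any ZPU source file.

```c
#include <stdint.h>

/* Hypothetical names for illustration only. The bit patterns follow the
   instruction set table in this document. */
enum zpu_class {
    ZPU_IM, ZPU_STORESP, ZPU_LOADSP, ZPU_EMULATE, ZPU_ADDSP, ZPU_OTHER
};

/* Classify an opcode byte by its leading bits. */
enum zpu_class zpu_classify(uint8_t opcode)
{
    if (opcode & 0x80)           return ZPU_IM;       /* 1xxx xxxx */
    if ((opcode & 0xE0) == 0x40) return ZPU_STORESP;  /* 010x xxxx */
    if ((opcode & 0xE0) == 0x60) return ZPU_LOADSP;   /* 011x xxxx */
    if ((opcode & 0xE0) == 0x20) return ZPU_EMULATE;  /* 001x xxxx */
    if ((opcode & 0xF0) == 0x10) return ZPU_ADDSP;    /* 0001 xxxx */
    return ZPU_OTHER;                                 /* 0000 xxxx: ADD, NOP, ... */
}

/* Sign-extend the low 7 bits of an IM opcode to a 32 bit value. */
int32_t zpu_im_value(uint8_t opcode)
{
    int32_t v = opcode & 0x7F;
    return (v & 0x40) ? v - 128 : v;
}
```

With these definitions, the byte 0x85 classifies as IM with the value 5, which corresponds to IM 5 in the example above, and 0x05 falls into the 0000 xxxx group (ADD).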

The choice of opcodes is intimately tied to the GCC toolchain capabilities.

/* simple program showing some interesting qualities of the ZPU toolchain */
void bar(int);
int j;
void foo(int a, int b, int c)
{
    a++;
    b+=a;
    j=c;
    bar(b);
}

foo:
    loadsp 4        ; a is at memory location SP+4
    im 1
    add
    loadsp 12       ; b is now at memory location SP+12
    add
    loadsp 16       ; c is now at memory location SP+16
    im 24           ; «j» is at absolute memory location 24.
                    ; Notice how the ZPU toolchain is using link-time relaxation
                    ; to squeeze the address into a single no-op
    store
    im 22           ; the fn bar is at address 22
    call
    im 12
    return          ; 12 bytes of arguments + return from fn

Instruction set

Only the base instructions are implemented in the architecture. More advanced instructions, like ASHIFTLEFT, are emulated in the illegal instruction vector. All operations are 32 bits wide.
Name Opcode Description Definition
BREAKPOINT 00000000 The debugger sets a memory location to this value to set a breakpoint. Once a JTAG-like debugger interface is added, it will be convenient to be able to distinguish between a breakpoint and an illegal (possibly emulated) instruction. No effect on registers.
IM 1xxx xxxx Pushes a 7 bit sign extended integer and sets the «instruction decode interrupt mask» flag (IDIM).

If the IDIM flag is already set, this instruction shifts the value on the stack left by 7 bits and stores the 7 bit immediate value into the lower 7 bits.

Unless an instruction is listed as treating the IDIM flag specially, it should be assumed to clear the IDIM flag.

To push a 14 bit integer onto the stack, use two consecutive IM instructions.

If multiple immediate integers are to be pushed onto the stack, they must be interleaved with another instruction, typically NOP.

pc <= pc + 1
idim <= 1
if (idim=0) then
    sp <= sp - 1;
    for i in wordSize-1 downto 7 loop
        mem(sp)(i) <= opcode(6)
    end loop
    mem(sp)(6 downto 0) <= opcode(6 downto 0)
else
    mem(sp)(wordSize-1 downto 7) <= mem(sp)(wordSize-8 downto 0)
    mem(sp)(6 downto 0) <= opcode(6 downto 0)
end if
STORESP 010x xxxx Pop value off stack and store it in the SP+xxxxx*4 memory location, where xxxxx is a positive integer.
LOADSP 011x xxxx Push value of memory location SP+xxxxx*4, where xxxxx is a positive integer, onto stack.
ADDSP 0001 xxxx Add value of memory location SP+xxxx*4 to value on top of stack.
EMULATE 001x xxxx Push PC to stack and set PC to 0x0+xxxxx*32. This is used to emulate opcodes. See zpupkg.vhd for the list of emulate opcode values used. zpu_core.vhd contains reference implementations of these instructions rather than letting the ZPU execute the EMULATE instruction.

One way to improve performance of the ZPU is to implement some of the EMULATE instructions.

PUSHPC emulated Pushes program counter onto the stack.
POPPC 0000 0100 Pops address off stack and sets PC
LOAD 0000 1000 Pops address stored on stack and loads the value of that address onto stack.

Bits 0 and 1 of the address are always treated as 0 (i.e. ignored) by the HDL implementations, and C code is guaranteed by the programming model never to use a 32 bit LOAD on an address that is not 32 bit aligned (i.e. if a program does this, then it has a bug).

STORE 0000 1100 Pops address, then value from stack and stores the value into the memory location of the address.

Bit 0 and 1 of address are always treated as 0

PUSHSP 0000 0010 Pushes stack pointer.
POPSP 0000 1101 Pops value off top of stack and sets SP to that value. Used to allocate/deallocate space on stack for variables or when changing threads.
ADD 0000 0101 Pops two values off the stack, adds them and pushes the result.
AND 0000 0110 Pops two values off the stack, does a bitwise and, and pushes the result onto the stack.
OR 0000 0111 Pops two integers off the stack, does a bitwise or, and pushes the result.
NOT 0000 1001 Bitwise inverse of value on stack
FLIP 0000 1010 Reverses the bit order of the value on the stack, i.e. abc->cba, 100->001, 110->011, etc.

The raison d'etre for this instruction is mainly to emulate other instructions.

NOP 0000 1011 No operation. Clears the IDIM flag as a side effect, i.e. it is used between two consecutive IM instructions to push two values onto the stack.
PUSHSPADD 61 a=sp;
b=popIntStack()*4;
pushIntStack(a+b);
POPPCREL 57 setPc(popIntStack()+getPc());
SUB 49 int a=popIntStack();
int b=popIntStack();
pushIntStack(b-a);
XOR 50 pushIntStack(popIntStack() ^ popIntStack());
LOADB 51 8 bit load instruction. Really only here for compatibility with C programming model. Also it has a big impact on DMIPS test.

pushIntStack(cpuReadByte(popIntStack())&0xff);

STOREB 52 8 bit store instruction. Really only here for compatibility with C programming model. Also it has a big impact on DMIPS test.

addr = popIntStack();
val = popIntStack();
cpuWriteByte(addr, val);

LOADH 34 16 bit load instruction. Really only here for compatibility with C programming model.

pushIntStack(cpuReadWord(popIntStack()));

STOREH 35 16 bit store instruction. Really only here for compatibility with C programming model.

addr = popIntStack();
val = popIntStack();
cpuWriteWord(addr, val);

LESSTHAN 36 Signed comparison
a = popIntStack();
b = popIntStack();
pushIntStack((a < b) ? 1 : 0);
LESSTHANOREQUAL 37 Signed comparison
a = popIntStack();
b = popIntStack();
pushIntStack((a <= b) ? 1 : 0);
ULESSTHAN 38 Unsigned comparison
long a;//long is here 64 bit signed integer
long b;
a = ((long) popIntStack()) & INTMASK; // INTMASK is unsigned 0x00000000ffffffff
b = ((long) popIntStack()) & INTMASK;
pushIntStack((a < b) ? 1 : 0);
ULESSTHANOREQUAL 39 Unsigned comparison
long a;//long is here 64 bit signed integer
long b;
a = ((long) popIntStack()) & INTMASK; // INTMASK is unsigned 0x00000000ffffffff
b = ((long) popIntStack()) & INTMASK;
pushIntStack((a <= b) ? 1 : 0);
EQBRANCH 55 int compare;
int target;
target = popIntStack() + pc;
compare = popIntStack();
if (compare == 0)
{
    setPc(target);
} else
{
    setPc(pc + 1);
}
NEQBRANCH 56 int compare;
int target;
target = popIntStack() + pc;
compare = popIntStack();
if (compare != 0)
{
    setPc(target);
} else
{
    setPc(pc + 1);
}
MULT 41 Signed 32 bit multiply
pushIntStack(popIntStack() * popIntStack());
DIV 53 Signed 32 bit integer divide.
a = popIntStack();
b = popIntStack();
if (b == 0)
{
// undefined
}
pushIntStack(a / b);
MOD 54 Signed 32 bit integer modulo.
a = popIntStack();
b = popIntStack();
if (b == 0)
{
// undefined
}
pushIntStack(a % b);
LSHIFTRIGHT 42 unsigned shift right.
long shift;
long valX;
int t;
shift = ((long) popIntStack()) & INTMASK;
valX = ((long) popIntStack()) & INTMASK;
t = (int) (valX >> (shift & 0x3f));
pushIntStack(t);
ASHIFTLEFT 43 arithmetic(signed) shift left.
long shift;
long valX;
shift = ((long) popIntStack()) & INTMASK;
valX = ((long) popIntStack()) & INTMASK;
int t = (int) (valX << (shift & 0x3f));
pushIntStack(t);
ASHIFTRIGHT 44 arithmetic (signed) shift right.
long shift;
int valX;
shift = ((long) popIntStack()) & INTMASK;
valX = popIntStack();
int t = valX >> (shift & 0x3f);
pushIntStack(t);
CALL 45 call procedure.

int address = pop();
push(pc + 1);
setPc(address);
CALLPCREL 63 call procedure pc relative

int address = pop();
push(pc + 1);
setPc(address+pc);
EQ 46 pushIntStack((popIntStack() == popIntStack()) ? 1 : 0);
NEQ 48 pushIntStack((popIntStack() != popIntStack()) ? 1 : 0);
NEG 47 pushIntStack(-popIntStack());
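
The IM entry in the table above describes how consecutive IM instructions build up a wider constant 7 bits at a time. As a minimal sketch of that rule, the C code below emits the shortest IM sequence for a 32 bit constant and replays it on a one-word model of the top of stack. All function names are hypothetical, and the model covers IM only, not the other instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* True if v can be represented as a signed integer of `bits` bits (bits < 32). */
static int zpu_fits_signed(int32_t v, unsigned bits)
{
    int32_t lim = (int32_t)1 << (bits - 1);
    return v >= -lim && v < lim;
}

/* Emit the shortest IM sequence that leaves `value` on the stack.
   Returns the number of opcode bytes written to `out` (1 to 5). */
size_t zpu_emit_im(int32_t value, uint8_t out[5])
{
    size_t n = 1, k = 0;
    while (n < 5 && !zpu_fits_signed(value, 7u * (unsigned)n))
        n++;
    /* The most significant 7 bit group goes first; each later IM shifts it left. */
    for (size_t i = n; i-- > 0; )
        out[k++] = (uint8_t)(0x80u | (((uint32_t)value >> (7u * i)) & 0x7Fu));
    return n;
}

/* Apply one IM opcode to the modelled top of stack, following the IM
   definition in the table: push sign-extended if IDIM is clear,
   otherwise shift left by 7 and fill the low 7 bits. */
void zpu_step_im(uint8_t opcode, uint32_t *top, int *idim)
{
    uint32_t imm = opcode & 0x7Fu;
    if (!*idim)
        *top = (imm & 0x40u) ? (imm | 0xFFFFFF80u) : imm;
    else
        *top = (*top << 7) | imm;
    *idim = 1;
}

/* Replay a pure IM sequence and return the resulting top of stack. */
int32_t zpu_run_im(const uint8_t *code, size_t n)
{
    uint32_t top = 0;
    int idim = 0;
    for (size_t i = 0; i < n; i++)
        zpu_step_im(code[i], &top, &idim);
    return (int32_t)top;
}
```

For example, the value 5 needs a single opcode (0x85), while 64 no longer fits into 7 signed bits and is emitted as the two opcodes 0x80 and 0xC0. Pushing a second, independent constant would require an instruction such as NOP in between to clear the IDIM flag.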

Implementing your own ZPU

One of the neat things about the ZPU is that the instruction set and architecture are very small, so it is easy to implement a ZPU from scratch or to modify the existing ZPU implementations.

Implementing a ZPU can be done without understanding the toolchain in detail: HDL skills and a rudimentary understanding of standard GCC/GDB usage are sufficient.

A few tips:

Vectors

Address Name Description
0x000 Reset 1. When the ZPU boots, this is the first instruction to be executed.

2. The stack pointer is initialised to the maximum RAM address.

0x020 Interrupt This is the entry point for interrupts.
0x040- Emulated instructions Emulated opcode 34. Note that opcode 32 and opcode 33 are not normally used to emulate instructions as these memory addresses are already used by boot vector, GCC registers and the interrupt vector.
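
The relation between an emulated opcode and its vector follows from the EMULATE definition (PC is set to 0x0+xxxxx*32). A minimal C sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* Vector address for an EMULATE opcode (001x xxxx, i.e. opcodes 32..63):
   the low 5 bits of the opcode times 32. The function name is hypothetical. */
uint32_t zpu_emulate_vector(uint8_t opcode)
{
    return (uint32_t)(opcode & 0x1Fu) * 32u;
}
```

Opcode 34 maps to 0x040, the first entry listed in the table above, while opcodes 32 and 33 would map to 0x000 and 0x020, which are the reset and interrupt vectors.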

Phi memory map

The ZPU architecture does not define a memory map as such, but GCC, libgloss and the eCos HAL library use the memory map below.

Address Type Name Description

0x080A0000 Write ZPU enable
Bit [31:1] Not used
Bit [0] Enable ZPU operations
  0 ZPU is held in Idle mode
  1 ZPU running

0x080A000C Read/Write ZPU UART to ARM7 TX
NOTE! ZPU side
Bit [31:9] Not used
Bit [8] TX buffer ready (valid on read)
  0 TX buffer not ready (full)
  1 TX buffer ready
Bit [7:0] TX byte (valid on write)

0x080A0010 Read ZPU UART to ARM7 RX
NOTE! ZPU side
Bit [31:9] Not used
Bit [8] RX buffer data valid
  0 RX buffer not valid
  1 RX buffer valid
Bit [7:0] RX byte (when valid)

0x080A0014 Read/Write Counter(1)
Bit [0] Reset counter (valid for write)
  0 N/A
  1 Reset counter
Bit [1] Sample counter (valid for write)
  0 N/A
  1 Sample counter
Bit [31:0] Counter bit 31:0

0x080A0018 Read Counter(2)
Bit [31:0] Counter bit 63:32

0x080A0020 Read/Write Global_Interrupt_mask
Bit [31:1] Not used
Bit [0] Global intr. Mask
  0 Interrupts enabled
  1 Interrupts disabled

0x080A0024 Write UART_INTERRUPT_ENABLE
Bit [31:1] Not used
Bit [0] UART RX interrupt enable
  0 Interrupt disable
  1 Interrupt enable

0x080A0028 Read/Write UART_interrupt
Bit [31:1] Not used
Bit [0] UART RX interrupt pending (Read)
  0 No interrupt pending
  1 Interrupt pending
Bit [0] Clear UART interrupt (Write)
  0 N/A
  1 Interrupt cleared

0x080A002C Write Timer_Interrupt_enable
Bit [31:1] Not used
Bit [0] Timer interrupt enable
  0 Interrupt disable
  1 Interrupt enable

0x080A0030 Read/Write Timer_interrupt
Bit [31:2] Not used
Bit [0] Timer interrupt pending (Read)
  0 No interrupt pending
  1 Interrupt pending
Bit [1] Reset Timer counter (Write)
  0 N/A
  1 Timer counter reset
Bit [0] Clear Timer interrupt (Write)
  0 N/A
  1 Interrupt cleared

0x080A0034 Write Timer_Period
Bit [31:0] Interrupt period (write)
Number of clock cycles between timer interrupts
NOTE! The timer will start at the Timer_Period value, count down to zero and generate an interrupt.

0x080A0038 Read Timer_Counter
Bit [31:0] Timer counter (read)
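
As a sketch of how software might use the two UART registers in this memory map, the C code below polls bit 8 before writing or returning a byte. The macro and function names are hypothetical; only the addresses and bit positions come from the table. The register is passed in as a pointer so the logic can also be exercised against an ordinary variable. That a single read of the RX register is enough to fetch the byte is an assumption.

```c
#include <stdint.h>

/* Addresses and bit positions from the memory map; names are hypothetical. */
#define ZPU_UART_TX_ADDR 0x080A000Cu
#define ZPU_UART_RX_ADDR 0x080A0010u
#define ZPU_UART_FLAG    (1u << 8)   /* TX buffer ready / RX data valid */

/* Write one byte if the TX buffer is ready. Returns 1 on success, 0 if full. */
int zpu_uart_try_putc(volatile uint32_t *tx_reg, uint8_t c)
{
    if ((*tx_reg & ZPU_UART_FLAG) == 0)
        return 0;
    *tx_reg = c;
    return 1;
}

/* Busy-wait until the byte has been accepted. */
void zpu_uart_putc(volatile uint32_t *tx_reg, uint8_t c)
{
    while (!zpu_uart_try_putc(tx_reg, c)) {
        /* spin until bit 8 reads as 1 */
    }
}

/* Return the received byte, or -1 if no valid data is available.
   The register is read exactly once (assumption: one read per byte). */
int zpu_uart_try_getc(volatile const uint32_t *rx_reg)
{
    uint32_t v = *rx_reg;
    return (v & ZPU_UART_FLAG) ? (int)(v & 0xFFu) : -1;
}
```

On the target, the pointer arguments would be (volatile uint32_t *)ZPU_UART_TX_ADDR and (volatile const uint32_t *)ZPU_UART_RX_ADDR.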


Interrupts

The ZPU supports interrupts.

To trigger an interrupt, the interrupt signal must be asserted. The ZPU does not define any interrupt disabling mechanism; this must be implemented by the interrupt controller and controlled via memory mapped IO.

Interrupts are masked when the IDIM flag is set, i.e. between consecutive IM instructions.

The ZPU has an edge triggered interrupt. As the ZPU notices that the interrupt is asserted, it will execute the interrupt instruction. The interrupt signal must stay asserted until the ZPU acknowledges it.

When the interrupt instruction is executed, the PC will be pushed onto the stack and the PC will be set to the interrupt vector address (0x20).
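
The interrupt entry described above can be modelled in a few lines of C. This is a sketch under stated assumptions, not the HDL behaviour in full: the struct and function names are hypothetical, the stack pointer is a word index as in the IM pseudocode (sp <= sp - 1), and the acknowledge handshake is left out.

```c
#include <stdint.h>

#define ZPU_INTERRUPT_VECTOR 0x20u   /* from the vector table */

/* Hypothetical minimal CPU model. */
struct zpu_state {
    uint32_t pc;
    uint32_t sp;     /* word index of the top of stack */
    int idim;        /* IDIM flag, set by IM instructions */
    uint32_t *mem;   /* word-addressed RAM model */
};

/* Take the interrupt if it is asserted and not masked by IDIM.
   Returns 1 if the interrupt was taken, 0 otherwise. */
int zpu_try_interrupt(struct zpu_state *s, int irq_asserted)
{
    if (!irq_asserted || s->idim)
        return 0;                     /* nothing pending, or masked */
    s->sp -= 1;
    s->mem[s->sp] = s->pc;            /* push PC */
    s->pc = ZPU_INTERRUPT_VECTOR;     /* jump to the interrupt vector */
    return 1;
}
```

While IDIM is set the call leaves the state untouched, so an asserted interrupt is only taken once a non-IM instruction has cleared the flag.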

Note that the GCC compiler requires four registers, r0, r1, r2 and r3, for some rather uncommon operations. These 32 bit registers are mapped to memory locations 0x0, 0x4, 0x8 and 0xc. The default interrupt vector at address 0x20 will load the value of these memory locations onto the stack, call _zpu_interrupt and restore them.

See zpu/hdl/zpu4/test/interrupt/ for C code and zpu/hdl/example/simzpu_interrupt.do for simulation example.

About zpu_core_small.vhd

The small ZPU implements the minimum instruction set. It is optimized for size and simplicity, serving as a reference in both regards.

It uses a BRAM (dual port RAM w/read/write to both ports) as data & code storage and is implemented as a simple state machine.

Essentially it has four states:

  1. Fetch - starts fetch of next instruction
  2. FetchNext - sets up operands for execute cycle
  3. Decode - decodes instruction
  4. Execute - well.. executes instruction

The tricky bit is that there is a tiny bit of interleaving of states, since the BRAM takes a cycle to perform a fetch/store. The states above are the normal ones the ZPU cycles through unless memory fetches, jumps, etc. take place.
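
The normal cycle through these states can be written down as a small transition function. This C sketch uses hypothetical names and covers only the normal sequence, leaving out the extra steps taken for memory fetches and jumps.

```c
/* Hypothetical names for the states listed above. */
enum zpu_small_state { ST_FETCH, ST_FETCH_NEXT, ST_DECODE, ST_EXECUTE };

/* Next state in the normal Fetch -> FetchNext -> Decode -> Execute cycle. */
enum zpu_small_state zpu_small_next(enum zpu_small_state s)
{
    switch (s) {
    case ST_FETCH:      return ST_FETCH_NEXT;
    case ST_FETCH_NEXT: return ST_DECODE;
    case ST_DECODE:     return ST_EXECUTE;
    case ST_EXECUTE:
    default:            return ST_FETCH;
    }
}
```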

About zpu_core.vhd

The zpu_core.vhd has a single port memory interface. All data, code and IO is accessed through this memory interface.

It performs better than zpu_core_small.vhd, despite having less memory bandwidth, since it implements many more instructions.

Next generation ZPU

Based on feedback, here is a list reflecting a tenuous "consensus" for the next generation of the ZPU, with some tentative ideas on implementation.

The plan is to update zpu_core.vhd and zpu_core_small.vhd as examples/reference, and to open up for innovation in the HDL implementation.

  1. Reduce minimum code size footprint
    1. Modify the GCC compiler to be able to emit function calls instead of instructions, e.g. generate a function call instead of issuing MULT. This reduces code size overhead for applications that do not use MULT, since the microcode does not need to be in place.
    2. Add a single entry for unknown instructions. The PC and the unsupported instruction are pushed onto the stack before jumping to the unknown instruction vector. This makes it possible to write denser microcode for missing instructions.
  2. Add floating point add and mult. FADD & FMULT. Option to generate the instructions from the compiler.
  3. Add some scheme to support custom instructions.
  4. Add support to Zylin Embedded CDT for downloading a fully functional ZPU toolchain. The goal is to allow new users to write and simulate simple ZPU programs in less than an hour.
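
Item 1.1 proposes calling a library function where the compiler would otherwise issue an instruction such as MULT. As a rough illustration of what such a routine could look like, here is a shift-and-add multiply in C. The function name is hypothetical and this is not the routine used by any existing ZPU toolchain.

```c
#include <stdint.h>

/* Hypothetical software stand-in for MULT: shift-and-add multiply.
   The low 32 bits of the product are the same for signed and unsigned
   operands, so unsigned arithmetic is used to avoid overflow issues. */
uint32_t zpu_soft_mult(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;   /* add the current shifted multiplicand */
        a <<= 1;
        b >>= 1;
    }
    return result;
}
```

An application that never multiplies would then not link this routine at all, which is the code size saving the item describes.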