Computer Architecture Lab/Summer2006/PitterDeinhart/TCMP2.0

The Concept
The design is based on a 16-bit load/store architecture. Every instruction occupies two bytes. A rather unusual idea (though it already exists in some chips) is that every opcode can be flagged as conditional.

Memory
Instruction and data memory are separated. Instructions are read from ROM; data can be read from and written to RAM.

Since every instruction consists of 2 bytes and the 16-bit instruction pointer counts words, the ROM can address 128 kilobytes (2^16 words of 2 bytes each). Constants can be loaded from ROM using the LDI* opcodes (see below).

Since RAM is accessed byte by byte, it can hold only 64 kilobytes of data (2^16 bytes).
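The two sizes above follow from the same address width. A minimal sketch of the arithmetic, assuming that both the instruction pointer and the RAM address are 16 bits wide (the constant and function names are made up for this illustration):

```python
# Address-space arithmetic for TCMP2.0.
# Assumption: instruction pointer and RAM addresses are both 16 bits wide.
ADDRESS_BITS = 16
INSTRUCTION_BYTES = 2  # every instruction occupies one 2-byte word


def rom_bytes() -> int:
    # The instruction pointer counts words, so each address selects 2 bytes.
    return (1 << ADDRESS_BITS) * INSTRUCTION_BYTES


def ram_bytes() -> int:
    # RAM is addressed byte by byte, so each address selects 1 byte.
    return 1 << ADDRESS_BITS


print(rom_bytes() // 1024, "kilobytes of ROM")
print(ram_bytes() // 1024, "kilobytes of RAM")
```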

Registers
There are 16 general-purpose registers. Each is 16 bits wide, and they are called ax, bx, cx, .. px.

In addition, there is a 1-bit conditional flag (CF).

Conditionals
A conditional bit is added to each instruction. If this bit is 1, the instruction is considered conditional and is only executed if the conditional flag is set. The conditional flag can be modified by the CMP* instructions or by calling clear conditional flag (CCF). Those can be conditional instructions, too.
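The rule above can be written down as a one-line predicate. This is only a sketch of the semantics described in the text; the function name and the boolean model of CF are assumptions made for the illustration, not code taken from the simulator:

```python
def should_execute(conditional_bit: int, cf: bool) -> bool:
    """Return True if an instruction runs, given its c bit and the current CF."""
    # Unconditional instructions (c = 0) always run.
    # Conditional instructions (c = 1) run only while CF is set.
    return conditional_bit == 0 or cf


# A conditional CCF ("?ccf") is skipped while CF is clear, so it cannot
# change the flag in that case.
for c_bit, cf in [(0, False), (0, True), (1, False), (1, True)]:
    print(f"c={c_bit} CF={cf} -> executes: {should_execute(c_bit, cf)}")
```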

Load & Store
''MS: you will need an indirect load and store (LD (ax), ST (AX)). Without those instructions you will be VERY limited. Then you will think again about a register size which is less than the address size - one of the big issues in the 8086..80286 (the segmentation was a real pain) ;-)''

The heart of our load/store architecture are these two commands. Actually one could argue that they are 32 commands, because they have the register number hardcoded into the opcode.

LD         loads a word from RAM into a register
ST         stores a word from a register into the RAM
LDIL,LDIH  loads an immediate byte from ROM into a register
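Because the immediate field is only one byte wide, a full 16-bit constant takes an LDIL/LDIH pair. The sketch below splits a constant into the two immediates and models the effect on a register. It assumes that LDIL replaces only the low byte and LDIH only the high byte; the helper names are hypothetical.

```python
from typing import Tuple


def split_constant(value: int) -> Tuple[int, int]:
    """Split a 16-bit constant into the (LDIL, LDIH) immediates."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("constant does not fit into 16 bits")
    return value & 0xFF, (value >> 8) & 0xFF


def ldil(reg: int, imm: int) -> int:
    # Assumption: LDIL replaces the low byte and keeps the high byte.
    return (reg & 0xFF00) | (imm & 0xFF)


def ldih(reg: int, imm: int) -> int:
    # Assumption: LDIH replaces the high byte and keeps the low byte.
    return (reg & 0x00FF) | ((imm & 0xFF) << 8)


low, high = split_constant(0xA120)
ax = ldih(ldil(0x0000, low), high)
print(f"ldil ax,0x{low:02x} / ldih ax,0x{high:02x} -> ax=0x{ax:04x}")
```

The constant 0xa120 is the one the blink example below loads into ax with ldil ax,0x20 and ldih ax,0xa1.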

Comparison
All comparison instructions set the conditional flag as a result.

CMPEQ  tests if two registers are equal
CMPNE  tests if two registers are not equal
CMPGT  tests if register 1 > register 2
CMPLT  tests if register 1 < register 2
CMPEZ  tests if register is equal to zero
CMPNZ  tests if register is not equal to zero

''Is there any way to set the conditional flag when register 1 <= register 2? Or do we need a CMPLE?''

Is there any difference at all between CMPGT R1, R2 and CMPLT R2, R1?
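Both questions can be explored with a small model of the flag results. This is a sketch that assumes all comparisons interpret the register contents the same way (plain unsigned integers here); the function names simply mirror the mnemonics and are not taken from the simulator.

```python
def cmpgt(r1: int, r2: int) -> bool:
    return r1 > r2


def cmplt(r1: int, r2: int) -> bool:
    return r1 < r2


def cmpeq(r1: int, r2: int) -> bool:
    return r1 == r2


# A few 16-bit sample values, including the extremes.
VALUES = [0x0000, 0x0001, 0x0002, 0x7FFF, 0x8000, 0xFFFF]
PAIRS = [(a, b) for a in VALUES for b in VALUES]

# CMPGT R1, R2 and CMPLT R2, R1 produce the same flag in this model.
swap_equivalent = all(cmpgt(a, b) == cmplt(b, a) for a, b in PAIRS)

# "R1 <= R2" holds exactly when CMPLT or CMPEQ would set the flag,
# and it is the negation of what CMPGT would set.
le_from_lt_or_eq = all((a <= b) == (cmplt(a, b) or cmpeq(a, b)) for a, b in PAIRS)
le_is_not_gt = all((a <= b) == (not cmpgt(a, b)) for a, b in PAIRS)

print(swap_equivalent, le_from_lt_or_eq, le_is_not_gt)
```

Under that assumption the two spellings in the second question are interchangeable, and a "less or equal" test can be composed from the existing comparisons.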

Branching
JMP  loads a register's value into the instruction pointer

ALU operations
ADD   adds register 1 to register 2
ADDI  adds immediate value to register
SUB   subtracts register 1 from register 2
SUBI  subtracts immediate value from register
AND   bitwise logically ands register 1 with register 2
OR    bitwise logically ors register 1 with register 2
XOR   bitwise logically xors register 1 with register 2
SHL   shifts the register specified bits left
SHR   shifts the register specified bits right
NOT   bitwise logically nots register

Others
CCF  clears the conditional flag
NOP  null operation, just waits a cycle

Instruction Set Encoding
a .. ALU operation number
c .. conditional flag (CF)
i .. immediate value
s .. source register number
d .. destination register number
m .. memory pointer register number
j .. jump pointer register number
_ .. don't care

ALUOPS c1aaaaaa ssssdddd

LDIL  c000iiii iiiidddd
LDIH  c001iiii iiiidddd

LD    c0100000 mmmmdddd
ST    c0100001 mmmmdddd

JMP   c0100010 ____jjjj

CCF   c0100110 ________
NOP   c0100111 ________

optional and reserved for future use:

LOOP  c0101iii iiiiiiii
LDX   c011iiii mmmmdddd

ALU ops
operations that do not modify CF: 0xxxxx

000000 ADD
000001 SUB
000010 AND
000011 OR
000100 XOR
000101 NOT
000110 SHL
000111 SHR
001000 ASR

operations that do modify CF: 1xxxxx

100000 CMPEQ
100001 CMPNE
100010 CMPGT
100011 CMPLT
100100 CMPEZ (d = don't care)
100101 CMPNZ (d = don't care)
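The encoding tables above are enough to decode an instruction word by hand. The following sketch does the same in a few lines; it is not the decoder from the assembler/simulator package, the operand order in the printed text is only modelled on the example programs, and the optional LOOP and LDX encodings are reported as reserved.

```python
REGISTERS = [chr(ord("a") + i) + "x" for i in range(16)]  # ax, bx, .. px

ALU_OPS = {
    0b000000: "add", 0b000001: "sub", 0b000010: "and", 0b000011: "or",
    0b000100: "xor", 0b000101: "not", 0b000110: "shl", 0b000111: "shr",
    0b001000: "asr",
    0b100000: "cmpeq", 0b100001: "cmpne", 0b100010: "cmpgt",
    0b100011: "cmplt", 0b100100: "cmpez", 0b100101: "cmpnz",
}

# Low nibble of the first byte for the c010xxxx group.
MISC_OPS = {0x0: "ld", 0x1: "st", 0x2: "jmp", 0x6: "ccf", 0x7: "nop"}


def decode(word: int) -> str:
    """Decode one 16-bit instruction word into assembler-like text."""
    hi, lo = (word >> 8) & 0xFF, word & 0xFF
    prefix = "?" if hi & 0x80 else ""  # c bit: conditional instruction
    upper, lower = REGISTERS[lo >> 4], REGISTERS[lo & 0x0F]
    if hi & 0x40:  # c1aaaaaa ssssdddd: ALU operation
        name = ALU_OPS.get(hi & 0x3F)
        if name is None:
            return prefix + "reserved"
        if name in ("cmpez", "cmpnz"):  # d is don't care
            return f"{prefix}{name} {upper}"
        return f"{prefix}{name} {upper},{lower}"
    group = (hi >> 4) & 0x3  # the two bits after "c0"
    if group in (0b00, 0b01):  # c000iiii / c001iiii iiiidddd
        imm = ((hi & 0x0F) << 4) | (lo >> 4)
        name = "ldil" if group == 0b00 else "ldih"
        return f"{prefix}{name} {lower},{imm}"
    name = MISC_OPS.get(hi & 0x0F) if group == 0b10 else None
    if name in ("ld", "st"):  # mmmmdddd
        return f"{prefix}{name} {lower},({upper})"
    if name == "jmp":  # ____jjjj
        return f"{prefix}jmp ({lower})"
    if name in ("ccf", "nop"):
        return prefix + name
    return prefix + "reserved"  # LOOP, LDX and unused encodings


for word in (0x0012, 0x2700, 0x4120, 0x6500, 0x80A5, 0xA205):
    print(f"{word:04x}  {decode(word)}")
```

The sample words are taken from the ROM dump of the blink program shown in the simulator section; 0x0012 is the word the simulator itself reports as ldil cx,1.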

The Assembler and Simulator
The assembler/simulator package can be downloaded from http://www.nix.at/sw/tcmp/

Note that only versions >= 0.1 support TCMP2.0, while versions 0.0.* support only TCMP1. The version that represents the state at the end of the computer architecture course is 0.2.4.

The package consists of the assembler and 3 different simulator programs, which are all described below. Read the INSTALL file in the package for installation hints/requirements.

The package is written in Objective Caml, which is a really great computer language. The build generates bytecode executables and, if supported on your architecture, native-code executables. So if you find an executable called e.g. asm.opt, you may call it instead of just asm; it will do the same, but a lot faster.

The assembler: tcmp/asm
The assembler can output text (to verify the parser and some calculations), binary output (mainly used for simulation) and VHDL code (can be used as or in a ROM implementation).

When you call it with no arguments or with -h, it prints instructions on how to use it:

asm: usage: ./asm (-b|-r|-t)
-b binary output (better not to stdout..)
-r vhdl rom code output
-t asm text output (to verify parser)

Line Layout
The assembler layout is similar to nasm; each line consists of up to three parts:

label:   instruction operands        ; comment

All three components are optional. Operands are separated by a ','.

The main difference is that an optional '?' can be written directly before the instruction. It marks the instruction as conditional.

Mnemonics
The mnemonics are lower case variants of the instructions.

There are some macro-like mnemonics that expand to several instructions when used:

jump reg,label  ; this will jmp to label using reg for address generation
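The ROM dump in the simulator section shows a conditional jump as an ldil/ldih pair on fx followed by a jmp (with nops in between from the hazard prevention). A sketch of such an expansion, assuming exactly that three-instruction sequence and an already resolved word address for the label; expand_jump is a hypothetical helper, not a function of the assembler:

```python
from typing import List


def expand_jump(reg: str, address: int, conditional: bool = False) -> List[str]:
    """Expand 'jump reg,label' for a label that resolved to a word address."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address does not fit into 16 bits")
    prefix = "?" if conditional else ""  # a '?' applies to every expanded line
    return [
        f"{prefix}ldil {reg},{address & 0xFF}",         # low byte of the address
        f"{prefix}ldih {reg},{(address >> 8) & 0xFF}",  # high byte of the address
        f"{prefix}jmp ({reg})",                         # ip := reg
    ]


for line in expand_jump("fx", 0x000A, conditional=True):
    print(line)
```

This is also why the macro needs a register operand: the target address has to pass through a register before JMP can use it.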

Example Program 1: Blinking around
This little example program blinks the board's LED.

;; blink led
;
begin:
        ;; counter delta
        ldil cx,1
        ldih cx,0
        ;; high word
        ldil bx,70    ; change this to adjust frequency
        ldih bx,0
bigloop:
        ;; low word
        ldil ax,0x20
        ldih ax,0xa1
smallloop:
        sub cx,ax
        cmpnz ax
        ?jump (fx),smallloop
        ;
        sub cx,bx
        cmpnz bx
        ?jump (fx),bigloop
doblink:
        not ox,ox
        jump (fx),begin

Example Program 2: Blinking around with changing frequency
This is a modification to the previous example, with variable frequency.

;; blink led
;
        ;; counter delta
        ldil cx,1
        ldih cx,0
begin:
        ;; meta freq
        ldil ex,70
        ldih ex,0
metaloop:
        ;; high word
        ldil bx,0    ; change this to adjust frequency
        ldih bx,0
bigloop:
        ;; low word
        ldil ax,0x20
        ldih ax,0xa1
smallloop:
        sub cx,ax
        cmpnz ax
        ?jump (fx),smallloop
        ;
        add cx,bx
        cmpne bx,ex
        ?jump (fx),bigloop
doblink:
        not ox,ox
        sub cx,ex
        cmpnz ex
        ?jump (fx),metaloop
        jump (fx),begin

Example Program 3: Instruction tester
Not really useful, but it covers all instructions.

; comment
        ldil ax,10              ;  low byte of ax := 10
        nop
        ldih ax,0               ;  high byte of ax := 0
        st ax,(bx)              ;  Memory[ax] := bx
        ld ax,(bx)              ;  ax := Memory[bx]
        ccof
        jump (fx),l1            ;  fx := AddressOf(l1); ip := fx
        jmp (ax)                ;  ip := ax
        ccof                    ;  c := 0
        nop                     ;  no operation
        add ax,bx               ;  bx := ax + bx
        sub ax,bx               ;  bx := ax - bx
        and ax,bx               ;  bx := ax and bx
        not ax,bx               ;  bx := not ax
        or ax,bx                ;  bx := ax or bx
        xor ax,bx               ;  bx := ax xor bx
        shl ax                  ;  ax := ax shl 1
        shr bx                  ;  bx := bx shr 1
        asr cx                  ;  cx := cx asr 1
        cmpeq ax,bx             ;  c := ax == bx
        cmpne ax,bx             ;  c := ax != bx
        cmpgt ax,bx             ;  c := ax > bx
        cmplt ax,bx             ;  c := bx < ax
        cmpez ax                ;  c := ax == 0
        cmpnz ax                ;  c := ax != 0
l1:     nop
        ?not bx
        cmpez ax
l2:     ccof
l3:
l3b:    not ax
        sub cx,ax
        jump (fx),l2
l4:     jump (fx),l4

The simulator: tcmp/sim
This simulator takes an assembled program and simulates it instruction by instruction. You can optionally specify how many cycles it will simulate. It has its own little help screen, too:

./sim: usage: ./sim []

If you simulate the first instruction of blink.asm you will get this output:

./sim blink.bin 1 | awk '{ printf " ";print}'
registers:    ax=00 cx=00 ex=00 gx=00 ix=00 kx=00 mx=00 ox=00 ip=0
              bx=00 dx=00 fx=00 hx=00 jx=00 lx=00 nx=00 px=00 cof=false
RAM:
0000: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
001a:  0000 0000 0000
ROM:
0000: 0012 2700 1002 0011 2700 1001 0010 2700 1000 2700 4120 2700 6500
000d:  2700 80a5 a700 9005 a700 a205 2700 4121 2700 6510 2700 8065 a700
001a: 9005 a700 a205 2700 45ee 0005 2700 1005 2700 2205
executing instruction:
-> ldil        cx,1     (0000000000010010=0x0012)
registers:    ax=00 cx=01 ex=00 gx=00 ix=00 kx=00 mx=00 ox=00 ip=1
              bx=00 dx=00 fx=00 hx=00 jx=00 lx=00 nx=00 px=00 cof=false
RAM:
0000: 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
001a:  0000 0000 0000
ROM:
0000: 0012 2700 1002 0011 2700 1001 0010 2700 1000 2700 4120 2700 6500
000d:  2700 80a5 a700 9005 a700 a205 2700 4121 2700 6510 2700 8065 a700
001a: 9005 a700 a205 2700 45ee 0005 2700 1005 2700 2205

Please note that you won't see any hazards here, as this simulator knows nothing about pipelining at all. This can be useful to verify the correctness of the pipeline engine (or the built-in hazard prevention feature of the assembler).

The pipelined simulator: tcmp/psim
This is another simulator, which is completely different from the previous one. While it should give the same results, it simulates the complete TCMP pipeline, so it will corrupt registers or RAM if there are hazards. In short: it will (try to) act like the processor in hardware.

While its internals are completely different, the usage is the same as for the non-pipelined simulator:

./psim: usage: ./psim []

If we simulate 5 clock cycles of blink.asm we get:

RST
IF: ip=0001
ID: cof=false rom[0000]=0012 that is ldil    cx,1
EX: s23op=NOP s23rda=0 rd1=0000 rd2=0000 s23aluops=0 s23ldiv=00 s23ldiop=0
reg.file: ax=0000 cx=0000 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
          bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
WB: s34op=NOP s34rda=0 ram[0000]=0000 s34aluout=0000 s34ldiout=0000
CLK
IF: ip=0002
ID: cof=false rom[0001]=2700 that is nop
EX: s23op=LDI s23rda=2 rd1=0000 rd2=0000 s23aluops=0 s23ldiv=01 s23ldiop=0
reg.file: ax=0000 cx=0000 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
          bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
WB: s34op=NOP s34rda=0 ram[0000]=0000 s34aluout=0000 s34ldiout=0000
CLK
IF: ip=0003
ID: cof=false rom[0002]=1002 that is ldih    cx,0
EX: s23op=NOP s23rda=0 rd1=0000 rd2=0000 s23aluops=27 s23ldiv=70 s23ldiop=0
reg.file: ax=0000 cx=0000 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
          bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
WB: s34op=LDI s34rda=2 ram[0000]=0000 s34aluout=0000 s34ldiout=0001
CLK
IF: ip=0004
ID: cof=false rom[0003]=0011 that is ldil    bx,1
EX: s23op=LDI s23rda=2 rd1=0000 rd2=0001 s23aluops=10 s23ldiv=00 s23ldiop=1
reg.file: ax=0000 cx=0001 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
          bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
WB: s34op=NOP s34rda=0 ram[0000]=0000 s34aluout=3039 s34ldiout=0070
CLK
IF: ip=0005
ID: cof=false rom[0004]=2700 that is nop
EX: s23op=LDI s23rda=1 rd1=0000 rd2=0000 s23aluops=0 s23ldiv=01 s23ldiop=0
reg.file: ax=0000 cx=0001 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
          bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
WB: s34op=LDI s34rda=2 ram[0000]=0000 s34aluout=3039 s34ldiout=0001
CLK
IF: ip=0006
ID: cof=false rom[0005]=1001 that is ldih    bx,0
EX: s23op=NOP s23rda=0 rd1=0000 rd2=0000 s23aluops=27 s23ldiv=70 s23ldiop=0
reg.file: ax=0000 cx=0001 ex=0000 gx=0000 ix=0000 kx=0000 mx=0000 ox=0000
          bx=0000 dx=0000 fx=0000 hx=0000 jx=0000 lx=0000 nx=0000 px=0000
WB: s34op=LDI s34rda=1 ram[0000]=0000 s34aluout=0000 s34ldiout=0001

RST and CLK are the reset and the clock signal.

The graphical pipelined simulator: tcmp/gpsim
This is a graphical frontend to the pipelined simulator (psim). It accepts no options; instead it shows a file dialog to choose the binary code to load.

The interface is quite simple and self-explanatory. Just take a look at this screenshot:



Pipeline Architecture
The TCMP2.0 pipeline consists of 4 stages:


 * Instruction fetch
 * Instruction decode
 * Execute
 * Write back
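To see how instructions move through these four stages, the sketch below prints which instruction occupies which stage in each clock cycle. It is a generic, hazard-free model of a 4-stage pipeline that advances one stage per cycle, not an excerpt from psim; the function name and the "-" placeholder for an empty stage are assumptions.

```python
from typing import Dict, List

STAGES = ["IF", "ID", "EX", "WB"]


def occupancy(program: List[str], cycles: int) -> List[Dict[str, str]]:
    """Return, per clock cycle, the instruction held by each pipeline stage."""
    rows = []
    for cycle in range(cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction fetched 'depth' cycles earlier
            row[stage] = program[index] if 0 <= index < len(program) else "-"
        rows.append(row)
    return rows


program = ["ldil cx,1", "nop", "ldih cx,0"]
for cycle, row in enumerate(occupancy(program, 6)):
    print(cycle, " | ".join(f"{stage}: {row[stage]:<10}" for stage in STAGES))
```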

If you have not done so yet, please take a look at the picture in the gpsim section to get an overview. The line colors show which stage the signals come from (Cyan, Blue, Red, Green).

At this time there is no hazard prevention built into the processor, only smart nop generation at the assembler level. Thanks to our not too unclever design, there is never a need to delay the processor by more than one nop at a time. Of course one could implement e.g. bypassing, but luckily for us that was not in the mandatory scope of the computer architecture course.
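The assembler-level approach can be pictured as a pass over adjacent instruction pairs. The sketch below is a simplified model, not the algorithm from the assembler: it assumes that a nop is needed exactly when an instruction reads a register (or the flag, written "cf") that the instruction directly before it writes, and that LDIH reads its destination register. The Instr type and the helper names are made up for the illustration.

```python
from typing import FrozenSet, List, NamedTuple


class Instr(NamedTuple):
    text: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def instr(text: str, reads: str = "", writes: str = "") -> Instr:
    """Build an Instr from space-separated register names."""
    return Instr(text, frozenset(reads.split()), frozenset(writes.split()))


NOP = instr("nop")


def insert_nops(program: List[Instr]) -> List[Instr]:
    """Insert one nop wherever an instruction depends on its predecessor."""
    out: List[Instr] = []
    for current in program:
        # A nop writes nothing, so two nops in a row are never generated.
        if out and out[-1].writes & current.reads:
            out.append(NOP)
        out.append(current)
    return out


program = [
    instr("ldil ax,0x20", writes="ax"),
    instr("ldih ax,0xa1", reads="ax", writes="ax"),
    instr("sub cx,ax", reads="cx ax", writes="ax"),
    instr("cmpnz ax", reads="ax", writes="cf"),
]
for line in insert_nops(program):
    print(line.text)
```

For this fragment of the blink program the model places a nop between every pair, which matches the word sequence 0010 2700 1000 2700 4120 2700 6500 in the ROM dump shown in the simulator section.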

But maybe by the time you read this, it is already implemented in the simulator or even at the VHDL level. So be sure to check the download packages!

VHDL Implementation
The VHDL implementation was created by first building the black boxes such as the ALU and the register file. Then we basically translated the code from the pipelined simulator (it can be found in pipeline.ml) into VHDL. Once that seemed to be complete, we simulated the whole thing in ModelSim.

After a few fixes, mainly dangling signals, we downloaded the processor to the FPGA. To our surprise it worked nearly instantly. Maybe it was a good idea to simulate a lot with tcmp/*sim and ModelSim. On the other hand, maybe we just had lots of luck :)

The full vhdl files can be downloaded here: [http://www.nix.at/tcmp2/vhdl.zip vhdl.zip].

Additionally, a couple of testbenches were created to simulate some building blocks of the processor in ModelSim. The following screenshot shows a simulation of the processor running the blink2.asm program.