Computer Architecture Lab/WS2007/Project -1 Lab4

Lucky Processor
Thank you for choosing the "Lucky, das Ding aus dem See" Processor!

This processor, designed specifically for the Cyclone II - EP1C12Q240C8N FPGA, will revolutionize the gambling world.

Features

 * 32-bit RISC architecture
 * 32-bit fixed instruction length
 * 24 instructions
 * 32 32-bit registers
 * Groundbreaking special function (Hardware implemented Random)
 * 4-stage in-order pipeline
 * Simple UART communication
 * Program and data-memory are separated (Harvard)

Instruction Set Architecture
We chose the Register-Register (or load/store) Instruction Set Architecture that is common among RISC designs, with a fixed length of 32 bits per instruction.

Registers
The processor has 32 32-bit registers, which are mapped into the internal FPGA memory for performance and size reasons.

Flags
Our architecture can set 4 flags:
 * Negative
 * Overflow
 * Zero
 * Random

The first three flags are set by the ALU immediately after an instruction has been executed; the Random flag can change its state every cycle.
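The exact flag definitions are not spelled out here, so the following is only a minimal sketch that assumes the usual two's-complement conventions for a 32-bit addition. The function name add_with_flags is hypothetical and does not come from the processor sources.

```python
MASK32 = 0xFFFFFFFF

def add_with_flags(a, b):
    """Add two 32-bit values and derive Negative, Overflow and Zero.

    Assumption: conventional two's-complement flag semantics.
    """
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    flags = {
        "negative": sign_r == 1,
        # Signed overflow: both operands share a sign the result does not have.
        "overflow": sign_a == sign_b and sign_r != sign_a,
        "zero": result == 0,
    }
    return result, flags
```

For example, adding 1 to 0x7FFFFFFF would set Negative and Overflow under these assumptions, while adding 1 to 0xFFFFFFFF would wrap to 0 and set only Zero.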

Instructions
There are:
 * 13 Arithmetical/Logical Instructions
 * 3 Load/Store Instructions
 * 8 Branch and Jump Instructions

The Instruction Set is described in detail here.

Addressing Modes
The Lucky Processor provides two types of addressing:
 * Register
 * Immediate

Special Instructions
Because the integrated Randomizer provides a new random value and Random flag every cycle, we have the special instruction Branch on Random (BOR), which lets a program's behaviour be influenced in a (controlled) random way. This is a useful tool for casino games, for example. If you need a random 32-bit value, you just have to read register $1E, which contains a new random value every cycle.

Randomizer
This is the heart of the special function of our Lucky Processor. It has an integrated random number generator that uses the TLV3502 rail-to-rail comparator on the dspio board, a modified Sigma-Delta A/D converter (from OpenCores) and an LFSR.

A 32-bit random value is stored in register $1E and updated every cycle. The Random flag is also updated every cycle and, together with the special instruction BOR (Branch On Random-bit set), allows new, unpredictable program behaviour every time you run the same program.
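To illustrate the LFSR part of the Randomizer, here is a rough software sketch of a 32-bit Galois LFSR that mixes in one external bit per cycle. Everything specific in it is an assumption: the feedback mask (taps 32, 22, 2, 1), the way the converter bit is mixed in, and the use of the lowest state bit as the Random flag are chosen for illustration and are not taken from the actual design.

```python
MASK32 = 0xFFFFFFFF
TAPS = 0x80200003  # assumed feedback mask (taps 32, 22, 2, 1), not from the source

def lfsr_step(state, entropy_bit=0):
    """Advance a 32-bit Galois LFSR by one cycle.

    entropy_bit stands in for the bit delivered by the comparator and
    A/D converter; how the real hardware mixes it in is an assumption.
    """
    lsb = (state & 1) ^ (entropy_bit & 1)
    state >>= 1
    if lsb:
        state ^= TAPS
    return state & MASK32

def run(seed, entropy_bits):
    """Yield (random_value, random_flag) once per simulated cycle."""
    state = seed & MASK32
    for bit in entropy_bits:
        state = lfsr_step(state, bit)
        yield state, state & 1  # assumed: the flag is the lowest state bit
```

In this model, the first element of each pair plays the role of register $1E and the second the role of the Random flag tested by BOR.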

To read the Random-Register and view the output: ADD("1","0","1E"); DEBUG_SEND("1",&pc);

The Branch on Random function: BOR(LABEL);

A test run including BOR showed that the distribution of 1s and 0s in the Random flag is well balanced: 26116 vs. 26078
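As a quick sanity check of these two counts, the short calculation below compares them with what a fair coin would give. It assumes independent samples and that the first figure counts the 1s; it only looks at the balance of the flag and says nothing about other statistical properties.

```python
import math

ones, zeros = 26116, 26078   # counts reported for the test run
total = ones + zeros
fraction_ones = ones / total

# For a fair coin the number of ones has mean n/2 and standard deviation sqrt(n)/2.
z_score = (ones - total / 2) / (math.sqrt(total) / 2)

print(f"{fraction_ones:.4f}, z = {z_score:.2f}")  # prints "0.5004, z = 0.17"
```

A deviation of well under one standard deviation is consistent with an unbiased flag.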

sourcefile for the test

Memory Access
The internal FPGA memory is used for I/O access and the RAM.

Mapping for I/O access:

The address space for our RAM is from 0x00F to 0xFFF

UART Communication
The Lucky Processor offers a simple way to communicate through its internal UART. The easiest way to send something is to use the function: UART_SEND_STRING("Hello World! ",&pc); This line of code writes "Hello World! " to your HyperTerminal interface.

More details on the UART:

Send
If UART txd ready to send is set (1), a new byte, placed in UART set txd byte, can be sent; if it is cleared (0), you cannot send. Writing a 1 to UART txd send now triggers sending. If you write a 1 to UART txd send now while it is still sending (still set to 1), the write has no effect, because it has to be cleared before you can send again.

Receive
If UART rxd has new value is set, it indicates that UART get rxd byte has a new value to read. If UART rxd has new value is cleared, UART get rxd byte has been read at least once.

If you read from a memory location that is dedicated to writing, e.g. UART set txd byte, it will return 0, not the value it really contains. If you write to a location that is dedicated to reading, the write will have no effect.
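The send and receive rules above can be pictured with a small software model. This is only a sketch of the described behaviour, not the hardware: the class name, the method names and the two hooks finish_transfer and receive are hypothetical, and it assumes that the ready flag is cleared while a byte is being sent.

```python
class UartModel:
    """Toy model of the UART locations described above (not the VHDL)."""

    def __init__(self):
        self.txd_ready_to_send = 1   # 1: a new byte may be sent
        self.txd_send_now = 0        # 1 while a transfer is in progress
        self._txd_byte = 0           # write-only "UART set txd byte"
        self.rxd_has_new_value = 0
        self._rxd_byte = 0           # read-only "UART get rxd byte"
        self.sent = []               # bytes that left the model, for inspection

    def write_txd_byte(self, value):
        self._txd_byte = value & 0xFF

    def read_txd_byte(self):
        return 0                     # reading a write-only location returns 0

    def write_send_now(self):
        if self.txd_send_now:        # still sending: the write has no effect
            return False
        self.txd_send_now = 1
        self.txd_ready_to_send = 0   # assumed: not ready while busy
        return True

    def finish_transfer(self):
        """Hypothetical hook standing in for the hardware finishing a byte."""
        self.sent.append(self._txd_byte)
        self.txd_send_now = 0
        self.txd_ready_to_send = 1

    def receive(self, value):
        """Hypothetical hook: a byte arrives on the receive line."""
        self._rxd_byte = value & 0xFF
        self.rxd_has_new_value = 1

    def read_rxd_byte(self):
        self.rxd_has_new_value = 0   # cleared once the byte has been read
        return self._rxd_byte
```

A program would poll txd_ready_to_send, write the byte, then trigger send now; on the receive side it would poll rxd_has_new_value before reading the byte.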

Pipeline
The 4 stages of the Lucky Processor are the following:
 * Instruction Fetch: The instructions are fetched from the ROM into the instruction registers and the Program Counter is incremented.
 * Instruction Decode: This stage decodes the instructions into ops, operands and immediate values.
 * Register Fetch: The special- and general-purpose register values are fetched.
 * Execute/Writeback: The instructions are executed and the results are written back to the registers.
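To show how instructions overlap in the four stages, the following sketch builds a cycle-by-cycle occupancy table for an ideal pipeline. It assumes one instruction enters per cycle with no stalls or flushes; the placeholder names i0, i1, i2 stand for arbitrary instructions.

```python
STAGES = ["Instruction Fetch", "Instruction Decode",
          "Register Fetch", "Execute/Writeback"]

def pipeline_trace(program):
    """Return, per cycle, which instruction occupies each stage (None = empty)."""
    trace = []
    n_cycles = len(program) + len(STAGES) - 1
    for cycle in range(n_cycles):
        row = []
        for stage_index in range(len(STAGES)):
            # The instruction fetched in cycle c reaches stage s in cycle c + s.
            i = cycle - stage_index
            row.append(program[i] if 0 <= i < len(program) else None)
        trace.append(row)
    return trace

for cycle, row in enumerate(pipeline_trace(["i0", "i1", "i2"])):
    print(cycle, row)
```

Under these assumptions three instructions need six cycles: three to fill the pipeline and one more per instruction.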

Hazard Strategies

 * Data hazards: In our architecture RAW (Read after Write) data hazards are avoided by forwarding, and because we have an in-order pipeline, WAW and WAR hazards cannot occur.
 * Control hazards: Register control hazards cannot occur because we use a 3-way memory for our registers.
 * Conditional hazards: We chose the "predict branch not taken" approach to deal with these types of hazards. In case of a misprediction the instructions of the next 4 cycles are dumped.
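The cost of the "predict branch not taken" strategy can be estimated with a small calculation. This is a rough model only: it takes the 4 dumped cycles per misprediction from the text, assumes every taken branch is a misprediction, and assumes forwarding removes all other stalls. The function name estimated_cycles is hypothetical.

```python
PIPELINE_DEPTH = 4   # stages, from the feature list
FLUSH_PENALTY = 4    # cycles dumped per misprediction, as stated above

def estimated_cycles(n_instructions, n_taken_branches):
    """Rough cycle count under predict-not-taken with a fixed flush penalty."""
    fill = PIPELINE_DEPTH - 1            # cycles until the first result appears
    return fill + n_instructions + FLUSH_PENALTY * n_taken_branches

# 100 instructions without taken branches vs. with 50 taken branches,
# e.g. BOR instructions that branch about half of the time.
print(estimated_cycles(100, 0), estimated_cycles(100, 50))
```

With these assumptions, a program made of branches that are taken about half of the time runs roughly three times slower than straight-line code.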

Source
The source of our assembler can be found here

Syntax
The assembler syntax including instruction samples can be found here

The following instructions are implemented using the processor's instructions; they are not part of the processor's ISA.

Pseudo Stack
The register 0x1D is used as a stack pointer. It is initialized to the last memory address (0x1FF).

Pseudo Call
A Call is translated into 3 instructions. It branches into a special call handler, which takes 23 instructions to push the local variables (reg 0x01 to reg 0x08) and the return address onto the pseudo stack (the stack pointer is decremented). The new stack content after a call is: [lower address] [return address] [r0x08] [r0x07] ... [r0x01] [oldStackContent] [0x1FF]. The stack pointer points to the [return address].

Pseudo Return
A Return is translated into a single instruction. It branches into a special return handler, which takes 20 instructions to pop the local variables (reg 0x01 to reg 0x08) and the return address from the pseudo stack (the stack pointer is incremented). The new stack content after a return is: [lower address] [oldStackContent] [0x1FF]. After the return the stack pointer points to the [return address] of the oldStackContent.
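The stack layout produced by the call and return handlers can be modelled in a few lines. This is a sketch of the described behaviour, not of the handler code: the class and method names are hypothetical, and it assumes the stack pointer is decremented before each store, so that it ends up pointing at the return address.

```python
STACK_TOP = 0x1FF          # initial stack pointer value from the text
SAVED_REGS = range(1, 9)   # reg 0x01 .. reg 0x08

class PseudoStack:
    def __init__(self):
        self.mem = {}              # sparse model of the data memory
        self.sp = STACK_TOP        # models register 0x1D
        self.regs = [0] * 32

    def _push(self, value):
        self.sp -= 1               # assumed: decrement, then store
        self.mem[self.sp] = value

    def _pop(self):
        value = self.mem.pop(self.sp)
        self.sp += 1
        return value

    def call(self, return_address):
        for r in SAVED_REGS:       # r1 first, so it ends at the highest address
            self._push(self.regs[r])
        self._push(return_address) # sp now points at the return address

    def ret(self):
        return_address = self._pop()
        for r in reversed(SAVED_REGS):
            self.regs[r] = self._pop()
        return return_address
```

After call() the memory from the stack pointer upwards holds the return address followed by r0x08 down to r0x01, matching the layout given above; ret() undoes this and restores the stack pointer to its previous value.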

Example Programs
Example program files can be found here

Next Steps
Because the instruction set is so simple, the next version of the processor would have no decode stage.