Computer Architecture Lab/WS2006/Group6/ISA

Memory
Program, data and stack memories occupy the same memory space. The total addressable memory size is 64 KB.

Program memory
The program can be located anywhere in memory. Jump, branch and call instructions use 16-bit addresses, so they can reach any location within the 64 KB. All jump/branch instructions use absolute addressing.

Data memory
The processor always uses 16-bit addresses so that data can be placed anywhere.
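Because program, data and stack share one 16-bit address space, a simulator can model the whole memory as a single 64 KB byte array. The sketch below rests on stated assumptions: addresses wrap around at 0xFFFF, 16-bit values are stored low byte first, and the JMP opcode 0xC3 is borrowed from the Intel 8085 purely for illustration. `Memory` and `jump_target` are hypothetical names.

```python
MEMORY_SIZE = 1 << 16  # 16-bit addresses give 2**16 = 65536 bytes = 64 KB


class Memory:
    """One flat 64 KB space shared by program, data and stack."""

    def __init__(self) -> None:
        self.cells = bytearray(MEMORY_SIZE)

    def read(self, addr: int) -> int:
        # Assumption: addresses are truncated to 16 bits, so they wrap past 0xFFFF.
        return self.cells[addr & 0xFFFF]

    def write(self, addr: int, value: int) -> None:
        self.cells[addr & 0xFFFF] = value & 0xFF

    def read_word(self, addr: int) -> int:
        # Assumption: 16-bit values are stored low byte first (8085-style).
        return self.read(addr) | (self.read(addr + 1) << 8)


JMP_OPCODE = 0xC3  # 8085 encoding of JMP addr16, used here as an assumed example


def jump_target(mem: Memory, pc: int) -> int:
    """Return the absolute 16-bit target of the JMP instruction stored at pc."""
    if mem.read(pc) != JMP_OPCODE:
        raise ValueError("no JMP at this address")
    return mem.read_word(pc + 1)  # absolute addressing: the operand is the target itself


mem = Memory()
for offset, byte in enumerate([JMP_OPCODE, 0x00, 0x20]):  # JMP 2000h placed at 0100h
    mem.write(0x0100 + offset, byte)
print(hex(jump_target(mem, 0x0100)))
```

Since every jump carries a full 16-bit address, the target never depends on where the jump itself is located.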

Stack memory
is limited only by the size of memory. The stack grows downward, toward lower addresses.

Interrupts
The processor has five interrupts: INTR, RST5.5, RST6.5, RST7.5 and TRAP.
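When several interrupts are pending at once, the processor has to pick one. A minimal sketch, assuming the fixed priorities and restart vector addresses of the Intel 8085 (TRAP highest and non-maskable, INTR lowest and not vectored, because the interrupting device supplies the instruction). The per-interrupt masks are ignored, and `select_interrupt`, `PRIORITY` and `VECTORS` are illustrative names.

```python
# Assumption: 8085-style priorities, highest first.
PRIORITY = ["TRAP", "RST7.5", "RST6.5", "RST5.5", "INTR"]

# Assumption: 8085 restart vectors (RST n.5 jumps to 8 * n.5; TRAP behaves like RST 4.5).
VECTORS = {"TRAP": 0x0024, "RST7.5": 0x003C, "RST6.5": 0x0034, "RST5.5": 0x002C}

NON_MASKABLE = {"TRAP"}  # serviced even when interrupts are disabled


def select_interrupt(pending: set, interrupts_enabled: bool):
    """Return the highest-priority pending interrupt that will be serviced, or None."""
    for name in PRIORITY:
        if name in pending and (interrupts_enabled or name in NON_MASKABLE):
            return name
    return None


chosen = select_interrupt({"INTR", "RST6.5"}, interrupts_enabled=True)
print(chosen, hex(VECTORS[chosen]))
```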

I/O ports
256 Input ports, 256 Output ports

Accumulator
or A register is an 8-bit register used for arithmetic, logic, I/O and load/store operations.

Flag
8-bit register containing five 1-bit flags: Sign, Zero, Auxiliary Carry, Parity and Carry. The Carry flag is set by a carry out of an addition, or by a borrow during subtraction/comparison.
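To show how the flags are derived, here is a sketch for an 8-bit addition. It assumes that the fifth flag is an auxiliary carry (carry out of bit 3) and that the flags sit at the Intel 8085 bit positions; `add8` and the bit constants are hypothetical. Subtraction, where Carry signals a borrow, is left out.

```python
# Assumption: flag bit positions follow the Intel 8085 layout; unused bits stay 0.
S, Z, AC, P, CY = 7, 6, 4, 2, 0


def add8(a: int, b: int) -> tuple[int, int]:
    """8-bit addition; returns (result, flag register)."""
    total = a + b
    result = total & 0xFF
    flags = 0
    if result & 0x80:                      # Sign: copy of bit 7 of the result
        flags |= 1 << S
    if result == 0:                        # Zero: result is 0
        flags |= 1 << Z
    if (a & 0x0F) + (b & 0x0F) > 0x0F:     # Auxiliary carry: carry out of bit 3
        flags |= 1 << AC
    if bin(result).count("1") % 2 == 0:    # Parity: even number of 1 bits
        flags |= 1 << P
    if total > 0xFF:                       # Carry: carry out of bit 7
        flags |= 1 << CY
    return result, flags


result, flags = add8(0xFF, 0x01)
print(hex(result), format(flags, "08b"))
```

Adding 0xFF and 0x01 sets Zero, Auxiliary Carry, Parity and Carry at the same time, which makes it a convenient test case.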

General registers
Six 8-bit registers B, C, D, E, H and L. They can also be used as the 16-bit register pairs BC, DE and HL, e.g. to hold addresses.

Stack pointer
is a 16-bit register. Stack operations increment/decrement it by 2, because they always transfer 16-bit (two-byte) values.
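A sketch of how push and pop move the stack pointer by 2 on a downward-growing stack. It assumes 8085-style byte order (the high byte goes to the higher address) and a stack pointer that wraps within 16 bits, so a stack starting at 0x0000 fills memory downward from 0xFFFF. The `Stack` class is illustrative only.

```python
class Stack:
    """Downward-growing stack of 16-bit values in a 64 KB byte memory."""

    def __init__(self, sp: int = 0x0000) -> None:
        self.mem = bytearray(1 << 16)
        self.sp = sp  # 16-bit stack pointer

    def push(self, value: int) -> None:
        # High byte to SP-1, low byte to SP-2, then SP moves down by 2.
        self.mem[(self.sp - 1) & 0xFFFF] = (value >> 8) & 0xFF
        self.mem[(self.sp - 2) & 0xFFFF] = value & 0xFF
        self.sp = (self.sp - 2) & 0xFFFF

    def pop(self) -> int:
        # Low byte from SP, high byte from SP+1, then SP moves up by 2.
        low = self.mem[self.sp]
        high = self.mem[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF
        return (high << 8) | low


s = Stack()
s.push(0x1234)
s.push(0xABCD)
print(hex(s.sp), hex(s.pop()), hex(s.pop()))
```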

Program counter
is a 16-bit register.

Instruction Set
<ul>
<li>Data moving instructions</li>
<li>Arithmetic: add, subtract, increment and decrement</li>
<li>Logic: AND, OR, XOR and rotate</li>
<li>Control transfer: conditional, unconditional, call subroutine, return from subroutine and restarts</li>
<li>Input/Output instructions</li>
<li>Other: setting/clearing flag bits, enabling/disabling interrupts, stack operations, etc.</li>
</ul>

Addressing modes
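<ul>
<li>Register: the data is in a register or in a register pair.</li>
<li>Register indirect: the instruction specifies a register pair that contains the address where the data is located.</li>
<li>Direct: the instruction contains the 16-bit address of the data.</li>
<li>Immediate: the instruction contains the 8- or 16-bit data itself.</li>
</ul>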
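The addressing modes differ only in where the operand comes from. In this sketch the register-indirect mode uses the HL pair as the address register, as the Intel 8085 does for its memory operand; that choice, the `CPU` class and its method names are assumptions for illustration.

```python
class CPU:
    """Minimal register file and 64 KB memory for illustrating addressing modes."""

    def __init__(self) -> None:
        self.reg = dict.fromkeys("ABCDEHL", 0)  # 8-bit registers A, B, C, D, E, H, L
        self.mem = bytearray(1 << 16)           # 64 KB memory

    def pair(self, hi: str, lo: str) -> int:
        """16-bit value of a register pair such as H, L (high register first)."""
        return (self.reg[hi] << 8) | self.reg[lo]

    def mov_register(self, dst: str, src: str) -> None:
        # Register mode: the operand is already in a register.
        self.reg[dst] = self.reg[src]

    def mov_indirect(self, dst: str) -> None:
        # Register indirect mode: assumed HL pair holds the operand's address.
        self.reg[dst] = self.mem[self.pair("H", "L")]

    def load_direct(self, addr: int) -> None:
        # Direct mode: the instruction carries the 16-bit address.
        self.reg["A"] = self.mem[addr & 0xFFFF]

    def load_immediate(self, dst: str, data: int) -> None:
        # Immediate mode: the instruction carries the data itself.
        self.reg[dst] = data & 0xFF


cpu = CPU()
cpu.load_immediate("H", 0x20)
cpu.load_immediate("L", 0x50)
cpu.mem[0x2050] = 0x7F
cpu.mov_indirect("B")  # B receives the byte at address HL = 2050h
print(hex(cpu.reg["B"]))
```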

Architectural Features of the R8000
The R8000 is a 64-bit RISC microprocessor designed specifically for supercomputing applications, with a strong emphasis on floating-point performance. It implements the MIPS IV instruction set architecture (ISA). Key features of the R8000 include:
<ul>
<li>Multi-component chip set consisting of an integer unit (IU), floating-point unit (FPU), tag RAMs, and 2 MB of data streaming cache</li>
<li>Four-way superscalar architecture, six operations per clock cycle</li>
<li>True 64-bit microprocessor with 64-bit integer and floating-point operations, registers, and virtual addresses</li>
<li>3.3-volt technology</li>
<li>16 KB instruction cache (I-cache) in the IU</li>
<li>16 KB dual-ported data cache (D-cache) in the IU</li>
<li>1K-entry branch prediction cache</li>
<li>Memory Management Unit (MMU) in the IU with a 384-entry, dual-ported, three-way set-associative Translation Lookaside Buffer (TLB)</li>
<li>ANSI/IEEE-754 standard floating-point coprocessor with imprecise interrupts</li>
<li>32 doubleword (64-bit) general-purpose registers in the IU and 32 floating-point registers in the FPU</li>
<li>128-bit data bus and a separate 40-bit address bus that can access up to 1 TB of physical memory</li>
<li>Upward compatibility with earlier 32-bit and 64-bit MIPS microprocessors</li>
</ul>
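As a quick check of the bus figures (binary units, so 1 TB = 2^40 bytes): a 40-bit physical address selects one of 2^40 bytes, and a 128-bit data bus moves 16 bytes per transfer.

```latex
2^{40}\,\text{B} = 2^{10} \cdot 2^{30}\,\text{B} = 1024\,\text{GB} = 1\,\text{TB} = 1\,099\,511\,627\,776\,\text{B},
\qquad
\frac{128\,\text{bit}}{8\,\text{bit/B}} = 16\,\text{B per transfer}
```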

Instruction set
Instructions are fetched from the on-chip 16 KB instruction cache, four instructions (128 bits) per cycle. The new instructions (relative to earlier MIPS ISAs) fall into three categories: fused multiply-add, register+register addressing mode, and conditional moves. The fused multiply-add instructions take three input operands and perform a multiplication followed by an addition with no intermediate rounding. The register+register addressing mode is supported for floating-point loads and stores to speed up array accesses with arbitrary strides. Integer and memory instructions get their operands from a 13-port integer register file. The R8000 includes the following functional units:
<ul>
<li>2 load/store units</li>
<li>2 ALUs</li>
<li>1 shifter</li>
<li>1 integer multiply/divide unit</li>
<li>2 floating-point units</li>
</ul>
The FPU implements the following operations:
<ul>
<li>MADD (multiply-add with 1-cycle latency, e.g. a = a + b * c)</li>
<li>Divide</li>
<li>Square root</li>
<li>Reciprocal (1/x)</li>
<li>Reciprocal square root</li>
</ul>
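The difference between a fused multiply-add and a separate multiply followed by an add is the number of roundings. This sketch does not use any hardware FMA: the fused result is emulated by computing a * b + c exactly with the standard `fractions` module and rounding once. The function names are hypothetical.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: a * b + c computed exactly, then rounded once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # Fraction(float) is exact
    return float(exact)  # the single rounding to the nearest double


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary multiply then add: the product is rounded before the addition."""
    product = a * b      # first rounding
    return product + c   # second rounding


fused = fused_multiply_add(0.1, 10.0, -1.0)
separate = separate_multiply_add(0.1, 10.0, -1.0)
print(fused, separate)  # fused keeps the residual 2**-54; separate gives 0.0
```

With a = 0.1, b = 10 and c = -1, the double nearest to 0.1 times 10 is 1 + 2^-54. The separate product rounds that to exactly 1.0, and the residual vanishes; the fused version keeps it.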

Memory
The characteristics of the R8000 memory subsystem - number of ports, sizes and algorithms of the caches, tag RAM, and buffering schemes - complement the high-performance computational capabilities of the R8000 and ensure that memory bandwidth demands from the floating-point and integer units are met.

Pipeline
The pipeline has five stages:
<ul>
<li>Fetch: accesses the instruction cache and the branch prediction cache.</li>
<li>Decode: makes dispatch decisions based on register scoreboarding and resource reservations, and also reads the register file.</li>
<li>Address: computes the effective addresses of loads and stores.</li>
<li>Execute: evaluates the ALU operation, accesses the data cache and TLB, resolves branches and handles all exceptions.</li>
<li>Writeback: updates the register file.</li>
</ul>
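If one instruction enters the pipeline per cycle and nothing stalls, instruction i occupies stage s in cycle i + s. The sketch prints that ideal timing for the five stages. It deliberately ignores the R8000's four-wide issue, scoreboard stalls and branch effects, and `pipeline_diagram` is an illustrative name.

```python
STAGES = ["F", "D", "A", "E", "W"]  # Fetch, Decode, Address, Execute, Writeback


def pipeline_diagram(num_instructions: int) -> list[str]:
    """One row per instruction showing its stage in each clock cycle (no stalls)."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        cells = []
        for cycle in range(total_cycles):
            stage = cycle - i  # instruction i enters Fetch in cycle i
            cells.append(STAGES[stage] if 0 <= stage < len(STAGES) else ".")
        rows.append(f"i{i}: " + " ".join(cells))
    return rows


for row in pipeline_diagram(3):
    print(row)
```

Three instructions finish in 3 + 5 - 1 = 7 cycles. Once the pipeline is full, one instruction completes every cycle.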

External cache pipeline
The external cache pipeline has five stages. In the first stage, addresses are sent from the R8000 to the tag RAM. In the second, the tags are looked up and hit/miss information is encoded. The third stage is used for chip crossing from the tag RAM to the data RAMs. In the fourth stage, the SSRAM is accessed internally within the chip. Finally, in the fifth stage, the data is sent back to the R8000 and to the R8010 floating-point unit.

(Figure: R8000 diagram, image at stud3.tuwien.ac.at/~e0248591/r8000.PNG)

AMD Athlon64
The Athlon 64 is a processor designed by Advanced Micro Devices (AMD). It implements AMD's 64-bit architecture, originally called x86-64 and since renamed AMD64: a simple yet powerful 64-bit, backward-compatible extension of the industry-standard (legacy) x86 architecture.

Architecture: Pipeline
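Pipeline stages:
<ul>
<li>20 to 31 stages</li>
<li>Fetch to execute<ul><li>takes 20 (31) clock cycles</li></ul></li>
<li>The pipeline is hard to fill: after a mispredicted branch it takes many cycles to refill</li>
<li>Note: these depths and the trace-cache stage names below are those of Intel's Pentium 4 (NetBurst) pipeline; the Athlon 64 itself uses a 12-stage integer pipeline.</li>
</ul>
<ul>
<li>TC Nxt = trace cache next instruction pointer</li>
<li>TC Fetch = trace cache fetch</li>
<li>Rename = register renaming</li>
<li>Que = micro-op queuing</li>
<li>Sch = micro-op scheduling</li>
<li>Disp = dispatch</li>
<li>RF = register file</li>
<li>Ex = execute</li>
<li>Flgs = flags</li>
<li>BrCk = branch check</li>
</ul>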
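A simple cost model gives a rough estimate of why a deep pipeline is hard to fill. Every mispredicted branch discards the partially processed instructions, and the pipeline must refill. The sketch assumes that the penalty is roughly the pipeline depth and uses made-up workload numbers (20% branches, 5% of them mispredicted), so only the trend is meaningful.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """Average cycles per instruction including branch-misprediction refills."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: ideal CPI 1.0, 20% branches, 5% of branches mispredicted.
# Assumption: each misprediction costs roughly one refill of the whole pipeline.
for depth in (10, 20, 31):
    print(f"{depth} stages: CPI = {effective_cpi(1.0, 0.20, 0.05, depth):.2f}")
```

Deeper pipelines allow higher clock rates, but each misprediction costs proportionally more cycles. This is why accurate branch prediction matters so much for them.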

Instruction set
IA-32 (x86) instruction set
<ul>
<li>32-bit operands; 32-bit addresses</li>
<li>Variable instruction length: 1 to 15 bytes</li>
<li>Not a load-store architecture<ul><li>operands may be in memory</li><li>instructions can modify memory directly</li></ul></li>
<li>Two-address instructions<ul><li>the destination is also a source, e.g. ADD EAX, [1234] computes EAX = EAX + [1234]</li></ul></li>
<li>Branch instructions<ul><li>check status bits in the flag register (EFLAGS)</li></ul></li>
<li>Privilege levels<ul><li>Level 0: operating system</li><li>Level 3: user programs</li></ul></li>
</ul>
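ADD EAX, [1234] illustrates the two-address, memory-operand style. The arithmetic instruction reads the memory operand directly, the result overwrites EAX, and a following conditional branch only looks at the flags. This is a simplified sketch: memory is modeled as a dict of 32-bit words (real IA-32 memory is byte-addressed), only ZF, CF and SF are computed, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_mem_to_eax(regs: dict, mem: dict, addr: int) -> None:
    """ADD EAX, [addr]: EAX is both a source and the destination (two-address form)."""
    src = mem.get(addr, 0)                # operand read straight from memory
    total = regs["EAX"] + src
    regs["EAX"] = total & MASK32          # 32-bit result replaces the first source
    regs["ZF"] = int(regs["EAX"] == 0)    # zero flag
    regs["CF"] = int(total > MASK32)      # carry flag: unsigned overflow
    regs["SF"] = regs["EAX"] >> 31        # sign flag: bit 31 of the result


def jz_taken(regs: dict) -> bool:
    """A conditional branch such as JZ only inspects a status bit."""
    return regs["ZF"] == 1


regs = {"EAX": 0xFFFFFFFF}
mem = {1234: 1}
add_mem_to_eax(regs, mem, 1234)
print(regs, jz_taken(regs))
```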

Instructions
x86-Instructions <ul> <li>Data transfer instructions (MOV)</li> <li>binary arithmetic (ADD, SUB)</li> <li>logical instructions (AND, OR)</li> <li>shift and rotate (ROR, SAR)</li> <li>bit and byte instructions (BTS, SETE)</li> <li>control transfer instructions (JMP, LOOP, CALL)</li> <li>string instructions (MOVS, SCAS)</li> <li>flag control instructions (STD, STI)</li> <li>segment register instructions (LDS)</li> <li>miscellaneous instructions (LEA, NOP, CPUID)</li> </ul>

Instruction format
Complex, variable-length format
<ul>
<li>Opcodes<ul><li>originally 1-byte opcodes</li></ul></li>
<li>Optional prefixes<ul><li>select a different operand width or repeat the instruction (e.g. REP)</li></ul></li>
<li>Instruction decoding is difficult</li>
<li>ModR/M, SIB, displacement and immediate fields:<ul><li>operand addressing</li><li>24 address types</li><li>Offset = Base + Index*Scale + Displacement</li></ul></li>
</ul>
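The ModR/M and SIB bytes pack their fields into fixed bit positions, which a decoder can extract with shifts and masks. The sketch decodes the bytes of MOV EAX, [EBX + ECX*4 + 8], encoded as 8B 44 8B 08 in 32-bit mode. It ignores special cases such as mod = 00 with r/m = 101 (displacement only), and the helper names are illustrative.

```python
REG_NAMES = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]  # 3-bit register numbers


def decode_modrm(byte: int) -> dict:
    """Split a ModR/M byte into mod (bits 7-6), reg (bits 5-3) and r/m (bits 2-0)."""
    return {"mod": byte >> 6, "reg": (byte >> 3) & 0b111, "rm": byte & 0b111}


def decode_sib(byte: int) -> dict:
    """Split an SIB byte into scale (1, 2, 4 or 8), index (bits 5-3) and base (bits 2-0)."""
    return {"scale": 1 << (byte >> 6), "index": (byte >> 3) & 0b111, "base": byte & 0b111}


def needs_sib(modrm: dict) -> bool:
    """32-bit addressing: r/m = 100 with a memory operand (mod != 11) means an SIB byte follows."""
    return modrm["mod"] != 0b11 and modrm["rm"] == 0b100


# MOV EAX, [EBX + ECX*4 + 8] -> opcode 8B, ModR/M 44, SIB 8B, disp8 08
modrm = decode_modrm(0x44)
sib = decode_sib(0x8B)
print(REG_NAMES[modrm["reg"]], needs_sib(modrm),
      REG_NAMES[sib["base"]], REG_NAMES[sib["index"]], sib["scale"])
```

Here mod = 01 announces an 8-bit displacement, which is the final byte 08.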

Registers
General-purpose registers EAX, EBX, ECX and EDX
<ul>
<li>A mixture of general-purpose and special-purpose registers</li>
<li>EAX: accumulator</li>
<li>EBX: base register</li>
<li>ECX: count register<ul><li>special use: loops</li></ul></li>
<li>EDX: data register<ul><li>special use: multiplication/division</li></ul></li>
</ul>
Further registers
<ul>
<li>EBP: base pointer<ul><li>points to the base of the current stack frame</li></ul></li>
<li>ESI, EDI: source and destination index registers</li>
<li>ESP: stack pointer</li>
<li>EFLAGS: flags</li>
</ul>
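EDX's special role in multiplication and division comes from the 64-bit intermediate result. Unsigned MUL r/m32 writes its product to the register pair EDX:EAX, and DIV r/m32 divides EDX:EAX, leaving the quotient in EAX and the remainder in EDX. The sketch below uses hypothetical helper names. Where the CPU would raise a divide error (#DE), it raises a Python exception instead.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax: int, src: int) -> tuple[int, int]:
    """MUL r/m32: the 64-bit product of EAX and src, returned as (EDX, EAX)."""
    product = (eax & MASK32) * (src & MASK32)
    return product >> 32, product & MASK32


def div32(edx: int, eax: int, src: int) -> tuple[int, int]:
    """DIV r/m32: divide EDX:EAX by src; returns (EAX quotient, EDX remainder)."""
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, src & MASK32)  # src == 0 raises ZeroDivisionError
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX")
    return quotient, remainder


edx, eax = mul32(0x80000000, 4)
print(hex(edx), hex(eax))
```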

Addressing modes
The addressing mode determines where an instruction's operands are stored.
<ul>
<li>Immediate addressing (MOV EAX, 1234): the data is given directly instead of an address</li>
<li>Direct addressing (MOV EAX, [1234]): the (32-bit) memory address of the operand is given</li>
<li>Register addressing (MOV EAX, EBX): a register contains the operand</li>
<li>Register-indirect addressing (MOV EAX, [EBX]): a register contains the address of the operand, i.e. acts as a pointer</li>
<li>Indexed addressing (MOV EAX, table[EBX]): base address plus an offset register</li>
<li>Indexed register-indirect addressing with displacement, an IA-32 specialty (MOV EAX, table[EBX*4 + 1])</li>
</ul>
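All the memory addressing modes above are special cases of the formula Offset = Base + Index*Scale + Displacement. The sketch computes the offset for each example using made-up register contents (`table` at 0x1000, EBX = 0x20). Immediate and register addressing have no memory operand and therefore no offset. `effective_address` is a hypothetical helper.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      displacement: int = 0) -> int:
    """Offset = Base + Index*Scale + Displacement, truncated to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("IA-32 scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Hypothetical register contents and symbol address.
TABLE, EBX = 0x1000, 0x20

direct = effective_address(displacement=1234)                           # MOV EAX, [1234]
indirect = effective_address(base=EBX)                                  # MOV EAX, [EBX]
indexed = effective_address(base=EBX, displacement=TABLE)               # MOV EAX, table[EBX]
scaled = effective_address(index=EBX, scale=4, displacement=TABLE + 1)  # MOV EAX, table[EBX*4 + 1]
print(direct, hex(indirect), hex(indexed), hex(scaled))
```

The scale factor of 4 matches 32-bit array elements, so table[EBX*4] selects element EBX of a table of doublewords.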