ARM Instruction

ARM (formerly Advanced RISC Machine and Acorn RISC Machine) is a reduced instruction set computer (RISC) instruction set architecture (ISA). The ARM architecture is the most widely used 32-bit instruction set architecture in numbers produced.

ARM is the most popular instruction set architecture for embedded devices, with more than 3 billion devices per year using ARM. ARM came out the same year as MIPS and followed similar philosophies. Figure 2.31 lists the similarities. The principal difference is that MIPS has more registers and ARM has more addressing modes.

There is a similar core of instruction sets for arithmetic-logical and data transfer instructions for MIPS and ARM, as Figure 2.32 shows.

ARM cores

Architecture	Family
ARMv1	ARM1
ARMv2	ARM2, ARM3
ARMv3	ARM6, ARM7
ARMv4	StrongARM, ARM7TDMI, ARM9TDMI
ARMv5	ARM7EJ, ARM9E, ARM10E, XScale
ARMv6	ARM11, ARM Cortex-M
ARMv7	ARM Cortex-A, ARM Cortex-M, ARM Cortex-R
ARMv8	No cores available yet. Will support 64-bit data and addressing

Example applications of ARM cores

ARM cores are used in a number of products, particularly PDAs, smartphones, portable media players, digital cameras, handheld game consoles and automotive navigation systems.

Since 2005, ARM cores have also been used in projects that simulate the human brain, because they are very small, very cheap and use little power.

CPU modes

The ARM architecture specifies the following CPU modes. At any moment in time, the CPU can be in only one mode, but it can switch modes due to external events (interrupts) or programmatically.

User mode - The only non-privileged mode.

System mode - The only privileged mode that is not entered by an exception. It can only be entered by executing an instruction that explicitly writes to the mode bits of the CPSR.

Supervisor (svc) mode - A privileged mode entered whenever the CPU is reset or when a SWI instruction is executed.

Abort mode - A privileged mode that is entered whenever a prefetch abort or data abort exception occurs.

Undefined mode - A privileged mode that is entered whenever an undefined instruction exception occurs.

Interrupt mode - A privileged mode that is entered whenever the processor accepts an IRQ interrupt.

Fast Interrupt mode - A privileged mode that is entered whenever the processor accepts an FIQ interrupt.

Hypervisor mode - A privileged mode introduced in ARMv7-A for the Cortex-A15 processor to provide hardware virtualization support.
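To make the mode list concrete, here is a minimal sketch of how the current mode can be read from the low five bits of the CPSR. The bit patterns are the standard mode encodings of the 32-bit ARM architecture; the names `CPSR_MODES`, `decode_mode` and `is_privileged` are made up for this illustration.

```python
# Mode field encodings (CPSR bits 4..0) of the 32-bit ARM architecture.
# The dictionary and function names are illustrative assumptions.
CPSR_MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11010: "Hypervisor",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_mode(cpsr: int) -> str:
    """Return the name of the CPU mode selected by the low five CPSR bits."""
    return CPSR_MODES.get(cpsr & 0x1F, "reserved")


def is_privileged(cpsr: int) -> bool:
    """Every defined mode except User is privileged."""
    return decode_mode(cpsr) not in ("User", "reserved")
```

For example, a CPSR value whose low byte is 0xD3 decodes as Supervisor mode, the mode entered on reset.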

Addressing Modes

Figure 2.33 shows the data addressing modes supported by ARM. Unlike MIPS, ARM does not reserve a register to contain 0. Although MIPS has just 3 simple data addressing modes, ARM has 9, including fairly complex calculations. For example, ARM has an addressing mode that can shift one register by any amount, add it to another register to form the address, and then update one register with this new address.
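The shifted-register mode with update described above can be sketched as follows. This is only a model of the address arithmetic, roughly what an instruction such as `LDR r0, [r1, r2, LSL #2]!` computes; the register dictionary and the function name are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF  # addresses are 32 bits wide


def scaled_preindexed_address(regs, base, index, shift):
    """Form base + (index << shift), then write the address back to base."""
    address = (regs[base] + (regs[index] << shift)) & MASK32
    regs[base] = address  # write-back: the base register now holds the new address
    return address


# Hypothetical register contents: r1 points at an array, r2 is an element index.
regs = {"r1": 0x1000, "r2": 3}
addr = scaled_preindexed_address(regs, "r1", "r2", 2)  # 0x1000 + 3 * 4
```

MIPS would need a separate shift, an add and the load itself to achieve the same effect.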

Compare and Conditional Branch

MIPS uses the contents of registers to evaluate conditional branches. ARM uses the traditional 4 condition code bits stored in the program status word: negative, zero, carry, and overflow. They can be set on any arithmetic or logical instruction; unlike earlier architectures, this setting is optional on each instruction. An explicit option leads to fewer problems in a pipelined implementation. ARM uses conditional branches to test condition codes to determine all possible unsigned and signed relations.

CMP subtracts one operand from the other and the difference sets the condition codes. Compare negative (CMN) adds one operand to the other, and the sum sets the condition codes. TST performs logical AND on the two operands to set all condition codes but overflow, while TEQ uses exclusive OR to set the first three condition codes.
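A small sketch of how these four instructions derive the condition codes from 32-bit operands. The function names are invented for the example, and the carry flag of TST and TEQ is left out on purpose, since on ARM it comes from the shifter, which is not modelled here.

```python
MASK32 = 0xFFFFFFFF


def _nz(result):
    """Negative and zero flags of a 32-bit result."""
    return {"N": (result >> 31) & 1, "Z": int(result == 0)}


def cmp_flags(a, b):
    """CMP: flags of a - b. On ARM, C = 1 means that no borrow occurred."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    flags = _nz(result)
    flags["C"] = int(a >= b)
    # Signed overflow: operand signs differ and the result sign differs from a.
    flags["V"] = (((a ^ b) & (a ^ result)) >> 31) & 1
    return flags


def cmn_flags(a, b):
    """CMN: flags of a + b."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    flags = _nz(result)
    flags["C"] = int(total > MASK32)  # carry out of bit 31
    # Signed overflow: operand signs agree and the result sign differs from a.
    flags["V"] = (((~(a ^ b)) & (a ^ result)) >> 31) & 1
    return flags


def tst_flags(a, b):
    """TST: N and Z of a AND b (shifter carry not modelled)."""
    return _nz(a & b & MASK32)


def teq_flags(a, b):
    """TEQ: N and Z of a XOR b (shifter carry not modelled)."""
    return _nz((a ^ b) & MASK32)
```

Comparing equal values sets Z, so a following conditional instruction with the EQ condition would execute.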

One unusual feature of ARM is that every instruction has the option of executing conditionally, depending on the condition codes. Every instruction starts with a 4-bit field that determines whether it will act as a no operation instruction (nop) or as a real instruction, depending on the condition codes. Hence, conditional branches are properly considered as conditionally executing the unconditional branch instruction. Conditional execution allows avoiding a branch to jump over a single instruction. It takes less code space and time to simply conditionally execute one instruction.
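The 4-bit condition field can be modelled as a lookup from the top four bits of the instruction word to a test on the N, Z, C and V flags. The table below follows the standard ARM condition encodings; the function name `executes` and the flag dictionary are assumptions of this sketch, and the 0b1111 encoding is deliberately not modelled.

```python
# Condition field (bits 31..28) -> (mnemonic suffix, predicate over the flags).
CONDITIONS = {
    0b0000: ("EQ", lambda f: f["Z"] == 1),
    0b0001: ("NE", lambda f: f["Z"] == 0),
    0b0010: ("CS", lambda f: f["C"] == 1),
    0b0011: ("CC", lambda f: f["C"] == 0),
    0b0100: ("MI", lambda f: f["N"] == 1),
    0b0101: ("PL", lambda f: f["N"] == 0),
    0b0110: ("VS", lambda f: f["V"] == 1),
    0b0111: ("VC", lambda f: f["V"] == 0),
    0b1000: ("HI", lambda f: f["C"] == 1 and f["Z"] == 0),
    0b1001: ("LS", lambda f: f["C"] == 0 or f["Z"] == 1),
    0b1010: ("GE", lambda f: f["N"] == f["V"]),
    0b1011: ("LT", lambda f: f["N"] != f["V"]),
    0b1100: ("GT", lambda f: f["Z"] == 0 and f["N"] == f["V"]),
    0b1101: ("LE", lambda f: f["Z"] == 1 or f["N"] != f["V"]),
    0b1110: ("AL", lambda f: True),
}


def executes(instruction, flags):
    """Return True if the instruction acts as a real instruction, False if as a nop."""
    cond = (instruction >> 28) & 0xF
    if cond not in CONDITIONS:
        raise ValueError("condition 0b1111 is not modelled in this sketch")
    _, predicate = CONDITIONS[cond]
    return predicate(flags)
```

With Z set, a word whose top four bits are 0000 (EQ) executes, while the same word with 0001 (NE) in the condition field behaves as a nop.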

Figure 2.34 shows the instruction formats for ARM and MIPS. The principal differences are the 4-bit conditional execution field in every instruction and the smaller register field, because ARM has half the number of registers.

Main features of the ARM Instruction Set

* All instructions are 32 bits long.

* Most instructions execute in a single cycle.

* Every instruction can be conditionally executed.

* A load/store architecture

• Data processing instructions act only on registers

– Three operand format

– Combined ALU and shifter for high speed bit manipulation

• Specific memory access instructions with powerful auto-indexing addressing modes.

– 32 bit and 8 bit data types and also 16 bit data types on ARM Architecture v4.

– Flexible multiple register load and store instructions

Processor Modes

* The ARM has six operating modes:

• User (unprivileged mode under which most tasks run)

• FIQ (entered when a high priority (fast) interrupt is raised)

• IRQ (entered when a low priority (normal) interrupt is raised)

• Supervisor (entered on reset and when a Software Interrupt instruction is executed)

• Abort (used to handle memory access violations)

• Undef (used to handle undefined instructions)

* ARM Architecture Version 4 adds a seventh mode:

• System (privileged mode using the same registers as user mode)

The Registers

* ARM has 37 registers in total, all of which are 32 bits long.

• 1 dedicated program counter

• 1 dedicated current program status register

• 5 dedicated saved program status registers

• 30 general purpose registers

* However these are arranged into several banks, with the accessible bank being governed by the processor mode. Each mode can access

• a particular set of r0-r12 registers

• a particular r13 (the stack pointer) and r14 (link register)

• r15 (the program counter)

• cpsr (the current program status register)

and privileged modes can also access

• a particular spsr (saved program status register)
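The banking scheme can be checked with a short model that maps a (mode, register number) pair to a physical register and then counts the distinct registers. It assumes the seven modes listed above and the usual classic-ARM banking (FIQ has its own r8 to r14, the other exception modes have their own r13 and r14, System shares the User registers); the names used are illustrative only.

```python
MODES = ["user", "fiq", "irq", "svc", "abt", "und", "sys"]


def physical_register(mode, n):
    """Name of the physical register behind r<n> in the given mode."""
    bank = "user" if mode == "sys" else mode  # System mode uses the User bank
    if n == 15 or n <= 7:
        return f"r{n}"  # r0-r7 and the program counter are never banked
    if n <= 12:
        return f"r{n}_fiq" if bank == "fiq" else f"r{n}"  # only FIQ banks r8-r12
    return f"r{n}" if bank == "user" else f"r{n}_{bank}"  # r13 and r14


def saved_status_register(mode):
    """Exception modes have an SPSR; User and System do not."""
    return None if mode in ("user", "sys") else f"spsr_{mode}"


general_purpose = {physical_register(m, n) for m in MODES for n in range(15)}
saved_status = {saved_status_register(m) for m in MODES} - {None}
total = len(general_purpose) + 1 + 1 + len(saved_status)  # + r15 + cpsr
```

Counting this way reproduces the figures quoted above: 30 general purpose registers, 5 saved program status registers, and 37 registers in total.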

Real Stuff : x86 instructions

x86 is a series of computer microprocessor instruction set architectures based on the Intel 8086 CPU. The 8086 was introduced during 1978 as a fully 16-bit extension of Intel's 8-bit based 8080 microprocessor and also introduced memory segmentation to overcome the 16-bit addressing barrier of such designs. The term x86 derived from the fact that early successors to the 8086 also had names ending with "86". Many additions and extensions have been added to the x86 instruction set over the years, almost consistently with full backward compatibility.

The term is not synonymous with IBM PC compatibility as this implies a multitude of other computer hardware; embedded systems as well as general-purpose computers used x86 chips before the PC-compatible market started, some of them before the IBM PC itself.

As the term became common after the introduction of the 80386, it usually implies binary compatibility with the 32-bit instruction set of the 80386. This may sometimes be emphasized as x86-32 or x32 to distinguish it either from the original 16-bit "x86-16" or from the 64-bit x86-64. Although most x86 processors used in new personal computers and servers have 64-bit capabilities, to avoid compatibility problems with older computers or systems, the term x86-64 (or x64) is often used to denote 64-bit software, with the term x86 implying only 32-bit.

Although the 8086 was primarily developed for embedded systems and small single-user computers, largely as a response to the successful 8080-compatible Zilog Z80, the x86 line soon grew in features and processing power. Today, x86 is ubiquitous in both stationary and portable personal computers and has replaced midrange computers and Reduced instruction set computer (RISC) based processors in a majority of servers and workstations as well. A large amount of software, including operating systems (OSs) such as DOS, Windows, Linux, BSD, Solaris, and Mac OS X functions with x86-based hardware.

Modern x86 is relatively uncommon in embedded systems, however, and small low power applications (using tiny batteries) as well as low-cost microprocessor markets, such as home appliances and toys, lack any significant x86 presence. Simple 8-bit and 16-bit based architectures are common here, although the x86-compatible VIA C7, VIA Nano, AMD's Geode, Athlon Neo, and Intel Atom are examples of 32- and 64-bit designs used in some relatively low power and low cost segments.

There have been several attempts, including by Intel itself, to end the market dominance of the "inelegant" x86 architecture designed directly from the first simple 8-bit microprocessors. Examples of this are the iAPX 432 (alias Intel 8800), the Intel 960, Intel 860, and the Intel/Hewlett-Packard Itanium architecture. However, the continuous refinement of x86 microarchitectures, circuitry, and semiconductor manufacturing has made it hard to replace x86 in many segments. AMD's 64-bit extension of x86 (which Intel eventually responded to with a compatible design) and the scalability of x86 chips such as the eight-core Intel Xeon and 12-core AMD Opteron underline x86 as an example of how continuous refinement of established industry standards can resist the competition from completely new architectures.

Designers of instruction sets sometimes provide more powerful operations than those found in ARM and MIPS. The goal is generally to reduce the number of instructions executed by a program. The danger is that this reduction can occur at the cost of simplicity, increasing the time a program takes to execute because the instructions are slower. This slowness may be the result of a slower clock cycle time or of requiring more clock cycles than a simpler sequence.

The path toward operation complexity is thus fraught with peril. To avoid these problems, designers have moved toward simpler instructions.
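The trade-off can be illustrated with the usual performance relation, execution time = instruction count × cycles per instruction × clock cycle time. The numbers below are invented purely for illustration and do not describe any real processor.

```python
def cpu_time_ns(instruction_count, cycles_per_instruction, cycle_time_ns):
    """Execution time in nanoseconds for a given instruction mix."""
    return instruction_count * cycles_per_instruction * cycle_time_ns


# Hypothetical program compiled two ways (all figures are assumptions):
# fewer but slower complex instructions versus more but faster simple ones.
complex_time = cpu_time_ns(1_000_000, 4.0, 1.0)
simple_time = cpu_time_ns(2_500_000, 1.2, 1.0)
```

In this made-up case the simple version executes 2.5 times as many instructions and still finishes sooner, which is the danger the paragraph above describes.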

Chronology

The table below lists brands of common consumer targeted processors implementing the x86 instruction set, grouped by generations that emphasize important events of x86 history. Note: CPU generations are not strict - each generation is characterized by significantly improved or commercially successful processor microarchitecture designs.

Generation	First introduced	Prominent consumer CPU brands	Linear / physical address space	Notable (new) features
1	1978	Intel 8086, Intel 8088 and clones	16-bit / 20-bit (segmented)	First x86 microprocessors
1	1982	Intel 80186, Intel 80188 and clones, NEC V20/V30	16-bit / 20-bit (segmented)	Hardware for fast address calculations, fast mul/div, etc.
2	1982	Intel 80286 and clones	16-bit (30-bit virtual) / 24-bit (segmented)	MMU, for protected mode and a larger address space.
3 (IA-32)	1985	Intel 80386 and clones, AMD Am386	32-bit (46-bit virtual) / 32-bit	32-bit instruction set, MMU with paging.
4 (FPU)	1989	Intel486 and clones, AMD Am486/Am5x86		RISC-like pipelining, integrated x87 FPU (80-bit), on-chip cache.
4/5	1997	IDT/Centaur-C6, Cyrix III-Samuel, VIA C3-Samuel2 / VIA C3-Ezra (2001), VIA C7 (2005)		In-order, integrated FPU, some models with on-chip L2 cache, MMX, SSE.
5	1993	Pentium, Pentium MMX, Cyrix 5x86, Rise mP6		Superscalar, 64-bit databus, faster FPU, MMX (2× 32-bit).
5/6	1996	AMD K5, Nx586 (1994)		μ-op translation.
6	1995	Pentium Pro, Cyrix 6x86, Cyrix MII, Cyrix III-Joshua (2000)	As above / 36-bit physical (PAE)	μ-op translation, conditional move instructions, out-of-order, register renaming, speculative execution, PAE (Pentium Pro), in-package L2 cache (Pentium Pro).
	1997	AMD K6/-2/3, Pentium II/III		L3-cache support, 3DNow!, SSE (2× 64-bit).
	2003	Pentium M, Intel Core (2006)		Optimized for low power.
7	1999	Athlon, Athlon XP		Superscalar FPU, wide design (up to three x86 instr./clock).
7	2000	Pentium 4		Deeply pipelined, high frequency, SSE2, hyper-threading.
7/8	2000	Transmeta Crusoe, Efficeon		VLIW design with x86 emulator, on-die memory controller.
	2004	Pentium 4 Prescott	64-bit / 40-bit physical in first AMD implementation	Very deeply pipelined, very high frequency, SSE3, 64-bit capability (integer CPU) is available only in LGA 775 sockets.
	2006	Intel Core 2		64-bit (integer CPU), low power, multi-core, lower clock frequency, SSE4 (Penryn).
	2008	VIA Nano		Out-of-order, superscalar, 64-bit (integer CPU), hardware-based encryption, very low power, adaptive power management.
8 (x86-64)	2003	Athlon 64, Opteron		x86-64 instruction set (CPU main integer core), on-die memory controller, HyperTransport.
8/9	2007	AMD Phenom	As above / 48-bit physical for AMD Phenom	Monolithic quad-core, SSE4a, HyperTransport 3 or QuickPath, native memory controller, on-die L3 cache, modular.
	2008	Intel Core i3/i5/i7, AMD Phenom II
	2008	Intel Atom		In-order but highly pipelined, very-low-power, on some models: 64-bit (integer CPU), on-die GPU.
	2011	AMD Bobcat, Llano		Out-of-order, 64-bit (integer CPU), on-die GPU, low power (Bobcat).
9 (GPU)	2011	Intel Sandy Bridge/Ivy Bridge, AMD Bulldozer and Trinity		SSE5/AVX (4× 64-bit), highly modular design, integrated on-die GPU.

x86 Integer Operations

The 8086 provides support for both 8-bit (byte) and 16-bit (word) data types. The 80386 adds 32-bit addresses and data (double words) in the x86. (AMD64 adds 64-bit addresses and data, called quad words.) The data type distinctions apply to register operations as well as memory accesses.

x86 Instruction Encoding

The encoding of instructions in the 80386 is complex, with many different instruction formats. Instructions for the 80386 may vary from 1 byte, when there are no operands, up to 15 bytes.

The opcode byte usually contains a bit saying whether the operand is 8 bits or 32 bits. For some instructions, the opcode may include the addressing mode and the register; this is true in many instructions that have the form “register = register op immediate.” Other instructions use a “postbyte” or extra opcode byte, labelled “mod, reg, r/m,” which contains the addressing mode information. This postbyte is used for many of the instructions that address memory. The base plus scaled index mode uses a second postbyte, labelled “sc, index, base.”
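The two postbytes can be split into their fields with a few shifts and masks. The sketch below assumes the standard layout (2 bits, 3 bits, 3 bits, from most to least significant) of the "mod, reg, r/m" byte and of the "sc, index, base" byte; the function names are made up for the example.

```python
def decode_modrm(byte):
    """Split the "mod, reg, r/m" postbyte into its three fields."""
    return {
        "mod": (byte >> 6) & 0b11,   # addressing mode selector
        "reg": (byte >> 3) & 0b111,  # register operand or opcode extension
        "rm": byte & 0b111,          # register or memory operand selector
    }


def decode_sib(byte):
    """Split the "sc, index, base" postbyte; scale is returned as a multiplier."""
    return {
        "scale": 1 << ((byte >> 6) & 0b11),  # 1, 2, 4 or 8
        "index": (byte >> 3) & 0b111,
        "base": byte & 0b111,
    }


def has_sib(modrm_byte):
    """In 32-bit addressing, r/m = 100 with mod != 11 means a second postbyte follows."""
    fields = decode_modrm(modrm_byte)
    return fields["mod"] != 0b11 and fields["rm"] == 0b100
```

For instance, the postbyte 0x44 decodes to mod = 01, reg = 000, r/m = 100, which signals that a "sc, index, base" byte follows.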

Memory addressing modes

• Address in register

• Address = R_base + displacement

• Address = R_base + 2^scale × R_index (scale = 0, 1, 2, or 3)

• Address = R_base + 2^scale × R_index + displacement
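All four modes are special cases of one formula, which can be sketched as a single function. The function name and the keyword defaults are assumptions of this example; the result is wrapped to 32 bits as on the 80386.

```python
def effective_address(base=0, index=0, scale=0, displacement=0):
    """Address = R_base + 2^scale * R_index + displacement, wrapped to 32 bits."""
    if scale not in (0, 1, 2, 3):
        raise ValueError("scale must be 0, 1, 2, or 3")
    return (base + (index << scale) + displacement) & 0xFFFFFFFF


# Hypothetical register values, chosen only to exercise each mode.
in_register = effective_address(base=0x2000)
with_displacement = effective_address(base=0x2000, displacement=-4)
scaled_index = effective_address(base=0x1000, index=3, scale=2)
full_form = effective_address(base=0x1000, index=3, scale=2, displacement=8)
```

A scale of 2 multiplies the index by 4, which is convenient for stepping through an array of 32-bit double words.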

x86 Assembly Guide

This guide describes the basics of 32-bit x86 assembly language programming, covering a small but useful subset of the available instructions and assembler directives. There are several different assembly languages for generating x86 machine code. The one we will use in CS216 is the Microsoft Macro Assembler (MASM). MASM uses the standard Intel syntax for writing x86 assembly code.

The full x86 instruction set is large and complex (Intel's x86 instruction set manuals comprise over 2900 pages), and we do not cover it all in this guide. For example, there is a 16-bit subset of the x86 instruction set. Using the 16-bit programming model can be quite complex. It has a segmented memory model, more restrictions on register usage, and so on. In this guide, we will limit our attention to more modern aspects of x86 programming, and delve into the instruction set only in enough detail to get a basic feel for x86 programming.

Fallacies

1. More powerful instructions mean higher performance.

• Part of the power of the Intel x86 is the prefixes that can modify the execution of the following instruction. One prefix can repeat the following instruction until a counter counts down to 0.

• An alternative to such a repeated complex move is to load the data into the registers and then store the registers back to memory, using only standard instructions.

• With the code replicated to reduce loop overhead, this version copies about 1.5 times faster.

• A version that uses the larger floating-point registers copies about 2.0 times faster than the complex move instruction.

2. Write in assembly language to obtain the highest performance.

• The increasing sophistication of compilers means the gap between compiled code and code produced by hand is closing fast.

• Modern compilers are also better at dealing with modern processors.

• More lines of code mean more errors and lower productivity.

3. The importance of commercial binary compatibility means successful instruction sets don’t change.

Pitfalls

1. Forgetting that sequential word addresses in machines with byte addressing do not differ by one.

• Many an assembly language programmer has toiled over errors made by assuming that the address of the next word can be found by incrementing the address in a register by one instead of by the word size in bytes.

2. Using a pointer to an automatic variable outside its defining procedure.

• A common mistake in dealing with pointers is to pass a result from a procedure that includes a pointer to an array that is local to that procedure.

• For example, the memory that contains the local array will be reused as soon as the procedure returns, so pointers to automatic variables can lead to chaos.
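The first pitfall can be reproduced with a tiny byte-addressed memory model. The sketch assumes 4-byte little-endian words; the helper `word_address` and the stored values are made up for the illustration.

```python
import struct

WORD_SIZE = 4  # bytes per 32-bit word


def word_address(base, i):
    """Byte address of the i-th word after base."""
    return base + i * WORD_SIZE


memory = bytearray(16)  # byte-addressed memory holding four words
struct.pack_into("<I", memory, word_address(0, 0), 0x11111111)
struct.pack_into("<I", memory, word_address(0, 1), 0x22222222)

# Wrong: base + 1 lands in the middle of the first word and straddles two words.
wrong = struct.unpack_from("<I", memory, 0 + 1)[0]
# Right: the next word lives WORD_SIZE bytes further on.
right = struct.unpack_from("<I", memory, word_address(0, 1))[0]
```

Reading at base + 1 mixes three bytes of the first word with one byte of the second, so the value is neither of the two words that were stored.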

Written by,

Goh Hooi Kuan

B031210043

ss_ladies

Tuesday, 16 October 2012

Language of Computers - Goh Hooi Kuan

