ARM Instruction
ARM (formerly Advanced RISC Machine and Acorn RISC
Machine) is a reduced instruction set computer
(RISC) instruction set architecture (ISA).
The ARM architecture is the most widely used 32-bit instruction
set architecture in numbers produced.
ARM is the
most popular instruction set architecture for embedded devices, with more than
3 billion devices per year using ARM. ARM came out the same year as MIPS and
followed similar philosophies. Figure 2.31 lists the similarities. The
principal difference is that MIPS has more registers and ARM has more
addressing modes.
There is a similar core of
instruction sets for arithmetic-logical and data transfer instructions for MIPS
and ARM, as Figure 2.32 shows.
ARM cores
Architecture | Family
ARMv1 | ARM1
ARMv2 | ARM2, ARM3
ARMv3 | ARM6, ARM7
ARMv4 | StrongARM, ARM7TDMI, ARM9TDMI
ARMv5 | ARM7EJ, ARM9E, ARM10E, XScale
ARMv6 | ARM11, ARM Cortex-M
ARMv7 | ARM Cortex-A, ARM Cortex-M, ARM Cortex-R
ARMv8 | No cores available yet; will support 64-bit data and addressing
Example applications of ARM cores
ARM cores are used in a number of products, particularly PDAs, smartphones,
portable media players, digital cameras, handheld game consoles and
automotive navigation systems.
Since 2005, ARM cores have also been used to simulate the human brain, because
they are very small, very cheap and use little power.
CPU modes
The ARM architecture specifies
the following CPU modes. At any moment in time, the CPU can be in only one
mode, but it can switch modes due to external events (interrupts) or
programmatically.
User mode - The only non-privileged mode.
System mode - The only privileged mode that is not entered by an exception. It
can only be entered by executing an instruction that explicitly writes to the
mode bits of the CPSR.
Supervisor (svc) mode - A privileged mode entered whenever the CPU is reset or
when a SWI instruction is executed.
Abort mode - A privileged mode that is entered whenever a prefetch abort or
data abort exception occurs.
Undefined mode - A privileged mode that is entered whenever an undefined
instruction exception occurs.
Interrupt mode - A privileged mode that is entered whenever the processor
accepts an IRQ interrupt.
Fast Interrupt mode - A privileged mode that is entered whenever the processor
accepts an FIQ interrupt.
Hypervisor mode - A privileged mode introduced in ARMv7-A with the Cortex-A15
processor to provide hardware virtualization support.
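The modes above correspond to 5-bit encodings in the low bits (M[4:0]) of the CPSR. The Python sketch below decodes them; the encoding values are the ones given in the ARM Architecture Reference Manual for ARMv7-A/R, Monitor mode is left out because the list above does not describe it, and the helper names (`decode_mode`, `is_privileged`, `with_mode`) are made up for illustration.

```python
from typing import Optional

# CPSR mode field M[4:0] encodings (ARMv7-A/R). Monitor mode (0b10110) is omitted.
CPSR_MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11010: "Hypervisor",
    0b11011: "Undefined",
    0b11111: "System",
}

MODE_MASK = 0x1F  # the mode field occupies the low five bits of the CPSR


def decode_mode(cpsr: int) -> Optional[str]:
    """Return the mode named by the low five bits of a CPSR value, or None."""
    return CPSR_MODES.get(cpsr & MODE_MASK)


def is_privileged(cpsr: int) -> bool:
    """Every mode in the table except User is privileged."""
    mode = decode_mode(cpsr)
    return mode is not None and mode != "User"


def with_mode(cpsr: int, mode_bits: int) -> int:
    """Model of an explicit write to the CPSR mode bits (as an MSR instruction does).

    The other CPSR bits (condition flags, interrupt masks) are left unchanged.
    """
    return (cpsr & ~MODE_MASK) | (mode_bits & MODE_MASK)


cpsr = 0x600000D3                              # Z and C set, IRQ and FIQ masked, Supervisor mode
print(decode_mode(cpsr))                       # Supervisor
print(decode_mode(with_mode(cpsr, 0b11111)))   # System
```

Entering System mode this way mirrors the description above: no exception is involved, only a privileged write to the mode bits.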
Addressing Modes
Figure 2.33 shows the data addressing modes
supported by ARM. Unlike MIPS, ARM does not reserve a register to contain 0.
Although MIPS has just 3 simple data addressing modes, ARM has 9, including
fairly complex calculations. For example, ARM has an addressing mode that can
shift one register by any amount, add it to another register to form the
address, and then update one register with this new address.
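As a rough model of the addressing mode just described, the sketch below computes the address for an ARM pre-indexed load with a shifted register offset and writeback, `LDR r0, [r1, r2, LSL #2]!`. It models only the address calculation and the base-register update, not the memory access itself; the function name and the dictionary of register values are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF  # ARM registers and addresses are 32 bits wide


def pre_indexed_address(regs: dict, base: str, index: str, shift: int) -> int:
    """Address for LDR Rd, [Rbase, Rindex, LSL #shift]! (pre-indexed, with writeback).

    1. shift the index register left by `shift` bits,
    2. add it to the base register to form the address,
    3. write the new address back into the base register.
    """
    offset = (regs[index] << shift) & MASK32
    address = (regs[base] + offset) & MASK32
    regs[base] = address  # the "!" suffix: update the base register
    return address


regs = {"r1": 0x1000, "r2": 1}
# With r2 = 1 and LSL #2, each execution steps r1 forward by one 4-byte word.
print(hex(pre_indexed_address(regs, "r1", "r2", 2)))  # 0x1004
print(hex(pre_indexed_address(regs, "r1", "r2", 2)))  # 0x1008
```

Because the base register is updated as a side effect, one instruction both loads an element and advances the pointer, which is what makes this mode attractive for walking arrays.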
Compare and Conditional Branch
MIPS uses the contents of
registers to evaluate conditional branches. ARM uses the traditional 4
condition code bits stored in the program status word: negative, zero, carry,
and overflow. They can be set on any arithmetic or logical instruction; unlike
earlier architectures, this setting is optional on each instruction. An explicit
option leads to fewer problems in a pipelined implementation. ARM uses
conditional branches to test condition codes to determine all possible unsigned
and signed relations.
CMP subtracts one operand from the other and the difference sets the
condition codes. Compare negative (CMN) adds one operand to the other, and the
sum sets the condition codes. TST performs logical AND on the two operands to set
all condition codes but overflow, while TEQ uses exclusive OR to set the first
three condition codes.
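The sketch below shows how the four condition codes could be computed for CMP and CMN on 32-bit operands, together with the N and Z results of TST and TEQ. It is a model written for this explanation, not code from ARM; for TST and TEQ it leaves the carry out, since on real hardware that flag comes from the barrel shifter when operand 2 is shifted.

```python
MASK32 = 0xFFFFFFFF


def _nz(result: int) -> dict:
    """N is bit 31 of the 32-bit result; Z is set when the result is zero."""
    result &= MASK32
    return {"N": result >> 31, "Z": int(result == 0)}


def cmp_flags(a: int, b: int) -> dict:
    """Condition codes after CMP a, b (computes a - b and discards it)."""
    a, b = a & MASK32, b & MASK32
    diff = (a - b) & MASK32
    flags = _nz(diff)
    flags["C"] = int(a >= b)  # on ARM, carry after a subtraction means "no borrow"
    # Signed overflow: a and b had different signs and the result's sign differs from a's.
    flags["V"] = ((a ^ b) & (a ^ diff)) >> 31
    return flags


def cmn_flags(a: int, b: int) -> dict:
    """Condition codes after CMN a, b (computes a + b and discards it)."""
    a, b = a & MASK32, b & MASK32
    total = a + b
    s = total & MASK32
    flags = _nz(s)
    flags["C"] = int(total > MASK32)  # unsigned carry out of bit 31
    # Signed overflow: a and b had the same sign and the result's sign differs.
    flags["V"] = (~(a ^ b) & (a ^ s) & MASK32) >> 31
    return flags


def tst_flags(a: int, b: int) -> dict:
    """N and Z after TST a, b (a AND b); C (from the shifter) and V are not modeled."""
    return _nz(a & b)


def teq_flags(a: int, b: int) -> dict:
    """N and Z after TEQ a, b (a XOR b); C (from the shifter) and V are not modeled."""
    return _nz(a ^ b)


print(cmp_flags(5, 5))           # {'N': 0, 'Z': 1, 'C': 1, 'V': 0}
print(cmp_flags(0x80000000, 1))  # most negative minus 1: {'N': 0, 'Z': 0, 'C': 1, 'V': 1}
```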
One unusual feature of ARM is that every instruction has the option of
executing conditionally, depending on the condition codes. Every instruction
starts with a 4-bit field that determines whether it will act as a no-operation
instruction (nop) or as a real instruction, depending on the condition codes.
Hence, conditional branches are properly considered as conditionally executing
the unconditional branch instruction. Conditional execution avoids the need for
a branch to jump over a single instruction: it takes less code space and time
to simply execute that one instruction conditionally.
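The sketch below models the 4-bit condition field: a table of the standard ARM condition mnemonics with the flag test each one performs, plus a branch-free signed maximum written as `CMP r0, r1` followed by `MOVLT r0, r1`. Encoding 0b1111 is left out because later architecture versions reuse it for special instructions; the helper names are illustrative, and the compact flag helper computes only what CMP needs.

```python
MASK32 = 0xFFFFFFFF

# cond field (bits 31-28) -> (mnemonic, test on the N, Z, C, V flags)
CONDITIONS = {
    0b0000: ("EQ", lambda n, z, c, v: z == 1),
    0b0001: ("NE", lambda n, z, c, v: z == 0),
    0b0010: ("CS/HS", lambda n, z, c, v: c == 1),
    0b0011: ("CC/LO", lambda n, z, c, v: c == 0),
    0b0100: ("MI", lambda n, z, c, v: n == 1),
    0b0101: ("PL", lambda n, z, c, v: n == 0),
    0b0110: ("VS", lambda n, z, c, v: v == 1),
    0b0111: ("VC", lambda n, z, c, v: v == 0),
    0b1000: ("HI", lambda n, z, c, v: c == 1 and z == 0),  # unsigned >
    0b1001: ("LS", lambda n, z, c, v: c == 0 or z == 1),   # unsigned <=
    0b1010: ("GE", lambda n, z, c, v: n == v),             # signed >=
    0b1011: ("LT", lambda n, z, c, v: n != v),             # signed <
    0b1100: ("GT", lambda n, z, c, v: z == 0 and n == v),  # signed >
    0b1101: ("LE", lambda n, z, c, v: z == 1 or n != v),   # signed <=
    0b1110: ("AL", lambda n, z, c, v: True),               # always
}


def condition_passed(cond: int, n: int, z: int, c: int, v: int) -> bool:
    """True if an instruction with this cond field executes; otherwise it acts as a nop."""
    _mnemonic, test = CONDITIONS[cond]
    return test(n, z, c, v)


def flags_after_cmp(a: int, b: int) -> tuple:
    """(N, Z, C, V) after CMP a, b on 32-bit values."""
    a, b = a & MASK32, b & MASK32
    d = (a - b) & MASK32
    return d >> 31, int(d == 0), int(a >= b), ((a ^ b) & (a ^ d)) >> 31


def signed_max(r0: int, r1: int) -> int:
    """Model of  CMP r0, r1 ; MOVLT r0, r1  (no branch instruction needed)."""
    flags = flags_after_cmp(r0, r1)       # CMP r0, r1
    # The hardware fetches MOVLT either way; this `if` only models whether it writes r0.
    if condition_passed(0b1011, *flags):  # MOVLT: executes only when N != V
        r0 = r1
    return r0


print(signed_max(3, 7), signed_max(9, -2), signed_max(-1, 5))  # 7 9 5
```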
Figure 2.34 shows the instruction formats for ARM and MIPS. The
principal differences are the 4-bit conditional execution field in every
instruction and the smaller register field, because ARM has half the number of
registers.
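To make the field layout concrete, the sketch below pulls apart a 32-bit ARM data-processing instruction: the 4-bit condition field at the top, the S bit that makes setting the condition codes optional, and 4-bit register fields (16 registers) where MIPS uses 5-bit fields (32 registers). The dictionary keys follow common ARM documentation names; the example word 0xE0821003 is `ADD r1, r2, r3`, assembled by hand for this illustration.

```python
def decode_data_processing(word: int) -> dict:
    """Split a 32-bit ARM data-processing instruction into its fields."""
    return {
        "cond": (word >> 28) & 0xF,      # condition field, 0b1110 = AL (always)
        "op": (word >> 26) & 0x3,        # 0b00 for data-processing instructions
        "I": (word >> 25) & 0x1,         # 1 = operand 2 is an immediate
        "opcode": (word >> 21) & 0xF,    # e.g. 0b0100 = ADD, 0b1010 = CMP
        "S": (word >> 20) & 0x1,         # 1 = set the condition codes
        "Rn": (word >> 16) & 0xF,        # first source register (4 bits)
        "Rd": (word >> 12) & 0xF,        # destination register (4 bits)
        "operand2": word & 0xFFF,        # register (plus shift) or rotated immediate
    }


fields = decode_data_processing(0xE0821003)  # ADD r1, r2, r3
print(fields["cond"], fields["opcode"], fields["Rd"], fields["Rn"], fields["operand2"])
# 14 4 1 2 3
```

Setting bit 20 (0xE0921003) gives `ADDS r1, r2, r3`, the same addition with the condition codes updated.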
Main features of the ARM Instruction Set
* All instructions are 32 bits long.
* Most instructions execute in a single cycle.
* Every instruction can be conditionally executed.
* A load/store architecture
  • Data processing instructions act only on registers
    – Three-operand format
    – Combined ALU and shifter for high-speed bit manipulation
  • Specific memory access instructions with powerful auto-indexing addressing modes
    – 32-bit and 8-bit data types, plus 16-bit data types from ARM Architecture v4
    – Flexible multiple register load and store instructions
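The multiple register load and store instructions take a 16-bit register list. The sketch below models one form, `STMIA Rn!, {list}` (increment after, with writeback), which stores the listed registers to consecutive words with the lowest-numbered register at the lowest address. It assumes the base register is not itself in the list (that case has special rules) and uses a Python dict as stand-in memory; the function name `stmia` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def stmia(regs: list, memory: dict, base: int, reg_list: int, writeback: bool = True) -> None:
    """Model of STMIA R<base>{!}, {reg_list} with a 16-bit register-list mask.

    Assumes the base register is not in reg_list.
    """
    address = regs[base]
    for n in range(16):                  # lowest-numbered register -> lowest address
        if (reg_list >> n) & 1:
            memory[address] = regs[n]
            address = (address + 4) & MASK32
    if writeback:                        # "!": base now points past the stored block
        regs[base] = address


regs = [0] * 16
regs[0] = 0x2000                         # base register r0
regs[1], regs[3], regs[5] = 11, 33, 55
memory = {}
stmia(regs, memory, base=0, reg_list=0b101010)   # STMIA r0!, {r1, r3, r5}
print({hex(a): v for a, v in memory.items()}, hex(regs[0]))
```

One instruction replaces three separate stores plus a pointer update, which is why these instructions are popular for saving registers on procedure entry.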
Processor Modes
* The ARM has six operating modes:
  • User (unprivileged mode under which most tasks run)
  • FIQ (entered when a high priority (fast) interrupt is raised)
  • IRQ (entered when a low priority (normal) interrupt is raised)
  • Supervisor (entered on reset and when a Software Interrupt instruction is executed)
  • Abort (used to handle memory access violations)
  • Undef (used to handle undefined instructions)
* ARM Architecture Version 4 adds a seventh mode:
  • System (privileged mode using the same registers as user mode)
The Registers
* ARM has 37 registers in total, all of which are 32 bits long:
  • 1 dedicated program counter
  • 1 dedicated current program status register
  • 5 dedicated saved program status registers
  • 30 general-purpose registers
* However, these are arranged into several banks, with the accessible bank
being governed by the processor mode. Each mode can access
  • a particular set of r0-r12 registers
  • a particular r13 (the stack pointer) and r14 (the link register)
  • r15 (the program counter)
  • cpsr (the current program status register)
and the privileged exception modes can also access
  • a particular spsr (saved program status register)
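One way to check the total of 37 is to model the banking. The sketch below assumes the classic scheme (FIQ has its own r8-r14, the other exception modes bank only r13 and r14, and User and System share one set), uses ARM-style names such as `r13_svc` purely for illustration, and leaves out the later Hypervisor and Monitor modes.

```python
MODES = ["usr", "sys", "fiq", "irq", "svc", "abt", "und"]


def visible_registers(mode: str) -> list:
    """Physical register names behind r0-r14 in the given mode (r15 is shared)."""
    names = []
    for n in range(15):
        if mode in ("usr", "sys"):
            names.append(f"r{n}")          # User and System share one set
        elif mode == "fiq" and n >= 8:
            names.append(f"r{n}_fiq")      # FIQ has its own r8-r14
        elif n >= 13:
            names.append(f"r{n}_{mode}")   # other exception modes bank r13 (sp) and r14 (lr)
        else:
            names.append(f"r{n}")          # otherwise the User register is used
    return names


general_purpose = {name for mode in MODES for name in visible_registers(mode)}
spsrs = [f"spsr_{mode}" for mode in MODES if mode not in ("usr", "sys")]
total = len(general_purpose) + 1 + 1 + len(spsrs)  # + PC (r15) + CPSR + SPSRs
print(len(general_purpose), len(spsrs), total)     # 30 5 37
```

The count works out as 15 shared registers, 7 extra for FIQ and 2 extra for each of the other four exception modes, giving 30 general-purpose registers.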
Real Stuff: x86 Instructions
x86 is a series of computer microprocessor instruction set architectures based
on the Intel 8086 CPU.
The 8086 was introduced in 1978 as a fully 16-bit extension of Intel's
8-bit 8080 microprocessor, and it also introduced memory segmentation to
overcome the 16-bit addressing barrier of such designs. The term x86 derives
from the fact that early successors to the 8086 also had names ending in
"86". Many additions and extensions have been made to the x86 instruction set
over the years, almost always with full backward compatibility.
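In the 8086's segmented scheme, a 16-bit segment value is shifted left by 4 bits and added to a 16-bit offset, giving a 20-bit (1 MB) physical address. The sketch below models this for the original 8086, where addresses wrap around at 1 MB; later processors with the A20 address line enabled do not wrap, so the final masking step is specific to the 8086.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """8086 physical address: segment * 16 + offset, wrapped to 20 bits (1 MB)."""
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
# Different segment:offset pairs can name the same byte.
print(real_mode_address(0x1000, 0x0100) == real_mode_address(0x1010, 0x0000))  # True
```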
The term is not synonymous with IBM PC compatibility, as that implies a multitude of other computer hardware; embedded systems as well as general-purpose computers
used x86 chips before the PC-compatible market started, some of them before the IBM PC itself.
As the term became common after the introduction of the 80386,
it usually implies binary compatibility with the 32-bit instruction set of the 80386. This may sometimes be
emphasized as x86-32 or x32 to distinguish it either from
the original 16-bit "x86-16" or from the 64-bit x86-64. Although most x86 processors used in new personal computers and servers have 64-bit capabilities, to avoid
compatibility problems with older computers or systems, the term x86-64 (or x64)
is often used to denote 64-bit software, with the term x86 implying only 32-bit.
Although the 8086 was primarily
developed for embedded systems and small single-user computers,
largely as a response to the successful 8080-compatible Zilog Z80, the x86 line soon grew in features and
processing power. Today, x86 is ubiquitous in both stationary and portable
personal computers and has replaced midrange computers and reduced instruction set computer (RISC)-based
processors in a majority of servers and workstations as well. A large amount of software, including operating systems (OSs) such as DOS, Windows, Linux, BSD, Solaris,
and Mac OS X, runs on x86-based hardware.
Modern x86 is relatively uncommon in embedded systems, however, and small low-power applications (using tiny batteries) as
well as low-cost microprocessor markets, such as home appliances and toys, lack any significant x86
presence. Simple 8-bit and 16-bit
architectures are common here, although the x86-compatible VIA C7, VIA Nano, AMD's Geode, Athlon Neo, and Intel Atom are examples of 32- and 64-bit designs
used in some relatively low-power and low-cost segments.
There have been several attempts,
including by Intel itself, to end the market dominance of the
"inelegant" x86 architecture, which evolved directly from the first simple
8-bit microprocessors. Examples are the iAPX 432 (alias Intel 8800), the Intel 960, the Intel 860, and the Intel/Hewlett-Packard Itanium architecture. However, the
continuous refinement of x86 microarchitectures, circuitry,
and semiconductor
manufacturing would
make it hard to replace x86 in many segments. AMD's 64-bit extension of x86
(which Intel eventually responded to with a compatible design) and the scalability of x86 chips such
as the eight-core Intel Xeon and 12-core AMD Opteron underline
x86 as an example of how continuous refinement of established
industry standards can resist competition from completely new
architectures.
Designers of instruction sets
sometimes provide more powerful operations than those found in ARM and MIPS.
The goal is generally to reduce the number of instructions executed by a
program. The danger is that this reduction can occur at the cost of simplicity,
increasing the time a program takes to execute because the instructions are
slower. This slowness may be the result of a slower clock cycle time or of
requiring more clock cycles than a simpler sequence.
The path toward operation complexity
is thus fraught with peril. To avoid these problems, designers have moved
toward simpler instructions.
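The trade-off can be made concrete with the classic performance equation. The numbers below are hypothetical, chosen only to show how cutting the instruction count can still lose if each instruction needs more cycles or forces a slower clock.

```latex
\begin{aligned}
\text{CPU time} &= \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} \\
\text{simple instructions:} \quad & 1000 \times 1.0 \times 1.0\,\text{ns} = 1000\,\text{ns} \\
\text{complex, more cycles:} \quad & 800 \times 1.5 \times 1.0\,\text{ns} = 1200\,\text{ns} \\
\text{complex, slower clock:} \quad & 800 \times 1.0 \times 1.3\,\text{ns} = 1040\,\text{ns}
\end{aligned}
```

In both complex cases the program executes 20% fewer instructions yet takes longer to run.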
Chronology
The table below lists brands of common consumer-targeted processors
implementing the x86 instruction
set, grouped by generations that emphasize important events of x86
history. Note: CPU generations are not strict; each generation is
characterized by significantly improved or commercially successful processor microarchitecture designs.
Generation | First introduced | Prominent consumer CPU brands | Linear / physical address space | Notable (new) features
1 | 1978 | | 16-bit / 20-bit (segmented) | First x86 microprocessors
| 1982 | | |
2 | | | 16-bit (30-bit virtual) / 24-bit (segmented) |
| 1985 | | 32-bit (46-bit virtual) / 32-bit |
| 1989 | | |
4/5 | 1997 | | | In-order, integrated FPU, some models with on-chip L2 cache, MMX, SSE.
5 | 1993 | | |
5/6 | 1996 | | | μ-op translation.
6 | 1995 | | | μ-op translation, conditional move instructions, out-of-order, register renaming, speculative execution, PAE (Pentium Pro), in-package L2 cache (Pentium Pro).
| 1997 | | |
| 2003 | | |
7 | 1999 | | | Superscalar FPU, wide design (up to three x86 instr./clock).
| 2000 | | |
7/8 | 2000 | | |
| 2004 | | 64-bit / 40-bit physical in first AMD implementation |
| 2006 | | |
| 2008 | | | Out-of-order, superscalar, 64-bit (integer CPU), hardware-based encryption, very low power, adaptive power management.
| 2003 | | |
8/9 | 2007 | AMD Phenom | As above / 48-bit physical for AMD Phenom | Monolithic quad-core, SSE4a, HyperTransport 3 or QuickPath, native memory controller, on-die L3 cache, modular.
| 2008 | | | In-order but highly pipelined, very-low-power, on some models: 64-bit (integer CPU), on-die GPU.
| 2011 | | | Out-of-order, 64-bit (integer CPU), on-die GPU, low power (Bobcat).
| 2011 | | |
|
x86 Integer Operations
The 8086 provides support for both 8-bit (byte) and
16-bit (word) data types. The 80386 adds 32-bit addresses and data (double
words) to the x86. (AMD64 adds 64-bit addresses and data, called quad words.) The
data type distinctions apply to register operations as well as memory accesses.
x86 Instruction Encoding
The
encoding of instructions in the 80386 is complex, with many different
instruction formats. Instructions for the 80386 may vary from 1 byte, when there
are no operands, up to 15 bytes.
The opcode
byte usually contains a bit saying whether the operand is 8 bits or 32 bits. For
some instructions, the opcode may include the addressing mode and the register;
this is true in many instructions that have the form “register = register op
immediate.” Other instructions use a “postbyte” or extra opcode byte, labelled
“mod, reg, r/m,” which contains the addressing mode information. This postbyte
is used for many of the instructions that address memory. The base plus scaled
index mode uses a second postbyte, labelled “sc, index, base.”
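Both postbytes split into 2-, 3- and 3-bit fields. The sketch below decodes them, using the labels from the text as dictionary keys. The example bytes `8B 44 88 08` encode `mov eax, [eax + ecx*4 + 8]`: for opcodes that carry the size bit, such as 0x8A/0x8B, the low bit selects the 8-bit or the full (here 32-bit) operand form. Helper names are made up for this illustration.

```python
X86_REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]  # register numbers 0-7


def split_2_3_3(byte: int) -> tuple:
    """Split a byte into its top 2 bits, middle 3 bits and low 3 bits."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def decode_modrm(byte: int) -> dict:
    mod, reg, rm = split_2_3_3(byte)
    return {"mod": mod, "reg": reg, "r/m": rm}


def decode_sib(byte: int) -> dict:
    sc, index, base = split_2_3_3(byte)
    return {"sc": sc, "index": index, "base": base}


code = bytes([0x8B, 0x44, 0x88, 0x08])  # mov eax, [eax + ecx*4 + 8]
opcode, modrm, sib, disp8 = code
print("32-bit operand" if opcode & 1 else "8-bit operand")
m, s = decode_modrm(modrm), decode_sib(sib)
# mod=01: an 8-bit displacement follows; r/m=100: a "sc, index, base" byte follows
print(m, s)
print(f"{X86_REGS32[m['reg']]} <- [{X86_REGS32[s['base']]} + "
      f"{X86_REGS32[s['index']]}*{1 << s['sc']} + {disp8}]")
```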
Memory addressing modes
• Address in register
• Address = Rbase + displacement
• Address = Rbase + 2^scale × Rindex (scale = 0, 1, 2, or 3)
• Address = Rbase + 2^scale × Rindex + displacement
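These modes amount to one formula with optional parts. The minimal sketch below passes register values in as plain integers (an assumption for illustration; real hardware reads them from the named registers), and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base: int = 0, index: int = 0, scale: int = 0, displacement: int = 0) -> int:
    """Address = Rbase + 2**scale * Rindex + displacement, wrapped to 32 bits.

    Leaving index at 0 gives the simpler "Rbase + displacement" mode.
    """
    if scale not in (0, 1, 2, 3):
        raise ValueError("scale must be 0, 1, 2, or 3")
    return (base + (index << scale) + displacement) & MASK32


# Element 5 of an array of 4-byte integers starting at 0x1000: scale = 2 (2**2 = 4).
print(hex(effective_address(base=0x1000, index=5, scale=2)))                  # 0x1014
# The same index plus a constant displacement of 8.
print(hex(effective_address(base=0x1000, index=5, scale=2, displacement=8)))  # 0x101c
```

Restricting the scale to 0-3 covers element sizes of 1, 2, 4 and 8 bytes, which is why a 2-bit field is enough.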
x86 Assembly Guide
This guide describes the basics
of 32-bit x86 assembly language programming, covering a small but useful subset
of the available instructions and assembler directives. There are several
different assembly languages for generating x86 machine code. The one we will
use in CS216 is the Microsoft Macro Assembler (MASM). MASM uses the
standard Intel syntax for writing x86 assembly code.
The full x86 instruction
set is large and complex (Intel's x86 instruction set manuals comprise over
2900 pages), and we do not cover it all in this guide. For example, there is a
16-bit subset of the x86 instruction set. Using the 16-bit programming model
can be quite complex. It has a segmented memory model, more restrictions on
register usage, and so on. In this guide, we will limit our attention to more
modern aspects of x86 programming, and delve into the instruction set only in
enough detail to get a basic feel for x86 programming.
Fallacies
1. More powerful instructions mean higher performance.
• Part of the power of the Intel x86 is the prefixes that can modify the
execution of the following instruction. One prefix can repeat the following
instruction until a counter counts down to 0.
• An alternative to the repeated move is to load the data into registers and
then store the registers back to memory.
• With the code replicated to reduce loop overhead, this version copies about
1.5 times faster than the repeated move instruction.
• A version that uses the larger floating-point registers copies about 2.0
times faster than the complex move instruction.
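The sketch below models what the repeat prefix does for `REP MOVSD` in 32-bit code with the direction flag clear: copy a doubleword from [ESI] to [EDI], advance both by 4, and decrement ECX until it reaches 0. Memory is a Python dict of doubleword-sized entries, an assumption made for brevity. The model captures only the semantics; it says nothing about the speeds quoted above, which come from measurements on real hardware.

```python
def rep_movsd(memory: dict, esi: int, edi: int, ecx: int) -> tuple:
    """Model of REP MOVSD (32-bit code, direction flag clear).

    Returns the final (ESI, EDI, ECX).
    """
    while ecx != 0:                 # the prefix re-executes MOVSD until ECX is 0
        memory[edi] = memory[esi]   # copy one doubleword
        esi += 4
        edi += 4
        ecx -= 1
    return esi, edi, ecx


memory = {0x100: 1, 0x104: 2, 0x108: 3}
esi, edi, ecx = rep_movsd(memory, esi=0x100, edi=0x200, ecx=3)
print([memory[0x200 + 4 * i] for i in range(3)], hex(esi), hex(edi), ecx)  # [1, 2, 3] 0x10c 0x20c 0
```

A single instruction expresses the whole loop, yet the register-based alternatives described above still win, which is exactly the point of the fallacy.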
2. Write in assembly language to obtain the highest performance.
• The increasing sophistication of compilers means the gap between compiled
code and code produced by hand is closing fast.
• Modern compilers are also better at dealing with modern processors.
• More lines of code mean more errors and lower productivity.
3. The
importance of commercial binary compatibility means successful instruction sets
don’t change.
Pitfalls
1. Forgetting that sequential word addresses in machines with byte addressing
do not differ by one.
• Many an assembly language programmer has toiled over errors made by assuming
that the address of the next word can be found by incrementing the address in
a register by one.
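A tiny sketch of the arithmetic, assuming 4-byte words as in 32-bit MIPS and ARM: the next word is 4 bytes further on, while adding 1 lands inside the current word. The helper name is made up for illustration.

```python
WORD_SIZE = 4  # bytes per word on a byte-addressed 32-bit machine


def word_address(base: int, i: int) -> int:
    """Byte address of element i of a word array starting at byte address base."""
    return base + i * WORD_SIZE


base = 0x1000
print(hex(word_address(base, 1)))               # 0x1004: the next word
wrong = base + 1                                # the pitfall: "next word" computed by adding 1
print(wrong // WORD_SIZE == base // WORD_SIZE)  # True: still inside word 0
```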
2. Using a pointer to an automatic variable outside its defining procedure.
• A common mistake in dealing with pointers is to pass a result from a
procedure that includes a pointer to an array that is local to that procedure.
• The problem is that the memory that contains the local array will be reused
as soon as the procedure returns. Pointers to automatic variables can lead to
chaos.
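Python itself never leaves a dangling pointer, so the sketch below simulates the problem with a toy stack: a list standing in for memory and an integer stack pointer. Everything here (the class, the upward-growing stack, the one-array frame layout) is a simplifying assumption rather than how any particular compiler lays out frames.

```python
class ToyStack:
    """A toy call stack: a procedure's locals live just above the stack pointer."""

    def __init__(self, size: int = 64) -> None:
        self.memory = [0] * size
        self.sp = 0  # this toy stack grows upward from address 0

    def call_returning_local_address(self, local_values: list) -> int:
        """Simulate a procedure that builds a local array and returns its address."""
        frame = self.sp
        self.sp += len(local_values)               # allocate the local array
        self.memory[frame:self.sp] = local_values  # initialise it
        self.sp = frame                            # return: the frame is freed...
        return frame                               # ...but its address escapes


stack = ToyStack()
p = stack.call_returning_local_address([1, 2, 3])
print(stack.memory[p:p + 3])  # [1, 2, 3]: looks fine so far
stack.call_returning_local_address([7, 8, 9])  # the next call reuses the same memory
print(stack.memory[p:p + 3])  # [7, 8, 9]: the "array" changed underneath p
```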
Written by,
Goh Hooi Kuan
B031210043