Objectives

What are we going to do here?

get a grounding on assembly language

explore some assembly tools

examine how to use this knowledge

Assembly language(s)

When in Rome... speak Roman

Assembly

It doesn't get any "closer to the metal" than this

assemblers are generally entirely CPU-specific

assemblers give you complete control over code

manual memory management

manual stack management

manual everything!

assemblers will often provide "macros"

... to avoid too much pain

... but their use is optional

Assembly

Usage model

create assembler source file(s)

assembler transforms/"compiles" source into "object" files

these are then "linked" with libraries to form a final executable

note that this will depend a great deal on details

toolchain, platform, and so on

Assembler Basics

Common elements to all assembly

Basics

Core architecture/organization

CPUs expose a small set of registers (fast, named storage)

part of the process space is set up for a stack

this is just memory, but we treat it differently

the rest of the process space is "heap" space

OS may not allocate all of the process space at once

assembly instructions manipulate registers and memory

either directly (by register name or address), or indirectly (through an address held in a register)

Basics

Source layout

directives will control overall assembler behavior

entrypoint definition, references to macros, etc

"code" sections

define procedures (usually by name)

"data" sections

define variable storage

define constants for easy reference

Basics

Instructions

assembly instructions fall into a variety of categories

math operations

memory manipulation

control transfer

assembly instructions take a common form

"opcode"

"opcode operand"

"opcode operand1,operand2"

Basics

Variables

variables and constants are "just" labels over memory locations

sizes (and signed-ness) are always explicitly stated

byte (8), word (16), double-word (32), quad-word (64), double-quad-word (128)

... depending on the CPU

x86 Overview

Basic breakdown of 80x86 chips

Overview

Welcome to the 80x86 line

best-selling line of CPU, ever

16-bit through 64-bit generations (with 8-bit ancestry in the 8080)

commonly exemplified as a CISC architecture

four general-purpose registers (A, B, C, D)

several specific-purpose registers (SI, DI, SP, BP, IP, flags)

Overview

8-bit to 64-bit legacy

registers are accessible as 8-, 16-, 32- or 64-bit

AH, AL: "A"-high, "A"-low (8-bit)

AX: "A"-word (16-bit)

EAX: "A"-double-word (32-bit)

RAX: "A"-quad-word (64-bit, x86-64 only)

usage will depend on circumstance/context
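To make the aliasing concrete, here is a minimal Python sketch (not assembly) that models how AL, AH and AX are just narrower views of the same EAX bits. The function name sub_registers is made up for illustration.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into the narrower views that alias it."""
    eax &= 0xFFFFFFFF            # keep only 32 bits
    ax = eax & 0xFFFF            # AX is the low 16 bits of EAX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # AH is the high byte of AX
        "AL": ax & 0xFF,         # AL is the low byte of AX
    }


regs = sub_registers(0x12345678)
print({name: hex(value) for name, value in regs.items()})
```

Note there is no name for the upper 16 bits of EAX; you have to shift or rotate to get at them.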

Overview

General registers and their common use

EAX: "accumulator", arithmetic and logic

EBX: "base", arrays and pointers

ECX: "counter", loops

EDX: "data", arithmetic (extends EAX for multiply/divide)

Overview

Segment registers

CS: Code Segment

DS: Data Segment

SS: Stack Segment

ES: Extra Segment

FS: Extra Segment, next

GS: Extra Segment, next next

Overview

Specific-purpose registers

ESI: Source Index (strings, arrays)

EDI: Destination Index (strings, arrays)

ESP: Stack Pointer (top-of-stack)

EBP: Base Pointer (stack base)

EIP: Instruction Pointer

Overview

Flag register (EFLAGS)

CF: Carry flag

PF: Parity flag

AF: Auxiliary Carry flag

ZF: Zero flag

SF: Sign flag

OF: Overflow flag
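As a rough illustration of what these flags record, the following Python sketch models an 8-bit ADD and derives each flag from the operands and result. It is a simplified model under the assumption of 8-bit operands, and add8_flags is a hypothetical helper name.

```python
def add8_flags(a: int, b: int) -> dict:
    """Model an 8-bit ADD and the status flags it would leave behind."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        "CF": int(total > 0xFF),                          # unsigned carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),       # set when the result has an even number of 1 bits
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),           # carry out of the low nibble
        "ZF": int(result == 0),                           # result is zero
        "SF": result >> 7,                                # copy of the sign bit
        "OF": int((a ^ result) & (b ^ result) & 0x80 != 0),  # signed overflow
    }


print(add8_flags(0x7F, 0x01))  # signed overflow, no carry
print(add8_flags(0xFF, 0x01))  # carry and zero, no signed overflow
```

The two calls show why CF and OF are separate: the first overflows as a signed add (127 + 1), the second only as an unsigned add (255 + 1).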

Memory Addressing

Knowing Where Things Are

Memory Addressing

Locating instructions and data in memory

this is the real skill in assembly

the Intel chips have a long history

some of it simple

some of it really not simple

and both are for the same reason: backwards compatibility

Memory Addressing

Three major memory models in x86, in order of invention (oldest first)

real mode flat model

hello, DOS

real mode segmented model

the interim state (and where most of the craziness lies)

protected mode flat model

only available on 80386+ CPUs

prefer flat models if you value your sanity

Real Mode Flat Model

Back in the beginning...

Real Mode Flat Model

In 1974, Intel introduced the 8080

2 MHz, 8-bit CPU; 16 address lines

giving it 64k (65,536) addressable locations

each of the 16-bit addressable locations held a byte

Real Mode Flat Model

OS of the day was CP/M-80

OS code lived at the top of memory

where the "top" was the actual, installed amount of memory

transient programs (your code) always lived at bottom of memory

execution always started at address 0100h

the first 256 bytes, then, were a Program Segment Prefix (PSP)

Real Mode Flat Model

Intel set up the 8086 to mimic the 8080

basically to make porting CP/M-80 apps easier

take the same 64k segment of memory, and start executing it

thus was born the Segment Register

basically memory pointers that hold the start of the 64k segment

Real Mode Flat Model

In this model, everything must fit within 64k

0000h - 00FFh: PSP, nothing goes here

0100h upwards: your program code

the IP points to somewhere in here

somewhere beyond your program code: your program data

FFFFh downwards: your program stack

stack always grows downwards, to minimize overwriting program data

Real Mode Segmented Model:

Getting 20-bit (1 MB) addresses out of 16-bit registers

Real Mode Segmented Model

Recall that the 8080/8086 like to think in 64k chunks

Recall that we had machines with a lot more than 64k of RAM

early x86s could address 1 MB, for example

somehow we have to be able to use that

but without abandoning the real mode flat model

because backwards compatibility!

Real Mode Segmented Model

Examining a 1mb address space through 16-bit addresses

what if we "pin" a block of 20-bit addressable space?

in other words, treat the block as if it were an 8080 64k block

call these blocks "segments"

in fact, divide the whole 1 MB into 64k (65,536) possible starting points for these 64k blocks

one at each address divisible by 16 (10h); each 16-byte unit is a "paragraph"

so each 64k segment can start at any paragraph boundary

segment 0000h begins at 00000h, 0001h at 00010h, 0002h at 00020h, etc

but keep in mind, segments don't have to be 64k in size

they aren't allocated memory, just addresses

Real Mode Segmented Model

So how do we put 20-bit addresses in 16-bit registers? We don't

you can kinda see where this is going:

we put a 20-bit address into two 16-bit registers

one is the segment number

other is the 16-bit offset from that segment starting point

and if you're clever, you notice...

one physical address can be addressed in multiple ways!
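The arithmetic is easier to see worked out. This Python sketch (not assembly) computes the physical address as segment * 16 + offset; the helper name physical_address is made up, and the 20-bit wraparound is an assumption that matches the original 8086 with its 20 address lines.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode translation: segment number times 16, plus the offset."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    # Assumption: wrap at 20 bits, as on the original 8086.
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs that land on the same byte.
print(hex(physical_address(0x1234, 0x0010)))
print(hex(physical_address(0x1235, 0x0000)))
```

Because segments start every 16 bytes but span 64k, most physical addresses can be reached through thousands of different segment:offset pairs.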

Real Mode Segmented Model

"This seems like a royal pain"

Yup. It was.

but it was the best we had for a few years

until the 80386, and 32-bit operating systems, reached ubiquity

Real Mode Segmented Model

Segment registers

16 bits in size, regardless of CPU

even on 32-bit CPUs, segment registers are 16-bits

CS: Code Segment

DS: Data Segment

SS: Stack Segment

ES: Extra Segment

FS, GS: more extra segments; 80386+ only

Real Mode Segmented Model

Addresses in segmented model

requires a segment register and offset pair

set off by a colon ":"

examples

SS:SP stack segment, offset stored in SP

SS:BP stack segment, offset stored in BP

ES:DI extra segment, offset stored in DI

DS:SI data segment, offset stored in SI

CS:BX code segment, offset stored in BX

Real Mode Segmented Model

Program layout

this is where things get even more fun:

does your program require multiple segments?

multiple code segments (more than 64k code)

multiple data segments (more than 64k data)

basic principles still hold

code segments "lower" in memory

data segments "higher" than code

stack segment near the top and grows down

Real Mode Segmented Model

Segment fun: Some rules

data segments can be loaded into DS, ES, FS, GS

only one code segment register (CS)

only one code segment allowed in use at a time!

recall, segments are always max 64k in size

only one stack segment register (SS)

but we usually only ever use one stack segment

Real Mode Segmented Model

Segment fun: How do we change CS?

you never do; the CPU does

Real Mode Segmented Model

Segment fun: How do we jump outside of a CS, then?

"far" jumps (and far calls); these specify a new CS and the CPU changes it

Protected Mode Flat Model

Where we get to keep things simple

Protected Mode Flat Model

Protected mode means you can't just write anywhere

real modes mean entire address space is available

and if that address space includes the operating system...

... you could really screw some things up

... not to mention the security risks!

Protected Mode Flat Model

Flat model means we don't segment the process

but we still have segment registers; they just never change

you can't change them anyway--protected mode!

segments are just really, really big (32-bit; 4gb)

Protected Mode Flat Model

Memory/process layout

4gb in size

specifics are OS-dependent

some core rules still hold:

Stack starts at "top" and grows "downward"

Code sits near "bottom"

Data sits on "top" of "code"

x86 Instruction Set

Broad classification of the instructions

x86 Instructions

x86 instructions fall into a broad set

Data transfer

Arithmetic (binary, decimal)

Logical (AND, OR, XOR, NOT)

Control transfer (jumps, call, enter/leave, etc)

String (move, compare, scan, etc)

x86 Instructions

x86 instructions fall into a broad set

Bit/byte manipulation

I/O (moving data between the processor and I/O ports)

Flag control

Segment register manipulation

Miscellaneous instructions

random-number generation

CPUID

x86 Instructions

Most x86 CPUs also come with additional instruction sets

floating-point unit (FPU)

multimedia extensions (MMX)

SSE, SSE2, SSE3, SSE4

extensions of the SIMD execution model introduced by MMX

Advanced Vector Extensions (AVX)

these are all generally not necessary to know

... until you need to know them, which is why references are good

x86 Instructions

Data transfer instructions

these copy data from and to a variety of places

memory-to-register

immediate-to-register

and so on

note that many of these are counterintuitive

MOV has no memory-to-memory form, for example

x86 Instructions

Data transfer instructions

MOV: move (actually a copy but whatever)

MOVSX (move-and-sign-extend), MOVZX (move-and-zero-extend)

CMOVxx: conditional move

XCHG: exchange values (swap)

CMPXCHG8B: Compare and exchange 8-bytes

PUSH/POP: push/pop onto/from stack

PUSHA/PUSHAD/POPA/POPAD: push/pop general-purpose registers
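PUSH and POP are really just a MOV plus an adjustment of ESP. The following Python sketch models that for 32-bit pushes; the class name Stack32 and the starting address 0x1000 are assumptions chosen for illustration.

```python
class Stack32:
    """Toy model of a 32-bit x86 stack: grows downward, 4 bytes per slot."""

    def __init__(self, top: int = 0x1000):
        self.esp = top     # stack pointer starts at the "top" address
        self.memory = {}   # address -> stored dword

    def push(self, value: int) -> None:
        self.esp -= 4                              # make room first...
        self.memory[self.esp] = value & 0xFFFFFFFF  # ...then store at [esp]

    def pop(self) -> int:
        value = self.memory[self.esp]  # read from [esp]...
        self.esp += 4                  # ...then release the slot
        return value


stack = Stack32()
stack.push(1)
stack.push(2)
print(hex(stack.esp))  # two dwords below the starting address
print(stack.pop(), stack.pop())
```

Note that pop does not erase anything; the old value stays in memory until a later push overwrites it.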

x86 Instructions

Arithmetic instructions

ADD, ADC, ADCX, ADOX

Add, add-with-carry, uint add-with-carry, uint add-with-overflow

SUB, SBB

subtract, subtract-with-borrow

IMUL, MUL: signed multiply, unsigned multiply

IDIV, DIV: signed divide, unsigned divide

INC, DEC: increment, decrement

NEG: negate

CMP: compare
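Add-with-carry exists so that values wider than a register can be added piece by piece. As a sketch of the idea (in Python, with the made-up helper add64_with_adc), a 64-bit addition on a 32-bit CPU is an ADD on the low halves followed by an ADC on the high halves:

```python
MASK32 = 0xFFFFFFFF


def add64_with_adc(a_lo: int, a_hi: int, b_lo: int, b_hi: int):
    """Add two 64-bit values held as (low, high) 32-bit halves."""
    low_sum = (a_lo & MASK32) + (b_lo & MASK32)   # ADD on the low halves
    carry = low_sum >> 32                         # what ADD leaves in CF
    high_sum = (a_hi & MASK32) + (b_hi & MASK32) + carry  # ADC on the high halves
    # Returns (low half, high half, final carry flag).
    return low_sum & MASK32, high_sum & MASK32, high_sum >> 32


print(add64_with_adc(0xFFFFFFFF, 0x00000000, 0x00000001, 0x00000000))
```

SUB and SBB pair up the same way for multi-word subtraction.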

x86 Instructions

Logic instructions

AND, OR, XOR, NOT

Shift/rotate instructions

shifts lose bits "off the edges"; rotates do not

SHL, SHR: shift left, right

SAL, SAR: shift arithmetic left, right

SHLD, SHRD: shift double left, right

ROL, ROR: rotate left, right

RCL, RCR: rotate through carry left, right
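The difference between the shift and rotate families is easiest to see on a single byte. This Python sketch models the 8-bit forms; the function names are hypothetical, and it assumes a shift count between 0 and 7.

```python
def shl8(value: int, count: int) -> int:
    """Logical shift left: bits fall off the top, zeros come in at the bottom."""
    return (value << count) & 0xFF


def shr8(value: int, count: int) -> int:
    """Logical shift right: bits fall off the bottom, zeros come in at the top."""
    return (value & 0xFF) >> count


def sar8(value: int, count: int) -> int:
    """Arithmetic shift right: the sign bit is copied in at the top."""
    value &= 0xFF
    signed = value - 0x100 if value & 0x80 else value
    return (signed >> count) & 0xFF


def rol8(value: int, count: int) -> int:
    """Rotate left: bits leaving the top re-enter at the bottom."""
    value &= 0xFF
    return ((value << count) | (value >> (8 - count))) & 0xFF


def ror8(value: int, count: int) -> int:
    """Rotate right: bits leaving the bottom re-enter at the top."""
    value &= 0xFF
    return ((value >> count) | (value << (8 - count))) & 0xFF


byte = 0b10010001
for op in (shl8, shr8, sar8, rol8, ror8):
    print(op.__name__, format(op(byte, 1), "08b"))
```

SAL is the same operation as SHL; only the right shifts differ between the logical and arithmetic forms.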

x86 Instructions

Control flow instructions

JMP: unconditional jump to target address

J???: jump-if-(condition)

E (equal), Z (zero), A (above), B (below), G (greater), L (less), C (carry), O (overflow), S (sign/negative), P (parity; PE parity-even, PO parity-odd)

plus all N? (not-?) variants

LOOP: loop with ECX counter

CALL: call procedure

RET: return

ENTER / LEAVE: high-level procedure entry / exit
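The above/below versus greater/less split trips people up: A and B treat the operands as unsigned, G and L treat them as signed. A small Python sketch of the distinction for 32-bit values, using made-up helper names:

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit pattern as a two's-complement signed number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def is_above(a: int, b: int) -> bool:
    """Would JA be taken after CMP a, b? (unsigned comparison)"""
    return (a & MASK32) > (b & MASK32)


def is_greater(a: int, b: int) -> bool:
    """Would JG be taken after CMP a, b? (signed comparison)"""
    return to_signed32(a) > to_signed32(b)


# The same bit pattern is 4294967295 unsigned but -1 signed.
print(is_above(0xFFFFFFFF, 1), is_greater(0xFFFFFFFF, 1))
```

The bits in the register are identical either way; which jump you pick decides how they are interpreted.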

x86 Instructions

String instructions

MOVS?: move (Byte/Word/Doubleword) string

CMPS?: compare (Byte/Word/Doubleword) string

SCAS?: scan (Byte/Word/Doubleword) string

LODS?: load (Byte/Word/Doubleword) string

STOS?: store (Byte/Word/Doubleword) string

REP: repeat while ECX not zero

REPE/REPNE/REPZ/REPNZ: repeat while equal/not-equal/zero/not-zero
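As a sketch of what the REP prefix does, here is REP MOVSB modelled in Python. It assumes the direction flag is clear (so ESI and EDI count upward), and rep_movsb is a hypothetical helper name.

```python
def rep_movsb(memory: bytearray, esi: int, edi: int, ecx: int):
    """Model REP MOVSB: copy ECX bytes from [ESI] to [EDI], one at a time."""
    while ecx != 0:
        memory[edi] = memory[esi]  # MOVSB: copy one byte
        esi += 1                   # assumption: DF = 0, so indexes increment
        edi += 1
        ecx -= 1                   # REP: decrement the counter and repeat
    return esi, edi, ecx


memory = bytearray(b"hello" + bytes(5))
print(rep_movsb(memory, esi=0, edi=5, ecx=5))
print(memory)
```

All three registers are left updated afterwards, which is why string instructions chain together so easily.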

MASM

Microsoft Macro Assembler

MASM

Microsoft Macro Assembler

ML.exe (32-bit), ML64.exe (64-bit)

supports 32- and 64-bit programs

a part of Microsoft's build chain since DOS

often paired with CodeView.exe (or debug.com)

Thanks to Microsoft's and Intel's longevity, a de facto standard

but quirky as hell

MASM

General format

"dotted"-commands are directives

these tell the assembler to assume certain things or behave certain ways

Intel instruction format:

MNEMONIC

MNEMONIC OPERAND

MNEMONIC DESTINATION, SOURCE

memory access syntax

[ebx]: access the contents of the address contained in ebx

[esi - 4]: access the contents of the address in (esi minus 4 bytes)

_var$[ebp]: access the contents at offset _var$ from the address in ebp; I have only seen this in Visual C++-generated asm files

MASM

Directives

processor type to assume

.386, .486, .586, .686; "P" variants include privileged instructions

.MMX, .XMM: enables use of MMX or SIMD streaming instructions

.MODEL: define the memory model to use

only used for 16- or 32-bit assembler (not 64)

flat (32)

tiny/small/compact/medium/large/huge (16)

language type: (32) C, STDCALL; (16) C, BASIC, FORTRAN, PASCAL, SYSCALL, STDCALL

.CODE (defines a code segment), .DATA (defines a data segment _DATA), .DATA? (defines an uninitialized data segment _BSS), .STACK (defines size of the stack)

PROC / ENDP: define a procedure block

SEGMENT: defines a segment in the file by the given name

MASM

Directives

data-definition

DB (byte), DW (word), DD (dword), DQ (quadword), DT (ten bytes)

external dependencies

PUBLIC: this symbol should be made public for other modules to consume

EXTRN / EXTERN: declare existence of symbol outside of this file

INCLUDE: load/parse filename given

INCLUDELIB: link with library given

ALIAS: create alternate name for external function

MASM

Directives: high-level

IFxxx: conditionally test various elements

FOR: assembly-time loop; repeats a block once per item in an argument list

INVOKE: invoke a given procedure, passing arguments

MACRO: creates a macro, with parameters, that can be used elsewhere

EQU: creates a symbol that equates to a value

STRUC, STRUCT: create a structure with defined field names

UNION: create a C-style union structure of one or more data types

GAS

GNU Assembler

gas: installed as part of GNU toolchain

as: installed as part of other *nix toolchains

highly popular due to GCC chain popularity

accepts AT&T syntax (the default) or Intel (MASM-like) syntax via the .intel_syntax directive

GAS

General format

instructions

opcode

opcode operand

opcode source, dest

all register names used as operands are preceded by %

constants preceded by $

GAS

General format

operation suffixes indicate size

"l" long (32 bits)

"w" word (16 bits)

"b" byte (8 bits)

memory address syntax

(%ecx): get the contents of the address stored in ecx

-4(%ebp): get the contents of the address 4 bytes before ebp

(%esi,%ebx,4): address ESI + 4*EBX
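The general AT&T form is disp(base,index,scale), and the address it names is base + index * scale + disp. A Python sketch of that calculation, with the hypothetical helper effective_address and made-up register values:

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute the address named by AT&T operand disp(base,index,scale)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows a scale of 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF  # 32-bit wraparound


# -4(%ebp) with ebp = 0x1000 (value assumed for illustration)
print(hex(effective_address(base=0x1000, disp=-4)))

# (%esi,%ebx,4) with esi = 0x2000 and ebx = 3: the fourth dword of an array
print(hex(effective_address(base=0x2000, index=3, scale=4)))
```

The scale values line up with element sizes (byte, word, dword, qword), which is what makes this form handy for array indexing.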

GAS

General format

.data directive: static data region (global variables)

.byte, .short, .long, .zero, .string

variables can be accessed via offsets: var(,1)

comments are single-line, prefixed by # or multi-line /* */ pairs

Some x86 examples

Let's look at some code in some detail

x86 Examples

Simple addition (MASM)

        .386
        .model flat, c
        .stack 100h

        .data
num1    sdword ?
num2    sdword 10

        .code
main    proc
        mov num1, 5         ; store the immediate 5 into num1
        mov eax, num1       ; eax = 5
        add eax, num2       ; eax = 5 + 10 = 15
        ret                 ; result is returned in eax
main    endp
        end

x86 Examples

Hello, world, on Linux with NASM

Assemble with "nasm -f elf -g -F stabs hellonasmlinux.asm"

Link with "ld -o helloworld hellonasmlinux.o"

SECTION .data   ; initialized data
Msg: db "Hello world",10
MsgLen: equ $-Msg
    
SECTION .bss    ; uninitialized data

x86 Examples

Hello, world, on Linux with NASM

SECTION .text   ; code
    
global _start   ; entrypoint definition
    
_start:
    nop             ; required for gdb-friendliness
    mov eax,4       ; sys_write syscall
    mov ebx,1       ; file descriptor 1: stdout
    mov ecx,Msg     ; message offset
    mov edx,MsgLen  ; message length (bytes)
    int 80h         ; make syscall
    
    mov eax,1       ; exit syscall
    mov ebx,0       ; exit code 0
    int 80h         ; make syscall

x86 Examples

Let's try MASM hello world using Win32

Assemble with "ml win32hello.asm /c"

Link with "link win32hello.obj kernel32.lib /subsystem:console /entry:main"

        .386
        .model flat
    
        extern _ExitProcess@4:near
        extern _GetStdHandle@4:near
        extern _WriteConsoleA@20:near
    
        public _main
    
; data declarations
        .data
msg     byte 'Hello, world!', 10
handle  dword ?
written dword ?
    
        .stack

x86 Examples

Let's try MASM hello world using Win32

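        .code
_main:
        push -11                ; STD_OUTPUT_HANDLE
        call _GetStdHandle@4    ; eax = console output handle
        mov handle, eax

        push 0                  ; lpReserved (must be NULL)
        push offset written     ; receives count of characters written
        push 14                 ; 13 characters of text plus the newline
        push offset msg         ; buffer to write
        push handle             ; console handle
        call _WriteConsoleA@20  ; stdcall: arguments pushed right-to-left

        push 0                  ; exit code 0
        call _ExitProcess@4

        end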

x86 Examples