What are we going to do here?
get a grounding on assembly language
explore some assembly tools
examine how to use this knowledge
It doesn't get any "closer to the metal" than this
assemblers are generally entirely CPU-specific
assemblers give you complete control over code
manual memory management
manual stack management
manual everything!
assemblers will often provide "macros"
... to avoid too much pain
... but their use is optional
Usage model
create assembler source file(s)
assembler transforms/"compiles" source into "object" files
these are then "linked" with libraries to form a final executable
note that this will depend a great deal on details
toolchain, platform, and so on
Core architecture/organization
CPUs are made up of registers
part of the process space is set up for a stack
this is just memory, but we treat it differently
the rest of the process space is "heap" space
OS may not allocate all of the process space at once
assembly instructions manipulate registers and memory
either directly (by address) or indirectly (through an address held in a register)
Source layout
directives will control overall assembler behavior
entrypoint definition, references to macros, etc
"code" sections
define procedures (usually by name)
"data" sections
define variable storage
define constants for easy reference
Instructions
assembly instructions fall into a variety of categories
register manipulation
math operations
memory manipulation
control transfer
assembly instructions take a common form
"opcode"
"opcode operand"
"opcode operand1,operand2"
Variables
variables and constants are "just" labels over memory locations
sizes (and signed-ness) are always explicitly stated
byte (8), word (16), double-word (32), quad-word (64), double-quad-word (128)
... depending on the CPU
Welcome to the 80x86 line
best-selling line of CPU, ever
16-bit through 64-bit generations (with roots in the 8-bit 8080)
commonly exemplified as a CISC architecture
register-light
four general-purpose registers (A, B, C, D)
several specific-purpose registers (SI, DI, SP, BP, IP, flags)
8-bit to 64-bit legacy
registers are accessible as 8-, 16-, 32- or 64-bit
AH, AL: "A"-high, "A"-low (8-bit)
AX: "A"-word (16-bit)
EAX: "A"-double-word (32-bit)
usage will depend on circumstance/context
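The register aliasing above can be modeled in C with masks and shifts. This is only a sketch: the helper names (`get_ax`, `get_al`, `get_ah`, `set_al`, `set_ah`) are made up for illustration, and a plain `uint32_t` stands in for EAX.

```c
#include <stdint.h>

/* AX is the low 16 bits of EAX. */
uint16_t get_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AL is the low byte of AX; AH is the high byte of AX. */
uint8_t get_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
uint8_t get_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Writing AL or AH replaces only that byte; the rest of EAX is untouched. */
uint32_t set_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}

uint32_t set_ah(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFF00FFu) | ((uint32_t)value << 8);
}
```

With EAX holding 12345678h, AX reads as 5678h, AH as 56h and AL as 78h. The point is that these are views onto one register, not separate storage.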
General registers and their common use
EAX: "accumulator", arithmetic and logic
EBX: arrays
ECX: loops
EDX: arithmetic
Segment registers
CS: Code Segment
DS: Data Segment
SS: Stack Segment
ES: Extra Segment
FS: Extra Segment, next
GS: Extra Segment, next next
Specific-purpose registers
ESI: Source Index (strings, arrays)
EDI: Destination Index (strings, arrays)
ESP: Stack Pointer (top-of-stack)
EBP: Base Pointer (stack base)
EIP: Instruction Pointer
Flag register (EFLAGS)
CF: Carry flag
PF: Parity flag
AF: Auxiliary Carry flag
ZF: Zero flag
SF: Sign flag
OF: Overflow flag
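To make the flags less abstract, here is a rough C sketch of how four of them would be set by an 8-bit ADD. The `flags_t` struct and `add8_flags` function are hypothetical names for illustration; PF and AF are left out to keep it short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for a subset of the flag bits. */
typedef struct {
    bool cf;  /* carry: unsigned result did not fit in 8 bits */
    bool zf;  /* zero: result is 0 */
    bool sf;  /* sign: top bit of the result is set */
    bool of;  /* overflow: signed result did not fit in 8 bits */
} flags_t;

/* Model an 8-bit ADD: store the truncated sum and report the flags. */
flags_t add8_flags(uint8_t a, uint8_t b, uint8_t *result)
{
    unsigned wide = (unsigned)a + (unsigned)b;
    uint8_t r = (uint8_t)wide;
    flags_t f;

    f.cf = wide > 0xFF;
    f.zf = (r == 0);
    f.sf = (r & 0x80) != 0;
    /* Signed overflow: operands share a sign, result has the other sign. */
    f.of = ((~(a ^ b)) & (a ^ r) & 0x80) != 0;

    *result = r;
    return f;
}
```

Adding 7Fh + 01h gives 80h with OF and SF set (signed overflow, no carry); adding FFh + 01h gives 00h with CF and ZF set (unsigned carry, no signed overflow).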
Locating instructions and data in memory
this is the real skill in assembly
the Intel chips have a long history
some of it simple
some of it really not simple
and both are for the same reason: backwards compatibility
Three major memory models in x86, in order of invention (oldest first)
real mode flat model
hello, DOS
real mode segmented model
the interim state (and where most of the craziness lies)
protected mode flat model
only available on 80386+ CPUs
prefer flat models if you value your sanity
In 1974, Intel introduced the 8080
2 MHz, 8-bit CPU; 16 address lines
giving it 64k (65,536) addressable locations
each of the 64k addressable locations held a byte
OS of the day was CP/M-80
OS code lived at the top of memory
where the "top" was the actual, installed amount of memory
transient programs (your code) always lived at bottom of memory
execution always started at address 0100h
the first 256 bytes, then, were a Program Segment Prefix (PSP)
Intel set up the 8086 to mimic the 8080
basically to make porting CP/M-80 apps easier
take the same 64k segment of memory, and start executing it
thus was born the Segment Register
basically memory pointers that hold the start of the 64k segment
In this model, everything must fit within 64k
0000h - 00FFh: PSP, nothing goes here
0100h upwards: your program code
the IP points to somewhere in here
somewhere beyond your program code: your program data
FFFFh downwards: your program stack
stack always grows downwards, to minimize overwriting program data
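A small C model of the downward-growing stack may help here. It is a sketch under assumptions: `machine_t`, `push16` and `pop16` are invented names, the 64k array stands in for the one segment, and the caller picks the starting SP.

```c
#include <stdint.h>

/* Hypothetical model of one 64k segment plus a stack pointer. */
typedef struct {
    uint8_t  mem[0x10000];  /* the whole 64k segment */
    uint16_t sp;            /* offset of the current top-of-stack */
} machine_t;

/* PUSH: move SP down by two, then store the word (little-endian). */
void push16(machine_t *m, uint16_t value)
{
    m->sp = (uint16_t)(m->sp - 2);
    m->mem[m->sp] = (uint8_t)(value & 0xFF);
    m->mem[(uint16_t)(m->sp + 1)] = (uint8_t)(value >> 8);
}

/* POP: read the word at SP, then move SP back up by two. */
uint16_t pop16(machine_t *m)
{
    uint16_t lo = m->mem[m->sp];
    uint16_t hi = m->mem[(uint16_t)(m->sp + 1)];
    m->sp = (uint16_t)(m->sp + 2);
    return (uint16_t)(lo | (hi << 8));
}
```

Starting with SP at FFFEh, two pushes leave SP at FFFAh: every push moves the stack toward the program's code and data, which is why it starts as far away from them as possible.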
Recall that 8080/8086 likes to think in 64k chunks
Recall that we had CPUs with a lot more than 64k RAM
early x86s had 1mb, for example
somehow we have to be able to use that
but without abandoning the real mode flat model
because backwards compatibility!
Examining a 1mb address space through 16-bit addresses
what if we "pin" a block of 20-bit addressable space?
in other words, treat the block as if it were an 8080 64k block
call these blocks "segments"
in fact, divide all 1mb into 64k (65,536) possible starting points for these 64k blocks
these are at each address divisible by 16 (10h); each 16-byte unit is a "paragraph"
so each 64k segment can start at a paragraph boundary
segment 0000h begins at 00000h, 0001h at 00010h, 0002h at 00020h, etc
but keep in mind, segments don't have to be 64k in size
they aren't allocated memory, just addresses
So how do we put 20-bit addresses in 16-bit registers? We don't
you can kinda see where this is going:
we put a 20-bit address into two 16-bit registers
one is the segment number
other is the 16-bit offset from that segment starting point
and if you're clever, you notice...
one physical address can be addressed in multiple ways!
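The segment:offset arithmetic can be written out in a few lines of C. This is a sketch, with `physical_address` as a made-up helper name; the final mask assumes an 8086-style 20-bit address bus, where results wrap at 1mb.

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset.
   With only 20 address lines, the result wraps around at 1 MB. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

For example, 0000h:0010h and 0001h:0000h both resolve to physical address 00010h, which is the "one address, many spellings" effect noted above.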
"This seems like a royal pain"
Yup. It was.
but it was the best we had for a few years
until the 80386, and 32-bit operating systems, reached ubiquity
Segment registers
16-bits in size, regardless of CPU
even on 32-bit CPUs, segment registers are 16-bits
CS: Code Segment
DS: Data Segment
SS: Stack Segment
ES: Extra Segment
FS, GS: more extra segments; 80386+ only
Addresses in segmented model
requires a segment register and offset pair
set off by a colon ":"
examples
SS:SP stack segment, offset stored in SP
SS:BP stack segment, offset stored in BP
ES:DI extra segment, offset stored in DI
DS:SI data segment, offset stored in SI
CS:BX code segment, offset stored in BX
Program layout
this is where things get even more fun:
does your program require multiple segments?
multiple code segments (more than 64k code)
multiple data segments (more than 64k data)
basic principles still hold
code segments "lower" in memory
data segments "higher" than code
stack segment near the top and grows down
Segment fun: Some rules
data segments can be loaded into DS, ES, FS, GS
only one code segment register (CS)
only one code segment allowed in use at a time!
recall, segments are always max 64k in size
only one stack segment register (SS)
but we usually only ever use one stack segment
Segment fun: How do we change CS?
you never do; the CPU does
Segment fun: How do we jump outside of a CS, then?
"far" jumps; these specify a new CS and the CPU changes it
Protected mode means you can't just write anywhere
real modes mean entire address space is available
and if that address space includes the operating system...
... you could really screw some things up
... not to mention the security risks!
Flat model means we don't segment the process
but we still have segment registers; they just never change
you can't change them anyway--protected mode!
segments are just really, really big (32-bit; 4gb)
Memory/process layout
4gb in size
specifics are OS-dependent
some core rules still hold:
Stack starts at "top" and grows "downward"
Code sits near "bottom"
Data sits on "top" of "code"
x86 instructions fall into a broad set
Data transfer
Arithmetic (binary, decimal)
Logical (AND, OR, XOR, NOT)
Control transfer (jumps, call, enter/leave, etc)
String (move, compare, scan, etc)
x86 instructions fall into a broad set
Bit/byte manipulation
I/O (moving data from processor to I/O port)
Flag control
Segment register manipulation
Miscellaneous instructions
random-number generation
CPUID
Most x86 also come with additional instruction sets
floating-point unit (FPU)
multimedia extensions (MMX)
SSE, SSE2, SSE3, SSE4
extensions of the SIMD execution model introduced by MMX
Advanced Vector instructions (AVX)
these are all generally not necessary to know
... until you need to know them, which is why references are good
Data transfer instructions
these copy data from and to a variety of places
register-to-memory
memory-to-register
immediate-to-register
and so on
note that many of these are counterintuitive
there is no memory-to-memory, for example
Data transfer instructions
MOV: move (actually a copy but whatever)
MOVSX (move-and-sign-extend), MOVZX (move-and-zero-extend)
CMOVxx: conditional move
XCHG: exchange values (swap)
CMPXCHG8B: Compare and exchange 8-bytes
PUSH/POP: push/pop onto/from stack
PUSHA/PUSHAD/POPA/POPAD: push/pop general-purpose registers
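The difference between MOVZX and MOVSX is easiest to see on a byte with its top bit set. The C sketch below uses the hypothetical helper names `movzx8` and `movsx8` and models widening an 8-bit value to 32 bits.

```c
#include <stdint.h>

/* MOVZX-style widening: upper 24 bits are filled with zeros. */
uint32_t movzx8(uint8_t value)
{
    return (uint32_t)value;
}

/* MOVSX-style widening: upper 24 bits are copies of the sign bit (bit 7). */
uint32_t movsx8(uint8_t value)
{
    return (value & 0x80) ? (0xFFFFFF00u | value) : (uint32_t)value;
}
```

Widening F0h gives 000000F0h with zero extension and FFFFFFF0h with sign extension; which one is right depends on whether the byte was meant as unsigned (240) or signed (-16).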
Arithmetic instructions
ADD, ADC, ADCX, ADOX
Add, add-with-carry, uint add-with-carry, uint add-with-overflow
SUB, SBB
subtract, subtract-with-borrow
IMUL, MUL: signed multiply, unsigned multiply
IDIV, DIV: signed divide, unsigned divide
INC, DEC: increment, decrement
NEG: negate
CMP: compare
Logic instructions
AND, OR, XOR, NOT
Shift/rotate instructions
shifts lose bits "off the edges"; rotates do not
SHL, SHR: shift left, right
SAL, SAR: shift arithmetic left, right
SHLD, SHRD: shift double left, right
ROL, ROR: rotate left, right
RCL, RCR: rotate through carry left, right
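A C sketch of the shift/rotate distinction on 8-bit values follows. The function names mirror the mnemonics but are otherwise made up, and the count is reduced to 0-7 for simplicity (an assumption of this sketch; real x86 handles larger counts differently).

```c
#include <stdint.h>

/* SHL: bits shifted off the left edge are lost; zeros come in on the right. */
uint8_t shl8(uint8_t v, unsigned n)
{
    n &= 7;
    return (uint8_t)(v << n);
}

/* SHR: logical shift right; zeros come in on the left. */
uint8_t shr8(uint8_t v, unsigned n)
{
    n &= 7;
    return (uint8_t)(v >> n);
}

/* SAR: arithmetic shift right; copies of the sign bit come in on the left. */
uint8_t sar8(uint8_t v, unsigned n)
{
    n &= 7;
    uint8_t r = (uint8_t)(v >> n);
    if (v & 0x80)
        r |= (uint8_t)(0xFF << (8 - n));
    return r;
}

/* ROL: bits that leave the left edge re-enter on the right. */
uint8_t rol8(uint8_t v, unsigned n)
{
    n &= 7;
    return (uint8_t)((v << n) | (v >> ((8 - n) & 7)));
}
```

Taking 81h by one position: SHL gives 02h (the top bit is lost), ROL gives 03h (the top bit wraps around), SHR gives 40h, and SAR gives C0h (the sign is preserved).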
Control flow instructions
JMP: unconditional jump to target address
J???: jump-if-(condition)
E (equal), Z (zero), A (above, unsigned), B (below, unsigned), G (greater, signed), L (less, signed), C (carry), O (overflow), S (sign/negative), P (parity; PE/PO for even/odd)
plus all N? (not-?) variants
LOOP: loop with ECX counter
CALL: call procedure
RET: return
ENTER / LEAVE: high-level procedure entry / exit
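The "above/below" versus "greater/less" conditions test the same bits under different interpretations. A C sketch, with the hypothetical helpers `jb_taken` and `jl_taken` modeling whether a JB or JL would be taken after comparing `a` with `b`:

```c
#include <stdbool.h>
#include <stdint.h>

/* JB (jump if below): treats both operands as unsigned. */
bool jb_taken(uint32_t a, uint32_t b)
{
    return a < b;
}

/* JL (jump if less): treats both operands as signed two's complement.
   Flipping the sign bit maps signed order onto unsigned order. */
bool jl_taken(uint32_t a, uint32_t b)
{
    return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}
```

Comparing FFFFFFFFh with 1: JB is not taken (4,294,967,295 is not below 1), but JL is taken (-1 is less than 1). Picking the wrong family is a classic source of bugs in hand-written assembly.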
String instructions
MOVS?: move (Byte/Word/Doubleword) string
CMPS?: compare (Byte/Word/Doubleword) string
SCAS?: scan (Byte/Word/Doubleword) string
LODS?: load (Byte/Word/Doubleword) string
STOS?: store (Byte/Word/Doubleword) string
REP: repeat while ECX not zero
REPE/REPNE/REPZ/REPNZ: repeat while equal/not-equal/zero/not-zero
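What REP MOVSB does can be sketched as a C loop. The `string_regs` struct and `rep_movsb` function are hypothetical stand-ins for ESI, EDI and ECX, and the sketch assumes the direction flag is clear (pointers move upward).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the registers the string instructions use. */
typedef struct {
    const uint8_t *esi;  /* source index */
    uint8_t       *edi;  /* destination index */
    uint32_t       ecx;  /* repeat count */
} string_regs;

/* REP MOVSB: copy one byte, advance both pointers, decrement ECX,
   and repeat until ECX reaches zero. */
void rep_movsb(string_regs *r)
{
    while (r->ecx != 0) {
        *r->edi = *r->esi;
        r->esi++;
        r->edi++;
        r->ecx--;
    }
}
```

After the loop, ECX is zero and both index registers point just past the copied bytes, which matches the state the registers are left in by the real instruction.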
Microsoft Macro Assembler
ML.exe (ML64.exe for 64-bit code)
supports 32- and 64-bit programs
a part of Microsoft's build chain since DOS
often paired with CodeView.exe (or debug.com)
Thanks to Microsoft's and Intel's longevity, a standard
but quirky as hell
General format
"dotted"-commands are directives
these tell the assembler to assume certain things or behave certain ways
Intel instruction format:
MNEMONIC
MNEMONIC OPERAND
MNEMONIC DESTINATION, SOURCE
memory access syntax
[ebx]: access the contents of the address contained in ebx
[esi - 4]: access the contents of the address in (esi minus 4 bytes)
_var$[ebp]: access the contents at offset _var$ from the address in ebp; I have only seen this in Visual C++-generated asm files
Directives
processor type to assume
.386, .486, .586, .686; "P" variants include privileged instructions
.MMX, .XMM: enables use of MMX or SIMD streaming instructions
.MODEL: define the memory model to use
only used for 16- or 32-bit assembler (not 64)
flat (32)
tiny/small/compact/medium/large/huge/flat (16)
language type: (32) C, STDCALL; (16) C, BASIC, FORTRAN, PASCAL, SYSCALL, STDCALL
.CODE (defines a code segment), .DATA (defines a data segment _DATA), .DATA? (defines an uninitialized data segment _BSS), .STACK (defines size of the stack)
PROC / ENDP: define a procedure block
SEGMENT: defines a segment in the file by the given name
Directives
data-definition
DB (byte), DW (word), DD (dword), DQ (quadword), DT (ten bytes)
external dependencies
PUBLIC: this symbol should be made public for other modules to consume
EXTRN / EXTERN: declare existence of symbol outside of this file
INCLUDE: load/parse filename given
INCLUDELIB: link with library given
ALIAS: create alternate name for external function
Directives: high-level
IFxxx: conditionally test various elements
FOR: standard for-style loop
INVOKE: invoke a given procedure, passing arguments
MACRO: creates a macro, with parameters, that can be used elsewhere
EQU: creates a symbol that equates to a value
STRUC, STRUCT: create a structure with defined field names
UNION: create a C-style union structure of one or more data types
GNU Assembler
gas: installed as part of GNU toolchain
as: installed as part of other *nix toolchains
highly popular due to GCC chain popularity
accepts either AT&T syntax or Intel (MASM) syntax
General format
instructions
opcode
opcode operand
opcode source, dest
all register names used as operands are preceded by %
constants preceded by $
General format
operation suffixes indicate size
"l" long (32 bits)
"w" word (16 bits)
"b" byte (8 bits)
memory address syntax
(%ecx): get the contents of the address stored in ecx
-4(%ebp): get the contents of the address 4 bytes before ebp
(%esi,%ebx,4): address ESI + 4*EBX
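The memory operand forms above all reduce to one formula: displacement + base + index * scale. A C sketch of that calculation, with `effective_address` as a made-up helper name and plain integers standing in for register contents:

```c
#include <stdint.h>

/* Effective address for an operand written disp(base,index,scale).
   Arithmetic wraps at 32 bits, as it would in a 32-bit address space. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}
```

So (%esi,%ebx,4) with ESI = 1000h and EBX = 3 refers to address 100Ch, and -4(%ebp) with EBP = 2000h refers to 1FFCh. The scaled form is what makes indexing an array of 4-byte elements a single operand.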
General format
.data directive: static data region (global variables)
.byte, .short, .long, .zero, .string
variables can be accessed via offsets: var(,1)
comments are single-line, prefixed by # or multi-line /* */ pairs
Simple addition (MASM)
.386
.model flat, c
.stack 100h

.data
num1 sdword ?
num2 sdword 10

.code
main proc
    mov num1, 5
    mov eax, num1
    add eax, num2
    ret
main endp
end
Hello, world, on Linux with NASM
Assemble with "nasm -f elf -g -F stabs hellonasmlinux.asm"
Link with "ld -o helloworld hellonasmlinux.o"
SECTION .data               ; initialized data
Msg:    db "Hello world",10
MsgLen: equ $-Msg

SECTION .bss                ; uninitialized data
Hello, world, on Linux with NASM
SECTION .text               ; code
global _start               ; entrypoint definition
_start:
    nop                     ; required for gdb-friendliness
    mov eax,4               ; sys_write syscall
    mov ebx,1               ; file descriptor 1: stdout
    mov ecx,Msg             ; message offset
    mov edx,MsgLen          ; message length (bytes)
    int 80h                 ; make syscall
    mov eax,1               ; exit syscall
    mov ebx,0               ; exit code 0
    int 80h                 ; make syscall
Let's try MASM hello world using Win32
Assemble with "ml win32hello.asm /c"
Link with "link win32hello.obj kernel32.lib /subsystem:console /entry:main"
.386
.model flat
extern _ExitProcess@4:near
extern _GetStdHandle@4:near
extern _WriteConsoleA@20:near
public _main

; data declarations
.data
msg     byte 'Hello, world!', 10
handle  dword ?
written dword ?

.stack
Let's try MASM hello world using Win32
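.code
_main:
    push -11                ; STD_OUTPUT_HANDLE
    call _GetStdHandle@4
    mov handle, eax
    push 0                  ; reserved, must be 0
    push offset written     ; receives the count of characters written
    push 13                 ; characters to write (the text, without the newline)
    push offset msg
    push handle
    call _WriteConsoleA@20
    push 0                  ; exit code 0
    call _ExitProcess@4
end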
More complicated: C code disassembled
Compile with "cl /Fahello.asm hello.c"
#include <stdio.h>

int add(int left, int right)
{
    return left + right;
}

int main(int argc, char* argv[])
{
    int x = 1;
    int y = 2;
    int z = add(x, y);
}
More complicated: C code disassembled
The generated assembler prelude
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.22.27905.0

    TITLE   D:\Projects\Presentations.hg\Content\Assembler\Intel\x86\code\hello.c
    .686P
    .XMM
    include listing.inc
    .model  flat

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

PUBLIC  _add
PUBLIC  _main
More complicated: C code disassembled
The generated assembler for add()
; Function compile flags: /Odtp
_TEXT   SEGMENT
_left$ = 8                      ; size = 4
_right$ = 12                    ; size = 4
_add    PROC
; File D:\Projects\Presentations.hg\Content\Assembler\Intel\x86\code\hello.c
; Line 4
    push    ebp
    mov     ebp, esp
; Line 5
    mov     eax, DWORD PTR _left$[ebp]
    add     eax, DWORD PTR _right$[ebp]
; Line 6
    pop     ebp
    ret     0
_add    ENDP
_TEXT   ENDS
More complicated: C code disassembled
The generated assembler for main() (part 1)
; Function compile flags: /Odtp
_TEXT   SEGMENT
_z$ = -12                       ; size = 4
_x$ = -8                        ; size = 4
_y$ = -4                        ; size = 4
_argc$ = 8                      ; size = 4
_argv$ = 12                     ; size = 4
_main   PROC
; File D:\Projects\Presentations.hg\Content\Assembler\Intel\x86\code\hello.c
; Line 9
    push    ebp
    mov     ebp, esp
    sub     esp, 12             ; 0000000cH
; Line 10
    mov     DWORD PTR _x$[ebp], 1
; Line 11
    mov     DWORD PTR _y$[ebp], 2
More complicated: C code disassembled
The generated assembler for main() (part 2)
; Line 12
    mov     eax, DWORD PTR _y$[ebp]
    push    eax
    mov     ecx, DWORD PTR _x$[ebp]
    push    ecx
    call    _add
    add     esp, 8
    mov     DWORD PTR _z$[ebp], eax
; Line 13
    xor     eax, eax
    mov     esp, ebp
    pop     ebp
    ret     0
_main   ENDP
_TEXT   ENDS
More complicated: C code disassembled
Compile with "cl /Faconstructs.asm constructs.c"
int main(int argc, char* argv[])
{
    int result = ifLoop(argc);
    forLoop();
}
More complicated: C code disassembled
_DATA   SEGMENT
$SG7450 DB  'Hello, world, this is the %d loop', 0aH, 00H
_DATA   ENDS
; Function compile flags: /Odtp
_TEXT   SEGMENT
_result$ = -4                   ; size = 4
_argc$ = 8                      ; size = 4
_argv$ = 12                     ; size = 4
_main   PROC
; Line 24
    push    ebp
    mov     ebp, esp
    push    ecx
; Line 25
    mov     eax, DWORD PTR _argc$[ebp]
    push    eax
    call    _ifLoop
    add     esp, 4
    mov     DWORD PTR _result$[ebp], eax
More complicated: C code disassembled
; Line 26
    call    _forLoop
; Line 27
    xor     eax, eax
    mov     esp, ebp
    pop     ebp
    ret     0
_main   ENDP
More complicated: C code disassembled
int ifLoop(int arg)
{
    if (arg < 5)
    {
        return 12;
    }
    else
    {
        return 17;
    }
}
More complicated: C code disassembled
_TEXT   SEGMENT
_arg$ = 8                       ; size = 4
_ifLoop PROC
; Line 4
    push    ebp
    mov     ebp, esp
; Line 5
    cmp     DWORD PTR _arg$[ebp], 5
    jge     SHORT $LN2@ifLoop
; Line 7
    mov     eax, 12             ; 0000000cH
    jmp     SHORT $LN1@ifLoop
; Line 8
    jmp     SHORT $LN1@ifLoop
$LN2@ifLoop:
; Line 11
    mov     eax, 17             ; 00000011H
$LN1@ifLoop:
; Line 13
    pop     ebp
    ret     0
_ifLoop ENDP
_TEXT   ENDS
More complicated: C code disassembled
void forLoop()
{
    for (int i=0; i<10; i++)
    {
        printf("Hello, world, this is the %d loop\n", i);
    }
}
More complicated: C code disassembled
_TEXT   SEGMENT
_i$1 = -4                       ; size = 4
_forLoop PROC
; Line 16
    push    ebp
    mov     ebp, esp
    push    ecx
More complicated: C code disassembled
; Line 17
    mov     DWORD PTR _i$1[ebp], 0
    jmp     SHORT $LN4@forLoop
$LN2@forLoop:
    mov     eax, DWORD PTR _i$1[ebp]
    add     eax, 1
    mov     DWORD PTR _i$1[ebp], eax
$LN4@forLoop:
    cmp     DWORD PTR _i$1[ebp], 10     ; 0000000aH
    jge     SHORT $LN1@forLoop
; Line 19
    mov     ecx, DWORD PTR _i$1[ebp]
    push    ecx
    push    OFFSET $SG7450
    call    _printf
    add     esp, 8
; Line 20
    jmp     SHORT $LN2@forLoop
$LN1@forLoop:
More complicated: C code disassembled
; Line 21
    mov     esp, ebp
    pop     ebp
    ret     0
_forLoop ENDP
_TEXT   ENDS
Welcome to the 64-bit 80x86 line
backwards-compatible to the 16- and 32-bit 80x86
MMX, SSE (SIMD parallelization) extensions
actually introduced by AMD as a pre-emptive strike against Intel
Intel tried to "clean break" away from IA-32 with "Itanium"
Itanium fell flat; industry wanted backwards-compatibility!
Intel grudgingly accepted AMD's standards and x86-64 (or "x64") was born
64 bits == 16 exabytes == 16 billion gigabytes
supports
real mode (for DOS/Windows compatibility)
protected mode (for IA-32 compatibility)
long mode (true 64-bit mode)
General registers and their common use
RAX: "accumulator", arithmetic and logic
RBX: arrays
RCX: loops
RDX: arithmetic
R8 - R15 / R8D - R15D: additional registers
Specific-purpose registers
RSI: Source Index (strings, arrays)
RDI: Destination Index (strings, arrays)
RSP: Stack Pointer (top-of-stack)
RBP: Base Pointer (stack base)
RIP: Instruction Pointer
RFLAGS: Flags register
Additional registers
MMX registers (MM0 - MM7)
XMM registers (XMM0 - XMM15 and MXCSR)
Control registers (CR0, CR2, CR3, CR4, CR8)
Debug registers (DR0, DR1, DR2, DR3, DR6, DR7)
Wrapping up
nobody expects to write assembly programs
except in very specific/niche situations
being able to read assembler is a huge step up
particularly from high-level language scenarios
knowing assembly language takes all the mystery away
this helps make it easier to reason about problems
Books useful to have/read
"Assembly Language Step-By-Step" 3rd Edition
best description of memory models anywhere
"Linkers and Loaders"
how compiled files (COFF, ELF, PE, etc) look on disk
Tools for assembly programming
Flat Assembler (fasm): https://flatassembler.net
geared specifically at "flat model" assembly
Microsoft Assembler (MASM): ships with Visual Studio
MASM32SDK: http://masm32.com/
designed to be a little friendlier than raw MASM by itself
Netwide Assembler (NASM): https://www.nasm.us
cross-platform and popular on Linux; installed separately from the GCC toolchain
OpenWatcom toolchain: http://www.openwatcom.com/
Disassemblers
most C/C++ compilers can emit an assembly listing
Visual C++: /Fa{filename}
gcc: -S (add -fverbose-asm for a commented listing)
many standalone disassembler tools
... for when you don't have the source
macOS/Xcode: otool -tV {code}.o
GNU binutils: objdump -d {file}
Visual Studio: dumpbin.exe
PE Explorer: http://www.heaventools.com/PE_Explorer_disassembler.htm
IDAPro: https://www.hex-rays.com/products/ida/
Who is this guy?
Architect, Engineering Manager/Leader, "force multiplier"
Principal -- Neward & Associates
http://www.newardassociates.com
Educative (http://educative.io) Author
Performance Management for Engineering Managers
Author
Professional F# 2.0 (w/Erickson, et al; Wrox, 2010)
Effective Enterprise Java (Addison-Wesley, 2004)
SSCLI Essentials (w/Stutz, et al; O'Reilly, 2003)
Server-Based Java Programming (Manning, 2000)