ted.neward@newardassociates.com | Blog: http://blogs.newardassociates.com | Github: tedneward | LinkedIn: tedneward
Code: https://github.com/tedneward/Demo-Assembly
Slides: http://www.newardassociates.com/presentations/BusyDevsGuide/Assembly.html
get a grounding on assembly language
explore some assembly tools
examine how to use this knowledge
It doesn't get any "closer to the metal" than this
assemblers are generally entirely CPU-specific
assemblers give you complete control over code
manual memory management
manual stack management
manual everything!
assemblers will often provide "macros"
... to avoid too much pain
... but their use is optional
Usage model
create assembler source file(s)
assembler transforms/"compiles" source into "object" files
these are then "linked" with libraries to form a final executable
note that this will depend a great deal on details
toolchain, platform, and so on
CPUs are made up of registers
part of the process space is set up for a stack
this is just memory, but we treat it differently
the rest of the process space is "heap" space
OS may not allocate all of the process space at once
assembly instructions manipulate registers and memory
either directly (by register or address), or indirectly (through an address held in a register)
directives will control overall assembler behavior
entrypoint definition, references to macros, etc
"code" sections
define procedures (usually by name)
"data" sections
define variable storage
define constants for easy reference
assembly instructions fall into a variety of categories
register manipulation
math operations
memory manipulation
control transfer
assembly instructions take a common form
"opcode"
"opcode operand"
"opcode operand1,operand2"
variables and constants are "just" labels over memory locations
sizes (and signed-ness) are always explicitly stated
byte (8), word (16), double-word (32), quad-word (64), double-quad-word (128)
... depending on the CPU
Keep in mind some CPUs store "big" numbers differently
little-endian (littlest digits come first)
big-endian (biggest digits come first)
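A minimal C sketch of the two byte orders, assuming a 32-bit value. The helper names (`store_le32`, `store_be32`, `host_is_little_endian`, `endian_demo`) are made up for illustration and do not come from any library. It lays the same number out in memory both ways so the difference shows byte by byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store v least-significant byte first (little-endian, as x86 does). */
void store_le32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)((v >> 8) & 0xFFu);
    out[2] = (uint8_t)((v >> 16) & 0xFFu);
    out[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Store v most-significant byte first (big-endian). */
void store_be32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t)((v >> 24) & 0xFFu);
    out[1] = (uint8_t)((v >> 16) & 0xFFu);
    out[2] = (uint8_t)((v >> 8) & 0xFFu);
    out[3] = (uint8_t)(v & 0xFFu);
}

/* Returns 1 when the machine running this code is little-endian. */
int host_is_little_endian(void) {
    uint32_t probe = 1;
    uint8_t first_byte;
    memcpy(&first_byte, &probe, 1);  /* inspect the byte at the lowest address */
    return first_byte == 1;
}

/* Usage sketch: the same value laid out both ways. */
void endian_demo(void) {
    uint8_t le[4], be[4];
    store_le32(le, 0x12345678u);
    store_be32(be, 0x12345678u);
    assert(le[0] == 0x78 && le[3] == 0x12);  /* littlest digits first */
    assert(be[0] == 0x12 && be[3] == 0x78);  /* biggest digits first */
}
```

On an x86 machine `host_is_little_endian()` should report 1; the two store helpers behave the same on any host because they use shifts rather than memory layout.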
best-selling line of CPU, ever
8-bit through 64-bit generations
commonly exemplified as a CISC architecture
register-light
four general-purpose registers (A, B, C, D)
several specific-purpose registers (SI, DI, SP, BP, IP, flags)
registers are accessible as 8-, 16-, 32- or 64-bit
AH, AL: "A"-high, "A"-low (8-bit)
AX: "A"-word (16-bit)
EAX: "A"-double-word (32-bit)
usage will depend on circumstance/context
EAX: "accumulator", arithmetic and logic
EBX: arrays
ECX: loops
EDX: arithmetic
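The 8-, 16-, and 32-bit names of the A register are views onto the same storage. A rough C model of that overlap, using masks and shifts on a plain 32-bit integer (the function names are hypothetical, chosen only for this sketch):

```c
#include <assert.h>
#include <stdint.h>

/* AX is the low 16 bits of EAX. */
uint16_t view_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AL is bits 0-7 of EAX. */
uint8_t view_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* AH is bits 8-15 of EAX (the high half of AX). */
uint8_t view_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Writing AL replaces only the low byte; the other 24 bits stay put. */
uint32_t write_al(uint32_t eax, uint8_t al) {
    return (eax & 0xFFFFFF00u) | al;
}

/* Usage sketch. */
void subregister_demo(void) {
    uint32_t eax = 0x12345678u;
    assert(view_ax(eax) == 0x5678u);
    assert(view_ah(eax) == 0x56u);
    assert(view_al(eax) == 0x78u);
    assert(write_al(eax, 0xFFu) == 0x123456FFu);
}
```

There is no name for the upper 16 bits of EAX on their own, which is why the sketch has no such accessor.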
CS: Code Segment
DS: Data Segment
ES: Extra Segment
FS: Extra Segment, next
GS: Extra Segment, next next
ESI: Source Index (strings, arrays)
EDI: Destination Index (strings, arrays)
ESP: Stack Pointer (top-of-stack)
EBP: Base Pointer (stack base)
EIP: Instruction Pointer
CF: Carry flag
PF: Parity flag
AF: Auxiliary Carry flag
ZF: Zero flag
SF: Sign flag
OF: Overflow flag
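To make the flags concrete, here is a small C sketch of how four of them (CF, ZF, SF, OF) would be set by an 8-bit ADD. It is a simplified model with hypothetical names (`add8`, `add8_flags`); PF and AF are left out to keep it short.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t result;
    int cf;  /* carry out of bit 7 (unsigned overflow) */
    int zf;  /* result is zero */
    int sf;  /* bit 7 of the result (negative when read as signed) */
    int of;  /* signed overflow */
} add8_flags;

/* Model of an 8-bit ADD and the flags it leaves behind. */
add8_flags add8(uint8_t a, uint8_t b) {
    add8_flags r;
    unsigned wide = (unsigned)a + (unsigned)b;
    r.result = (uint8_t)wide;
    r.cf = wide > 0xFFu;
    r.zf = r.result == 0;
    r.sf = (r.result & 0x80) != 0;
    /* Signed overflow: both inputs share a sign and the result's sign differs. */
    r.of = ((~(a ^ b) & (a ^ r.result)) & 0x80) != 0;
    return r;
}

/* Usage sketch. */
void flags_demo(void) {
    add8_flags r = add8(0x7F, 0x01);   /* 127 + 1 overflows as signed */
    assert(r.result == 0x80 && r.of == 1 && r.cf == 0 && r.sf == 1);
    r = add8(0xFF, 0x01);              /* 255 + 1 carries as unsigned */
    assert(r.result == 0x00 && r.cf == 1 && r.zf == 1 && r.of == 0);
}
```

The conditional jumps read exactly these bits, which is why a CMP followed by a J?? works without any explicit boolean value.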
this is the real skill in assembly
the Intel chips have a long history
some of it simple
some of it really not simple
and both are for the same reason: backwards compatibility
in order of invention/oldest first
real mode flat model
hello, DOS
real mode segmented model
the interim state (and where most of the craziness lies)
protected mode flat model
only available on 80386+ CPUs
prefer flat models if you value your sanity
1 MHz, 8-bit CPU; 16-bit address lines
giving it 64k (65,536) addressable locations
each of the 16-bit addressable locations held a byte
OS code lived at the top of memory
where the "top" was the actual, installed amount of memory
transient programs (your code) always lived at bottom of memory
execution always started at address 0100h
the first 256 bytes, then, were a Program Segment Prefix (PSP)
basically to make porting CP/M-80 apps easier
take the same 64k segment of memory, and start executing it
thus was born the Segment Register
basically memory pointers that hold the start of the 64k segment
0000h - 00FFh: PSP, nothing goes here
0100h upwards: your program code
the IP points to somewhere in here
somewhere beyond your program code: your program data
FFFFh downwards: your program stack
stack always grows downwards, to minimize overwriting program data
early x86s had 1mb, for example
somehow we have to be able to use that
but without abandoning the real mode flat model
because backwards compatibility!
what if we "pin" a block of 20-bit addressable space?
in other words, treat the block as if it were an 8080 64k block
call these blocks "segments"
in fact, divide all 1mb into 64k possibilities for these 64k blocks
these are at each address divisible by 16 (010h); "paragraph"s
so each 64k segment can start at a paragraph boundary
segment 0000h begins at 00000h, 0001h at 00010h, 0002h at 00020h, etc
but keep in mind, segments don't have to be 64k in size
they aren't allocated memory, just addresses
you can kinda see where this is going:
we put a 20-bit address into two 16-bit registers
one is the segment number
other is the 16-bit offset from that segment starting point
and if you're clever, you notice...
one physical address can be addressed in multiple ways!
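The segment-plus-offset arithmetic is short enough to sketch in C. This assumes the usual real-mode rule (segment value times 16, plus offset) and masks to 20 bits to model the 20 address lines of the original 8086; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address: segment * 16 + offset, kept to 20 bits. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset) {
    uint32_t physical = ((uint32_t)segment << 4) + offset;
    return physical & 0xFFFFFu;  /* assumption: wrap at 1 MB like an 8086 */
}

/* Usage sketch: two different segment:offset pairs, one physical address. */
void segment_demo(void) {
    assert(real_mode_address(0x0000, 0x0010) == 0x00010u);
    assert(real_mode_address(0x0001, 0x0000) == 0x00010u);
    assert(real_mode_address(0xB800, 0x0000) == 0xB8000u);
}
```

Because each segment starts only 16 bytes after the previous one, most physical addresses can be written as many different segment:offset pairs.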
Yup. It was.
but it was the best we had for a few years
until the 80386, and 32-bit operating systems, reached ubiquity
16 bits in size, regardless of CPU
even on 32-bit CPUs, segment registers are 16 bits
CS: Code Segment
DS: Data Segment
SS: Stack Segment
ES: Extra Segment
FS, GS: more extra segments; 80386+ only
requires a segment register and offset pair
set off by a colon ":"
examples
SS:SP stack segment, offset stored in SP
SS:BP stack segment, offset stored in BP
ES:DI extra segment, offset stored in DI
DS:SI data segment, offset stored in SI
CS:BX code segment, offset stored in BX
this is where things get even more fun:
does your program require multiple segments?
multiple code segments (more than 64k code)
multiple data segments (more than 64k data)
basic principles still hold
code segments "lower" in memory
data segments "higher" than code
stack segment near the top and grows down
data segments can be loaded into DS, ES, FS, GS
only one code segment register (CS)
only one code segment allowed in use at a time!
recall, segments are always max 64k in size
only one stack segment register (SS)
but we usually only ever use one stack segment
you never do; the CPU does
"long" jumps; these specify a new CS and CPU changes it
real modes mean entire address space is available
and if that address space includes the operating system...
... you could really screw some things up
... not to mention the security risks!
but we still have segment registers; they just never change
you can't change them anyway--protected mode!
segments are just really, really big (32-bit; 4gb)
4gb in size
specifics are OS-dependent
some core rules still hold:
Stack starts at "top" and grows "downward"
Code sits near "bottom"
Data sits on "top" of "code"
Data transfer
Arithmetic (binary, decimal)
Logical (AND, OR, XOR, NOT)
Control transfer (jumps, call, enter/leave, etc)
String (move, compare, scan, etc)
Bit/byte manipulation
I/O (moving data from processor to I/O port)
Flag control
Segment register manipulation
Miscellaneous instructions
random-number generation
CPUID
floating-point unit (FPU)
multimedia extensions (MMX)
SSE, SSE2, SSE3, SSE4
extensions of the SIMD execution model introduced by MMX
Advanced Vector instructions (AVX)
these are all generally not necessary to know
... until you need to know them, which is why references are good
these copy data from and to a variety of places
register-to-memory address
memory-to-register
immediate-to-register
and so on
note that many of these are counterintuitive
there is no memory-to-memory, for example
MOV: move (actually a copy but whatever)
MOVSX (move-and-sign-extend), MOVZX (move-and-zero-extend)
CMOVxx: conditional move
XCHG: exchange values (swap)
CMPXCHG8B: Compare and exchange 8-bytes
PUSH/POP: push/pop onto/from stack
PUSHA/PUSHAD/POPA/POPAD: push/pop general-purpose registers
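MOVZX and MOVSX differ only in how they fill the upper bits when widening a value. A small C sketch of that difference for an 8-bit source and 32-bit destination (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* MOVZX: widen by filling the upper 24 bits with zeros. */
uint32_t movzx8(uint8_t v) {
    return (uint32_t)v;
}

/* MOVSX: widen by copying bit 7 (the sign bit) into the upper 24 bits. */
uint32_t movsx8(uint8_t v) {
    return (v & 0x80u) ? (0xFFFFFF00u | v) : (uint32_t)v;
}

/* Usage sketch: 0xFF is 255 unsigned but -1 signed. */
void extend_demo(void) {
    assert(movzx8(0xFF) == 0x000000FFu);  /* still 255 */
    assert(movsx8(0xFF) == 0xFFFFFFFFu);  /* still -1 in two's complement */
    assert(movsx8(0x7F) == 0x0000007Fu);  /* positive values are unchanged */
}
```

Picking the wrong one is a classic source of bugs when mixing signed and unsigned data of different sizes.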
ADD, ADC, ADCX, ADOX
Add, add-with-carry, uint add-with-carry, uint add-with-overflow
SUB, SBB
subtract, subtract-with-borrow
IMUL, MUL: signed multiply, unsigned multiply
IDIV, DIV: signed divide, unsigned divide
INC, DEC: increment, decrement
NEG: negate
CMP: compare
AND, OR, XOR, NOT
shifts lose bits "off the edges"; rotates do not
SHL, SHR: shift left, right
SAL, SAR: shift arithmetic left, right
SHLD, SHRD: shift double left, right
ROL, ROR: rotate left, right
RCL, RCR: rotate through carry left, right
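A C sketch of the shift and rotate families on 32-bit values, with hypothetical function names. Counts are masked to 5 bits, which mirrors what x86 does for 32-bit operands; the carry flag (and so RCL/RCR) is not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* SHL: bits fall off the left edge, zeros come in on the right. */
uint32_t shl32(uint32_t v, unsigned n) { return (uint32_t)(v << (n & 31)); }

/* SHR: bits fall off the right edge, zeros come in on the left. */
uint32_t shr32(uint32_t v, unsigned n) { return v >> (n & 31); }

/* SAR: like SHR, but the sign bit is copied in on the left. */
uint32_t sar32(uint32_t v, unsigned n) {
    uint32_t shifted;
    n &= 31;
    shifted = v >> n;
    if ((v & 0x80000000u) && n > 0) {
        shifted |= (uint32_t)~(0xFFFFFFFFu >> n);  /* fill the vacated bits with 1s */
    }
    return shifted;
}

/* ROL: bits leaving the left edge re-enter on the right. */
uint32_t rol32(uint32_t v, unsigned n) {
    n &= 31;
    return n ? (uint32_t)((v << n) | (v >> (32 - n))) : v;
}

/* ROR: bits leaving the right edge re-enter on the left. */
uint32_t ror32(uint32_t v, unsigned n) {
    n &= 31;
    return n ? (uint32_t)((v >> n) | (v << (32 - n))) : v;
}

/* Usage sketch: the same input, shifted versus rotated. */
void shift_demo(void) {
    assert(shl32(0x80000001u, 1) == 0x00000002u);  /* top bit lost */
    assert(rol32(0x80000001u, 1) == 0x00000003u);  /* top bit wraps around */
    assert(sar32(0x80000000u, 4) == 0xF8000000u);  /* sign preserved */
}
```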
JMP: unconditional jump to target address
J???: jump-if-(condition)
E (equal), Z (zero), A (above), B (below), G (greater), L (lesser)
C (carry), O (overflow), S (sign/negative), P (parity-Odd/parity-Even)
plus all N? (not-?) variants
LOOP: loop with ECX counter
CALL: call procedure
RET: return
ENTER / LEAVE: high-level procedure entry / exit
MOVS?: move (Byte/Word/Doubleword) string
CMPS?: compare (Byte/Word/Doubleword) string
SCAS?: scan (Byte/Word/Doubleword) string
LODS?: load (Byte/Word/Doubleword) string
STOS?: store (Byte/Word/Doubleword) string
REP: repeat while ECX not zero
REPE/REPNE/REPZ/REPNZ: repeat while equal/not-equal/zero/not-zero
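The string instructions plus a REP prefix are essentially tiny hardware loops. A rough C equivalent of `rep movsb` and `repne scasb`, assuming the direction flag is clear (pointers move forward); the function names and the index-returning convention are choices made for this sketch, not how the CPU reports results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rep movsb: copy ecx bytes from [esi] to [edi], advancing both pointers. */
void rep_movsb(uint8_t *edi, const uint8_t *esi, size_t ecx) {
    while (ecx != 0) {       /* REP: repeat while the counter is not zero */
        *edi++ = *esi++;     /* MOVSB: copy one byte, bump both indexes */
        ecx--;
    }
}

/* repne scasb: scan up to ecx bytes at [edi] looking for al.
   Returns the index of the first match, or -1 if the counter runs out. */
long repne_scasb(const uint8_t *edi, uint8_t al, size_t ecx) {
    size_t i = 0;
    while (ecx != 0) {
        ecx--;
        if (edi[i] == al) {  /* match sets ZF, which ends a REPNE loop */
            return (long)i;
        }
        i++;
    }
    return -1;
}

/* Usage sketch. */
void string_demo(void) {
    const uint8_t src[5] = { 'h', 'e', 'l', 'l', 'o' };
    uint8_t dst[5] = { 0 };
    rep_movsb(dst, src, 5);
    assert(dst[0] == 'h' && dst[4] == 'o');
    assert(repne_scasb(src, 'l', 5) == 2);
}
```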
Microsoft Macro Assembler
ML.exe
supports 32- and 64-bit programs
a part of Microsoft's build chain since DOS
often paired with CodeView.exe (or debug.com)
Thanks to Microsoft's and Intel's longevity, a standard
but quirky as hell
General format
"dotted"-commands are directives
these tell the assembler to assume certain things or behave certain ways
Intel instruction format:
MNEMONIC
MNEMONIC OPERAND
MNEMONIC DESTINATION, SOURCE
memory access syntax
[ebx]: access the contents of the address contained in ebx
[esi - 4]: access the contents of the address in (esi minus 4 bytes)
_var$[ebp]: access the contents of _var$ based on the address in ebp (I have only seen this in Visual C++-generated asm files)
Directives
processor type to assume
.386, .486, .586, .686; "P" variants include privileged instructions
.MMX, .XMM: enables use of MMX or SIMD streaming instructions
.MODEL: define the memory model to use
only used for 16- or 32-bit assembler (not 64)
flat (32)
tiny/small/compact/medium/large/huge/flat (16)
language type: (32) C, STDCALL; (16) C, BASIC, FORTRAN, PASCAL, SYSCALL, STDCALL
.CODE (defines a code segment), .DATA (defines a data segment _DATA), .DATA? (defines an uninitialized data segment _BSS), .STACK (defines size of the stack)
PROC / ENDP: define a procedure block
SEGMENT: defines a segment in the file by the given name
Directives
data-definition
DB (byte), DW (word), DD (dword), DQ (quadword), DT (ten bytes)
external dependencies
PUBLIC: this symbol should be made public for other modules to consume
EXTRN / EXTERN: declare existence of symbol outside of this file
INCLUDE: load/parse filename given
INCLUDELIB: link with library given
ALIAS: create alternate name for external function
Directives: high-level
IFxxx: conditionally test various elements
FOR: assembly-time loop that repeats a block once per item in an argument list
INVOKE: invoke a given procedure, passing arguments
MACRO: creates a macro, with parameters, that can be used elsewhere
EQU: creates a symbol that equates to a value
STRUC, STRUCT: create a structure with defined field names
UNION: create a C-style union structure of one or more data types
gas: installed as part of GNU toolchain
as: installed as part of other *nix toolchains
highly popular due to GCC chain popularity
accepts either AT&T syntax or Intel (MASM) syntax
instructions
opcode
opcode operand
opcode source, dest
all register names used as operands are preceded by %
constants preceded by $
operation suffixes indicate size
"l" long (32 bits)
"w" word (16 bits)
"b" byte (8 bits)
memory address syntax
(%ecx): get the contents of the address stored in ecx
-4(%ebp): get the contents of the address 4 bytes before ebp
(%esi,%ebx,4): address ESI + 4*EBX
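All three AT&T forms above are instances of one pattern, displacement(base, index, scale). A C sketch of the address arithmetic, with a hypothetical function name and made-up register values:

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for disp(base, index, scale): base + index*scale + disp.
   Arithmetic wraps at 32 bits, as it would in a 32-bit address space. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp) {
    return (uint32_t)(base + index * scale + (uint32_t)disp);
}

/* Usage sketch, with illustrative register contents. */
void address_demo(void) {
    uint32_t ecx = 0x1000u, ebp = 0x2000u, esi = 0x3000u, ebx = 2u;
    assert(effective_address(ecx, 0, 1, 0) == 0x1000u);    /* (%ecx)         */
    assert(effective_address(ebp, 0, 1, -4) == 0x1FFCu);   /* -4(%ebp)       */
    assert(effective_address(esi, ebx, 4, 0) == 0x3008u);  /* (%esi,%ebx,4)  */
}
```

The scaled form is what makes array indexing cheap: with a scale of 4, the index register can hold an element number for an array of 32-bit values.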
.data directive: static data region (global variables)
.byte, .short, .long, .zero, .string
variables can be accessed via offsets: var(,1)
comments are single-line, prefixed by # or multi-line /* */ pairs
Simple addition
.386
.model flat, c
.stack 100h
.data
num1 sdword ?
num2 sdword 10
.code
main proc
mov num1, 5
mov eax, num1
add eax, num2
ret
main endp
end
Let's try MASM hello world using Win32
Assemble with "ml /c win32hello.asm"
Link with "link win32hello.obj kernel32.lib /subsystem:console /entry:main"
.386
.model flat
extern _ExitProcess@4:near
extern _GetStdHandle@4:near
extern _WriteConsoleA@20:near
public _main
; data declarations
.data
msg byte 'Hello, world!', 10
handle dword ?
written dword ?
.stack
.code
_main:
push -11
call _GetStdHandle@4
mov handle, eax
push 0
push offset written
push 13
push offset msg
push handle
call _WriteConsoleA@20
push 0
call _ExitProcess@4
end
More complicated: C code disassembled
Compile with "cl /Fahello.asm hello.c"
#include <stdio.h>

int add(int left, int right)
{
    return left + right;
}

int main(int argc, char* argv[])
{
    int x = 1;
    int y = 2;
    int z = add(x, y);
}
The generated assembler prelude
; Listing generated by Microsoft (R) Optimizing Compiler Version 19.22.27905.0
TITLE D:\Projects\Presentations.hg\Content\Assembler\Intel\x86\code\hello.c
.686P
.XMM
include listing.inc
.model flat
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
PUBLIC _add
PUBLIC _main
The generated assembler for add()
; Function compile flags: /Odtp
_TEXT SEGMENT
_left$ = 8 ; size = 4
_right$ = 12 ; size = 4
_add PROC
; File D:\Projects\Presentations.hg\Content\Assembler\Intel\x86\code\hello.c
; Line 4
    push ebp
    mov ebp, esp
; Line 5
    mov eax, DWORD PTR _left$[ebp]
    add eax, DWORD PTR _right$[ebp]
; Line 6
    pop ebp
    ret 0
_add ENDP
_TEXT ENDS
The generated assembler for main() (part 1)
; Function compile flags: /Odtp
_TEXT SEGMENT
_z$ = -12 ; size = 4
_x$ = -8 ; size = 4
_y$ = -4 ; size = 4
_argc$ = 8 ; size = 4
_argv$ = 12 ; size = 4
_main PROC
; File D:\Projects\Presentations.hg\Content\Assembler\Intel\x86\code\hello.c
; Line 9
    push ebp
    mov ebp, esp
    sub esp, 12 ; 0000000cH
; Line 10
    mov DWORD PTR _x$[ebp], 1
; Line 11
    mov DWORD PTR _y$[ebp], 2
The generated assembler for main() (part 2)
; Line 12
    mov eax, DWORD PTR _y$[ebp]
    push eax
    mov ecx, DWORD PTR _x$[ebp]
    push ecx
    call _add
    add esp, 8
    mov DWORD PTR _z$[ebp], eax
; Line 13
    xor eax, eax
    mov esp, ebp
    pop ebp
    ret 0
_main ENDP
_TEXT ENDS
More complicated: C code disassembled
Compile with "cl /Faconstructs.asm constructs.c"
int main(int argc, char* argv[])
{
    int result = ifLoop(argc);
    forLoop();
}
_DATA SEGMENT
$SG7450 DB 'Hello, world, this is the %d loop', 0aH, 00H
_DATA ENDS
; Function compile flags: /Odtp
_TEXT SEGMENT
_result$ = -4 ; size = 4
_argc$ = 8 ; size = 4
_argv$ = 12 ; size = 4
_main PROC
; Line 24
    push ebp
    mov ebp, esp
    push ecx
; Line 25
    mov eax, DWORD PTR _argc$[ebp]
    push eax
    call _ifLoop
    add esp, 4
    mov DWORD PTR _result$[ebp], eax
; Line 26
    call _forLoop
; Line 27
    xor eax, eax
    mov esp, ebp
    pop ebp
    ret 0
_main ENDP
int ifLoop(int arg)
{
    if (arg < 5)
    {
        return 12;
    }
    else
    {
        return 17;
    }
}
_TEXT SEGMENT
_arg$ = 8 ; size = 4
_ifLoop PROC
; Line 4
    push ebp
    mov ebp, esp
; Line 5
    cmp DWORD PTR _arg$[ebp], 5
    jge SHORT $LN2@ifLoop
; Line 7
    mov eax, 12 ; 0000000cH
    jmp SHORT $LN1@ifLoop
; Line 8
    jmp SHORT $LN1@ifLoop
$LN2@ifLoop:
; Line 11
    mov eax, 17 ; 00000011H
$LN1@ifLoop:
; Line 13
    pop ebp
    ret 0
_ifLoop ENDP
_TEXT ENDS
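One way to read the listing above is to translate it back into C using only `goto`, so each jump maps to one statement. This is a hand-written sketch, not compiler output; the function name `if_loop_lowered` and the local named `eax` are stand-ins chosen for illustration.

```c
#include <assert.h>

/* ifLoop() rewritten in the shape of its assembly: compare, jump, fall through. */
int if_loop_lowered(int arg) {
    int eax;                   /* stands in for the return-value register */
    if (arg >= 5) goto LN2;    /* cmp arg, 5 / jge $LN2: the test is inverted */
    eax = 12;                  /* "then" branch */
    goto LN1;                  /* jmp $LN1: skip over the "else" branch */
LN2:
    eax = 17;                  /* "else" branch */
LN1:
    return eax;                /* pop ebp / ret */
}

/* Usage sketch. */
void if_demo(void) {
    assert(if_loop_lowered(4) == 12);
    assert(if_loop_lowered(5) == 17);
}
```

Note how the C condition `arg < 5` shows up inverted as `jge`: the compiler jumps away when the condition is false and falls through when it is true.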
void forLoop()
{
    for (int i=0; i<10; i++)
    {
        printf("Hello, world, this is the %d loop\n", i);
    }
}
_TEXT SEGMENT
_i$1 = -4 ; size = 4
_forLoop PROC
; Line 16
    push ebp
    mov ebp, esp
    push ecx
; Line 17
    mov DWORD PTR _i$1[ebp], 0
    jmp SHORT $LN4@forLoop
$LN2@forLoop:
    mov eax, DWORD PTR _i$1[ebp]
    add eax, 1
    mov DWORD PTR _i$1[ebp], eax
$LN4@forLoop:
    cmp DWORD PTR _i$1[ebp], 10 ; 0000000aH
    jge SHORT $LN1@forLoop
; Line 19
    mov ecx, DWORD PTR _i$1[ebp]
    push ecx
    push OFFSET $SG7450
    call _printf
    add esp, 8
; Line 20
    jmp SHORT $LN2@forLoop
$LN1@forLoop:
; Line 21
    mov esp, ebp
    pop ebp
    ret 0
_forLoop ENDP
_TEXT ENDS
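The for-loop listing has the same three pieces as the C source (initialize, test, increment), just reordered and joined by jumps. A hand-written C sketch using `goto` and labels that mirror the compiler's (`LN2`, `LN4`, `LN1`); the function name and the returned iteration count are additions for illustration and are not in the original code.

```c
#include <assert.h>
#include <stdio.h>

/* forLoop() rewritten in the shape of its assembly listing. */
int for_loop_lowered(void) {
    int i;
    int iterations = 0;        /* added so the sketch can be checked */

    i = 0;                     /* mov _i$1[ebp], 0 */
    goto LN4;                  /* jmp $LN4: skip the increment the first time */
LN2:
    i = i + 1;                 /* mov eax, i / add eax, 1 / mov i, eax */
LN4:
    if (i >= 10) goto LN1;     /* cmp i, 10 / jge $LN1 */
    printf("Hello, world, this is the %d loop\n", i);
    iterations++;
    goto LN2;                  /* jmp $LN2: back to the increment */
LN1:
    return iterations;
}

/* Usage sketch. */
void for_demo(void) {
    assert(for_loop_lowered() == 10);
}
```

The increment sits above the test in the listing, which is why the first pass has to jump over it.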
nobody expects to write assembly programs
except in very specific/niche situations
being able to read assembler is a huge step up
particularly from high-level language scenarios
knowing assembly language takes all the mystery away
this helps make it easier to reason about problems
https://godbolt.org/ - Compiler Explorer
modes for every targeted assembly language imaginable (including JVM, CLR, others)
"Assembly Language Step-By-Step" 3rd Edition
best description of memory models anywhere
"Linkers and Loaders"
how compiled files (COFF, ELF, PE, etc) look on disk
Flat Assembler (fasm): https://flatassembler.net
geared specifically at "flat model" assembly
Microsoft Assembler (MASM): ships with Visual Studio
MASM32SDK: http://masm32.com/
designed to be a little friendlier than raw MASM by itself
Netwide Assembler (NASM): popular on Linux, also runs on Windows and macOS
installed separately; the GCC toolchain ships gas, not NASM
OpenWatcom toolchain: http://www.openwatcom.com/
most C/C++ compilers can emit an assembly listing
Visual C++: /Fa{filename}
gcc: -S (add -fverbose-asm for commented output)
many standalone disassembler tools
... for when you don't have the source
macOS/Xcode: otool {code}.o -tV
gcc: objdump -d {file}
Visual Studio: dumpbin.exe
http://www.heaventools.com/PE_Explorer_disassembler.htm
IDAPro: https://www.hex-rays.com/products/ida/
Intel 64 and IA-32 Architectures Software Developer's Manual
Volumes 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, 4
available as one huge PDF, four PDFs, or ten PDFs
extremely technical and detailed; not for the faint of heart!
https://software.intel.com/articles/intel-sdm
AMD Architecture Programmer's Manual
... because not all x86 chips are actually Intel's
http://developer.amd.com/resources/developer-guides-manuals/
various other websites provide views on this data
http://ref.x86asm.net/
https://www.cs.virginia.edu/~evans/cs216/guides/x86.html
Architect, Engineering Manager/Leader, "force multiplier"
http://www.newardassociates.com
http://blogs.newardassociates.com
Books
Developer Relations Activity Patterns (w/Woodruff, et al; APress, 2026)
Professional F# 2.0 (w/Erickson, et al; Wrox, 2010)
Effective Enterprise Java (Addison-Wesley, 2004)
SSCLI Essentials (w/Stutz, et al; O'Reilly, 2003)
Server-Based Java Programming (Manning, 2000)