Busy Developer's Guide to Assembly Language

ted@tedneward.com | Blog: http://blogs.tedneward.com | Twitter: tedneward | Github: tedneward | LinkedIn: tedneward

Objectives

What are we going to do here?

Assembly language(s)

When in Rome... speak Roman

Assembly

It doesn't get any "closer to the metal" than this

Assembly

Usage model

Assembler Basics

Common elements to all assembly

Basics

Core architecture/organization

Basics

Source layout

Basics

Instructions

Basics

Variables

x86 Overview

Basic breakdown of 80x86 chips

Overview

Welcome to the 80x86 line

Overview

8-bit to 64-bit legacy

Overview

General registers and their common use
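
As a rough illustration of how the 8-, 16-, and 32-bit register names overlap, the sketch below models EAX as a plain 32-bit integer and reads the AX, AH, and AL views with shifts and masks. The helper names are hypothetical, invented for this example; they are not part of any assembler or toolchain.

```c
#include <stdint.h>

/* Hypothetical helpers: model EAX as a 32-bit value and view its sub-registers. */

/* AX is the low 16 bits of EAX. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }

/* AH is bits 8..15 of EAX (the high byte of AX). */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* AL is the low 8 bits of EAX (the low byte of AX). */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }

/* Writing AL replaces only the low byte; the upper 24 bits are untouched. */
uint32_t write_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}
```

With EAX holding 0x12345678, these helpers yield AX = 0x5678, AH = 0x56, and AL = 0x78; writing 0xFF to AL leaves 0x123456FF.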

Overview

Segment registers

Overview

Specific-purpose registers

Overview

Flag register (EFLAGS)
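
To make the status flags a little more concrete, here is a minimal sketch that models a 32-bit ADD in C and derives four of the EFLAGS bits (CF, ZF, SF, OF) from the operands and the result. The `flags_t` struct and `add32` function are assumptions made for this illustration; the real register holds more flags than are shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct holding four of the status flags that ADD updates. */
typedef struct {
    bool cf; /* carry: the unsigned result did not fit in 32 bits */
    bool zf; /* zero: the result is 0 */
    bool sf; /* sign: the top bit of the result */
    bool of; /* overflow: the signed result did not fit in 32 bits */
} flags_t;

/* Model of a 32-bit ADD: stores the sum in *result and returns the flags. */
flags_t add32(uint32_t a, uint32_t b, uint32_t *result)
{
    uint32_t sum = a + b;          /* wraps modulo 2^32, like the hardware */
    flags_t f;
    f.cf = sum < a;                /* wrapped around, so a carry occurred */
    f.zf = (sum == 0);
    f.sf = (sum >> 31) != 0;
    /* Signed overflow: operands share a sign, but the result's sign differs. */
    f.of = ((~(a ^ b) & (a ^ sum)) >> 31) != 0;
    *result = sum;
    return f;
}
```

Adding 1 to 0xFFFFFFFF gives 0 with CF and ZF set; adding 1 to 0x7FFFFFFF gives 0x80000000 with SF and OF set. Conditional jumps such as `jge` test combinations of these bits after a `cmp`.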

Memory Addressing

Knowing Where Things Are

Memory Addressing

Locating instructions and data in memory

Memory Addressing

Three major memory models in x86, in order of invention (oldest first)

Real Mode Flat Model

Back in the beginning...

Real Mode Flat Model

In 1974, Intel introduced the 8080

Real Mode Flat Model

OS of the day was CP/M-80

Real Mode Flat Model

Intel set up the 8086 to mimic the 8080

Real Mode Flat Model

In this model, everything must fit within 64k

Real Mode Segmented Model

Getting 1 MB of addresses out of 16-bit registers

Real Mode Segmented Model

Recall that the 8080/8086 likes to think in 64k chunks

Recall that we had machines with a lot more than 64k of RAM

Real Mode Segmented Model

Examining a 1 MB address space through 16-bit addresses

Real Mode Segmented Model

So how do we put 20-bit addresses in 16-bit registers? We don't

Real Mode Segmented Model

"This seems like a royal pain"

Real Mode Segmented Model

Segment registers

Real Mode Segmented Model

Addresses in segmented model
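
The segment:offset arithmetic can be sketched in a few lines of C. In real mode the CPU shifts the 16-bit segment value left by four bits (multiplies it by 16) and adds the 16-bit offset, producing a 20-bit physical address. The function name below is a hypothetical helper for this example, and the 20-bit wrap models the original 8086, which has only 20 address lines.

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset,
   wrapped to 20 bits as on the 8086. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

For example, B800:0000 maps to physical address 0xB8000. Many different segment:offset pairs name the same byte: 1234:0010 and 1235:0000 both map to 0x12350.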

Real Mode Segmented Model

Program layout

Real Mode Segmented Model

Segment fun: Some rules

Real Mode Segmented Model

Segment fun: How do we change CS?

Real Mode Segmented Model

Segment fun: How do we jump outside of a CS, then?

Protected Mode Flat Model

Where we get to keep things simple

Protected Mode Flat Model

Protected mode means you can't just write anywhere

Protected Mode Flat Model

Flat model means we don't segment the process

Protected Mode Flat Model

Memory/process layout

x86 Instruction Set

Broad classification of the instructions

x86 Instructions

x86 instructions fall into a broad set

x86 Instructions

x86 instructions fall into a broad set

x86 Instructions

Most x86 chips also come with additional instruction sets

x86 Instructions

Data transfer instructions

x86 Instructions

Data transfer instructions

x86 Instructions

Arithmetic instructions

x86 Instructions

Logic instructions

Shift/rotate instructions
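
The difference between a shift and a rotate is easy to miss, so here is a small sketch of SHL and ROL on 32-bit values. The function names are hypothetical; the count masking mirrors how x86 treats the count for 32-bit operands (only the low 5 bits are used).

```c
#include <stdint.h>

/* SHL: bits shifted out on the left are lost; zeros enter on the right. */
uint32_t shl32(uint32_t value, unsigned count)
{
    count &= 31;                 /* x86 uses only the low 5 bits of the count */
    return value << count;
}

/* ROL: bits shifted out on the left re-enter on the right. */
uint32_t rol32(uint32_t value, unsigned count)
{
    count &= 31;
    if (count == 0) {
        return value;            /* avoid shifting by 32, which is undefined in C */
    }
    return (value << count) | (value >> (32 - count));
}
```

Starting from 0x80000001, a shift left by one gives 0x00000002 (the top bit is discarded), while a rotate left by one gives 0x00000003 (the top bit wraps around to bit 0).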

x86 Instructions

Control flow instructions

x86 Instructions

String instructions
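
As one illustration of what a string instruction does, the sketch below models `rep movsb` with the direction flag clear: copy ECX bytes from the address in ESI to the address in EDI, one byte per step. The C function is a hypothetical stand-in whose parameters are named after the registers involved; it ignores segment registers and the direction flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Rough model of REP MOVSB with the direction flag clear:
   copy `ecx` bytes from [esi] to [edi], one byte per iteration. */
void rep_movsb(uint8_t *edi, const uint8_t *esi, size_t ecx)
{
    while (ecx != 0) {
        *edi++ = *esi++;   /* MOVSB: copy one byte, advance both pointers */
        ecx--;             /* REP: decrement the count, repeat until zero */
    }
}
```

The whole loop is a single prefixed instruction in assembly, which is why the string instructions are a common way to express copies, fills, and scans.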

MASM

Microsoft Macro Assembler

MASM

Microsoft Macro Assembler

MASM

General format

MASM

Directives

MASM

Directives

MASM

Directives: high-level

GAS

GNU Assembler

GAS

GNU Assembler

GAS

General format

GAS

General format

GAS

General format

Some x86 examples

Let's look at some code in some detail

x86 Examples

Simple addition (MASM)

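        .386                    ; target the 80386 instruction set
        .model flat, c          ; flat memory model, C calling convention
        .stack 100h             ; reserve 256 bytes of stack

        .data
num1    sdword ?                ; uninitialized signed 32-bit integer
num2    sdword 10               ; signed 32-bit integer, initialized to 10

        .code
main    proc
        mov num1, 5             ; num1 = 5
        mov eax, num1           ; eax = num1
        add eax, num2           ; eax = num1 + num2 = 15
        ret                     ; return to the caller; result is in eax
main    endp
        end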

x86 Examples

Hello, world, on Linux with NASM

Assemble with "nasm -f elf -g -F stabs hellonasmlinux.asm"

Link with "ld -o helloworld hellonasmlinux.o"

SECTION .data   ; initialized data
Msg: db "Hello world",10
MsgLen: equ $-Msg
    
SECTION .bss    ; uninitialized data
    

x86 Examples

Hello, world, on Linux with NASM

SECTION .text   ; code
    
global _start   ; entrypoint definition
    
_start:
    nop             ; required for gdb-friendliness
    mov eax,4       ; sys_write syscall
    mov ebx,1       ; file descriptor 1: stdout
    mov ecx,Msg     ; message offset
    mov edx,MsgLen  ; message length (bytes)
    int 80h         ; make syscall
    
    mov eax,1       ; exit syscall
    mov ebx,0       ; exit code 0
    int 80h         ; make syscall

x86 Examples

Let's try MASM hello world using Win32

Assemble with "ml win32hello.asm /c"

Link with "link win32hello.obj kernel32.lib /subsystem:console /entry:main"

        .386
        .model flat
    
        extern _ExitProcess@4:near
        extern _GetStdHandle@4:near
        extern _WriteConsoleA@20:near
    
        public _main
    
; data declarations
        .data
msg     byte 'Hello, world!', 10
handle  dword ?
written dword ?
    
        .stack

x86 Examples

Let's try MASM hello world using Win32

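        .code
_main:
        push -11                ; STD_OUTPUT_HANDLE
        call _GetStdHandle@4    ; eax = handle to standard output
        mov handle, eax
    
        ; stdcall: arguments are pushed right to left, callee cleans the stack
        push 0                  ; lpReserved (must be NULL)
        push offset written     ; receives the number of characters written
        push 13                 ; characters to write (trailing newline byte not counted)
        push offset msg         ; address of the text
        push handle             ; console output handle
        call _WriteConsoleA@20
    
        push 0                  ; exit code 0
        call _ExitProcess@4
        
        end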

x86 Examples

More complicated: C code disassembled

Compile with "cl /Fahello.asm hello.c"

#include <stdio.h>

int add(int left, int right)
{
  return left + right;
}

int main(int argc, char* argv[])
{
  int x = 1;
  int y = 2;
  int z = add(x, y);
}

x86 Examples

More complicated: C code disassembled

The generated assembler prelude

; Listing generated by Microsoft (R) Optimizing Compiler Version 19.22.27905.0 
    
	TITLE	D:\Projects\Presentations.hg\Content\Assembler\Intel\x86\code\hello.c
	.686P
	.XMM
	include listing.inc
	.model	flat
    
INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES
    
PUBLIC	_add
PUBLIC	_main

x86 Examples

More complicated: C code disassembled

The generated assembler for add()

; Function compile flags: /Odtp
_TEXT	SEGMENT
_left$ = 8						; size = 4
_right$ = 12						; size = 4
_add	PROC
; File D:\Projects\Presentations.hg\Content\Assembler\Intel\x86\code\hello.c
; Line 4
	push	ebp
	mov	ebp, esp
; Line 5
	mov	eax, DWORD PTR _left$[ebp]
	add	eax, DWORD PTR _right$[ebp]
; Line 6
	pop	ebp
	ret	0
_add	ENDP
_TEXT	ENDS

x86 Examples

More complicated: C code disassembled

The generated assembler for main() (part 1)

; Function compile flags: /Odtp
_TEXT	SEGMENT
_z$ = -12						; size = 4
_x$ = -8						; size = 4
_y$ = -4						; size = 4
_argc$ = 8						; size = 4
_argv$ = 12						; size = 4
_main	PROC
; File D:\Projects\Presentations.hg\Content\Assembler\Intel\x86\code\hello.c
; Line 9
	push	ebp
	mov	ebp, esp
	sub	esp, 12					; 0000000cH
; Line 10
	mov	DWORD PTR _x$[ebp], 1
; Line 11
	mov	DWORD PTR _y$[ebp], 2

x86 Examples

More complicated: C code disassembled

The generated assembler for main() (part 2)

; Line 12
	mov	eax, DWORD PTR _y$[ebp]
	push	eax
	mov	ecx, DWORD PTR _x$[ebp]
	push	ecx
	call	_add
	add	esp, 8
	mov	DWORD PTR _z$[ebp], eax
; Line 13
	xor	eax, eax
	mov	esp, ebp
	pop	ebp
	ret	0
_main	ENDP
_TEXT	ENDS
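
The offsets in these listings (`_left$ = 8`, `_right$ = 12`, and the `add esp, 8` after the call) follow from what sits on the stack at each step. The sketch below simulates that with a small array standing in for stack memory. Every name in it is hypothetical, and it models only the pushes, the frame setup, and the cleanup seen in the listings above, not a real CPU.

```c
#include <stdint.h>

/* A tiny simulated stack of 32-bit slots, growing toward lower addresses. */
enum { STACK_BYTES = 64 };
static uint32_t stack_mem[STACK_BYTES / 4];
static uint32_t esp = STACK_BYTES;   /* byte offset of the current stack top */
static uint32_t ebp = 0;

static void push(uint32_t v)        { esp -= 4; stack_mem[esp / 4] = v; }
static uint32_t pop(void)           { uint32_t v = stack_mem[esp / 4]; esp += 4; return v; }
static uint32_t load(uint32_t addr) { return stack_mem[addr / 4]; }

/* Callee: mirrors _add. On entry, [esp] holds the return address. */
static uint32_t sim_add(void)
{
    push(ebp);                       /* push ebp */
    ebp = esp;                       /* mov  ebp, esp */
    /* [ebp+0] = saved ebp, [ebp+4] = return address, arguments start at +8 */
    uint32_t eax = load(ebp + 8);    /* mov  eax, DWORD PTR _left$[ebp] */
    eax += load(ebp + 12);           /* add  eax, DWORD PTR _right$[ebp] */
    ebp = pop();                     /* pop  ebp */
    esp += 4;                        /* ret  0 : pops the return address */
    return eax;                      /* the result travels back in EAX */
}

/* Caller: mirrors "Line 12" of main. */
uint32_t sim_call_add(uint32_t x, uint32_t y)
{
    push(y);                         /* push eax : rightmost argument first */
    push(x);                         /* push ecx */
    push(0);                         /* call _add : pushes a return address (dummy here) */
    uint32_t eax = sim_add();
    esp += 8;                        /* add  esp, 8 : caller removes both arguments */
    return eax;
}
```

Calling `sim_call_add(1, 2)` returns 3 and leaves the simulated stack pointer where it started, which is the point of the caller's `add esp, 8`.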

x86 Examples

More complicated: C code disassembled

Compile with "cl /Faconstructs.asm constructs.c"

int main(int argc, char* argv[])
{
  int result = ifLoop(argc);
  forLoop();
}

x86 Examples

More complicated: C code disassembled

_DATA	SEGMENT
$SG7450	DB	'Hello, world, this is the %d loop', 0aH, 00H
_DATA	ENDS
; Function compile flags: /Odtp
_TEXT	SEGMENT
_result$ = -4 ; size = 4
_argc$ = 8	  ; size = 4
_argv$ = 12	  ; size = 4
_main	PROC  ; Line 24
	push	ebp
	mov	ebp, esp
	push	ecx
; Line 25
	mov	eax, DWORD PTR _argc$[ebp]
	push	eax
	call	_ifLoop
	add	esp, 4
	mov	DWORD PTR _result$[ebp], eax

x86 Examples

More complicated: C code disassembled

; Line 26
	call	_forLoop
; Line 27
	xor	eax, eax
	mov	esp, ebp
	pop	ebp
	ret	0
_main	ENDP

x86 Examples

More complicated: C code disassembled

int ifLoop(int arg)
{
  if (arg < 5)
  {
    return 12;
  }
  else
  {
    return 17;
  }
}

x86 Examples

More complicated: C code disassembled

_TEXT	SEGMENT
_arg$ = 8	; size = 4
_ifLoop	PROC
; Line 4
	push	ebp
	mov	ebp, esp
; Line 5
	cmp	DWORD PTR _arg$[ebp], 5
	jge	SHORT $LN2@ifLoop
; Line 7
	mov	eax, 12	; 0000000cH
	jmp	SHORT $LN1@ifLoop
; Line 8
	jmp	SHORT $LN1@ifLoop
$LN2@ifLoop:
; Line 11
	mov	eax, 17	; 00000011H
$LN1@ifLoop:
; Line 13
	pop	ebp
	ret	0
_ifLoop	ENDP
_TEXT	ENDS

x86 Examples

More complicated: C code disassembled

void forLoop()
{
  for (int i=0; i<10; i++)
  {
    printf("Hello, world, this is the %d loop\n", i);
  }
}

x86 Examples

More complicated: C code disassembled

_TEXT	SEGMENT
_i$1 = -4 ; size = 4
_forLoop PROC
; Line 16
	push	ebp
	mov	ebp, esp
	push	ecx

x86 Examples

More complicated: C code disassembled

; Line 17
	mov	DWORD PTR _i$1[ebp], 0
	jmp	SHORT $LN4@forLoop
$LN2@forLoop:
	mov	eax, DWORD PTR _i$1[ebp]
	add	eax, 1
	mov	DWORD PTR _i$1[ebp], eax
$LN4@forLoop:
	cmp	DWORD PTR _i$1[ebp], 10	; 0000000aH
	jge	SHORT $LN1@forLoop
; Line 19
	mov	ecx, DWORD PTR _i$1[ebp]
	push	ecx
	push	OFFSET $SG7450
	call	_printf
	add	esp, 8
; Line 20
	jmp	SHORT $LN2@forLoop
$LN1@forLoop:

x86 Examples

More complicated: C code disassembled

; Line 21
	mov	esp, ebp
	pop	ebp
	ret	0
_forLoop ENDP
_TEXT	ENDS
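
One way to read the listing for forLoop() is to write the same control flow back in C using only `goto` and `if`. The sketch below keeps the compiler's label names (minus the `$` and `@forLoop` decoration) so it can be compared line by line with the assembly. The function name and the `count` variable are additions for this example and do not appear in the original source.

```c
#include <stdio.h>

/* forLoop() rewritten with the same jumps and labels as the /Od listing.
   Returns the number of iterations so the behaviour can be checked. */
int for_loop_as_jumps(void)
{
    int count = 0;            /* not in the listing; added for checking */
    int i;

    i = 0;                    /* mov DWORD PTR _i$1[ebp], 0 */
    goto LN4;                 /* jmp SHORT $LN4@forLoop : skip the increment once */
LN2:
    i = i + 1;                /* mov eax, i / add eax, 1 / mov i, eax */
LN4:
    if (i >= 10) goto LN1;    /* cmp i, 10 / jge SHORT $LN1@forLoop */
    printf("Hello, world, this is the %d loop\n", i);
    count++;
    goto LN2;                 /* jmp SHORT $LN2@forLoop */
LN1:
    return count;
}
```

Note that the compiler tests the inverse of the source condition: `i < 10` in C becomes "jump out if `i >= 10`", the same pattern that appears as `jge` in the ifLoop listing.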

x64 Overview

Basic breakdown of x64 chips

Overview

Welcome to the 64-bit 80x86 line

Overview

General registers and their common use

Overview

Specific-purpose registers

Overview

Additional registers

Intel x64 Instructions

Expanding on the IA-32 instruction set

x64 Examples

Looking at x64 instructions in action

Summary

Wrapping up

Resources

Resources for assembly programming

Books

Books useful to have/read

Tools

Tools for assembly programming

Tools

Disassemblers

Credentials

Who is this guy?