Busy Developer's Guide to Building A Programming Language

ted.neward@newardassociates.com | Blog: http://blogs.newardassociates.com | Github: tedneward | LinkedIn: tedneward

Objectives

What are we looking to do here?

Building a language?

Why on earth would anyone bother?

Motivation

Why bother learning how compilers are built?

Problem

Consider a simple state machine

Problem

Consider the traditional "business rules"

Problem

Consider CS problems we haven't solved yet

Trends

Within the last few years, "Domain-Specific Languages" have become a new area of inquiry

Trends

Lots of tools are already "languages", "DSLs", or related to languages in some way

Language Design

Creating a custom programming language

Language Design

Creating a custom programming language/DSL

Lexing and Parsing

Lexing and Parsing

Intermediate Representation

Intermediate Representation

Analysis

Analysis

Execution

Execution

Code Generation

Code Generation

Lexing

From letters to words

Lexing

Lexing is a first-pass verification of input stream

Lexing

Lexing takes characters and produces "tokens"

Lexing

Lexing examples (Java)

Parsing

From words to sentences with meaning

Parsing

Parsing takes input format and transforms it into an intermediate representation

Parsing

Examples using English

Parsing

Much of the time, an input format is described by a formal grammar

Parsing

4-operation calculator grammar (EBNF)


input	::= ws expr ws;

expr	::= ws factor [{ws ('*'|'/') ws factor}];
factor	::= ws term [{ws ('+'|'-') ws term}];
term	::= '(' ws expr ws ')' | '-' ws expr | number;

number	::= {dgt} ['.' {dgt}] [('e'|'E') ['-'] {dgt}];
dgt	::= '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';
ws	::= [{' '|'\t'|'\n'|'\r'}];
  

Parsing

Sometimes the grammar is fed as input to a code-generator tool to generate the code for the lexer and parser

Parsing

Sometimes the parsing can be written as a library within a language, fed by Strings read in by the host language

Parsing

Sometimes we can duck the parser altogether and use another programming language to be the parsed input format for us

Parsing

Parser implementation choices

Intermediate Representation

An optimized form of the language input

Abstract Syntax Tree

Abstract Syntax Tree (AST)

Intermediate Representation

Intermediate Representation (IR) is an optimized format of the input

Analysis

Transformations on the IR

Analysis

Once the input form is in an IR, optimizations can be made to improve the efficiency of the code on a variety of different levels

Execution

Directly executing the intermediate rep

Execution

IR structures can be executed/interpreted by a host environment

Interpreter

Interpreter

Code Generation

To make it faster/standalone

Code generation

Produce code output for execution

Code generation

Code generation implementation tools

Example: Calc

A calculator 'language'

Example: Calc

Example: A calculator language (Calc)

Example: Calc

Example calculator script

2 + 2
7 - 7
5 + (2 - 3)
a = 4
b = 12
a * b

Example results

4
0
4
48

Example: Calc

Lexing

Example: Calc

Labeled Expressions grammar

MUL :   '*' ; // assigns token name to '*' used in grammar
DIV :   '/' ;
ADD :   '+' ;
SUB :   '-' ;
MOD :   '%' ;
ID  :   [a-zA-Z]+ ;      // match identifiers
INT :   [0-9]+ ;         // match integers
NEWLINE:'\r'? '\n' ;     // return newlines to parser (is end-statement signal)
WS  :   [ \t]+ -> skip ; // toss out whitespace

Example: Calc

Parsing

Example: Calc

Labeled Expressions grammar

grammar LabeledExpr;
    
prog:   stat+ ;
    
stat:   expr NEWLINE                # printExpr
    |   ID '=' expr NEWLINE         # assign
    |   NEWLINE                     # blank
    ;
    
expr:   expr op=('*'|'/'|'%') expr  # MulDiv
    |   expr op=('+'|'-') expr      # AddSub
    |   INT                         # int
    |   ID                          # id
    |   '(' expr ')'                # parens
    ;

Example: Calc

AST

Example: Calc

Interpretation

Example: Calc

Driver

Example: Calc

Calc -- an interpreter

    public static void main(String[] args) throws Exception {
        System.out.println("Calc v1.0");
        System.out.println("Using ANTLR v" + org.antlr.v4.runtime.CharStreams.class.getPackage().getImplementationVersion());
    
        String filename = "../../test.expr";
        if (args.length > 0) {
            filename = args[0];
        }
        System.out.println("Parsing " + new java.io.File(filename).getAbsolutePath());
    
        CalcVisitor visitor = new CalcVisitor();
        LabeledExprLexer lexer = new LabeledExprLexer(CharStreams.fromFileName(filename));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        LabeledExprParser parser = new LabeledExprParser(tokens);
        ParseTree tree = parser.prog(); // parse; start at prog
        System.out.println(tree.toStringTree(parser)); // print tree as text
        visitor.visit(tree);
    }    

Example: Calc

CalcVisitor -- integer literal

	@Override public Integer visitInt(LabeledExprParser.IntContext ctx) {
        int value = Integer.valueOf(ctx.INT().getText());
        System.out.println("INT: " + value);
        return value;
    }

Example: Calc

CalcVisitor -- addSub nodes

	@Override public Integer visitAddSub(LabeledExprParser.AddSubContext ctx) { 
        int lhs = visit(ctx.expr(0));
        int rhs = visit(ctx.expr(1));
        if (ctx.op.getType() == LabeledExprParser.ADD) {
            System.out.println("ADD: " + lhs + " + " + rhs);
            return lhs + rhs;
        } else {
            System.out.println("SUB: " + lhs + " - " + rhs);
            return lhs - rhs;
        }
    }

Example: Calc

CalcVisitor -- mulDiv nodes

	@Override public Integer visitMulDiv(LabeledExprParser.MulDivContext ctx) { 
        int lhs = visit(ctx.expr(0));
        int rhs = visit(ctx.expr(1));
        if (ctx.op.getType() == LabeledExprParser.MUL) {
            System.out.println("MUL: " + lhs + " * " + rhs);
            return lhs * rhs;
        } else {
            System.out.println("DIV: " + lhs + " / " + rhs);
            return lhs / rhs;
        }
    }

Example: Calc

CalcVisitor -- parentheses (nested expressions)

    @Override public Integer visitParens(LabeledExprParser.ParensContext ctx) { 
        int value = visit(ctx.expr());
        System.out.println("PARENS: " + ctx.getText() + " = " + value);
        return value;
    }

Example: Calc

CalcVisitor -- assignment

    private HashMap<String, Integer> memory = new HashMap<>(); // memory for variables
    @Override public Integer visitAssign(LabeledExprParser.AssignContext ctx) { 
        String id = ctx.ID().getText();
        int value = visit(ctx.expr());
        memory.put(id, value);
        System.out.println("ASSIGN: " + id + " = " + value);
        return value;
    }

Example: Calc

CalcVisitor -- identifiers

	@Override public Integer visitId(LabeledExprParser.IdContext ctx) { 
        String id = ctx.ID().getText();
        Integer value = memory.get(id);
        System.out.println("ID: " + id + " = " + value);
        return value;
    }
    // {{## END identifier ##}
}
    
public class Calc {
    // {{## BEGIN main ##}}
    public static void main(String[] args) throws Exception {
        System.out.println("Calc v1.0");
        System.out.println("Using ANTLR v" + org.antlr.v4.runtime.CharStreams.class.getPackage().getImplementationVersion());
    
        String filename = "../../test.expr";
        if (args.length > 0) {
            filename = args[0];
        }
        System.out.println("Parsing " + new java.io.File(filename).getAbsolutePath());
    
        CalcVisitor visitor = new CalcVisitor();
        LabeledExprLexer lexer = new LabeledExprLexer(CharStreams.fromFileName(filename));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        LabeledExprParser parser = new LabeledExprParser(tokens);
        ParseTree tree = parser.prog(); // parse; start at prog
        System.out.println(tree.toStringTree(parser)); // print tree as text
        visitor.visit(tree);
    }    
    // {{## END main ##}}
}

Example: Calc

CalcVisitor -- printing expressions


Summary

Now you know

Resources

Where to go to get more

Resources

Books

Resources

Tools

Credentials

Who is this guy?