Busy Developer's Guide to Building A Programming Language

Objectives

What are we looking to do here?

Building a language?

Why on earth would anyone bother?

Motivation

Why bother learning how compilers are built?

Problem

Consider a simple state machine

Problem

Consider the traditional "business rules"

Problem

Consider CS problems we haven't solved yet

Trends

Within the last few years, "Domain-Specific Languages" have become a new area of inquiry

Trends

Lots of tools are already "languages", "DSLs", or related to languages in some way

Language Design

Creating a custom programming language

Language Design

Creating a custom programming language/DSL

Lexing and Parsing

Lexing and Parsing

Intermediate Representation

Intermediate Representation

Analysis

Analysis

Execution

Execution

Code Generation

Code Generation

Lexing

From letters to words

Lexing

Lexing is a first-pass verification of input stream

Lexing

Lexing takes characters and produces "tokens"

Lexing

Lexing examples (Java)

Parsing

From words to sentences with meaning

Parsing

Parsing takes input format and transforms it into an intermediate representation

Parsing

Examples using English

Parsing

Much of the time, an input format is described by a formal grammar

Parsing

4-operation calculator grammar (EBNF)


input	::= ws expr ws;

expr	::= ws factor [{ws ('*'|'/') ws factor}];
factor	::= ws term [{ws ('+'|'-') ws term}];
term	::= '(' ws expr ws ')' | '-' ws expr | number;

number	::= {dgt} ['.' {dgt}] [('e'|'E') ['-'] {dgt}];
dgt	::= '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';
ws	::= [{' '|'\t'|'\n'|'\r'}];
  

Parsing

Sometimes the grammar is fed as input to a code-generator tool to generate the code for the lexer and parser

Parsing

Sometimes the parsing can be written as a library within a language, fed by Strings read in by the host language

Parsing

Sometimes we can duck the parser altogether and use another programming language to be the parsed input format for us

Parsing

Parser implementation choices

Intermediate Representation

An optimized form of the language input

Abstract Syntax Tree

Abstract Syntax Tree (AST)

Intermediate Representation

Intermediate Representation (IR) is an optimized format of the input

Analysis

Transformations on the IR

Analysis

Once the input form is in an IR, optimizations can be made to improve the efficiency of the code on a variety of different levels

Execution

Directly executing the intermediate rep

Execution

IR structures can be executed/interpreted by a host environment

Interpreter

Interpreter

Code Generation

To make it faster/standalone

Code generation

Produce code output for execution

Code generation

Code generation implementation tools

Example: Calc

A calculator 'language'

Example: Calc

Example: A calculator language (Calc)

Example: Calc

Example calculator script

<</home/runner/work/Slides/Slides/Content/LanguageDesign/Calc/code/test.expr NOT FOUND>>

Example results

193
17
9

Example: Calc

Lexing

Example: Calc

Labeled Expressions grammar

<</home/runner/work/Slides/Slides/Content/LanguageDesign/Calc/code/src/main/antlr/LabeledExpr.g4 NOT FOUND>>

Example: Calc

Parsing

Example: Calc

Labeled Expressions grammar

<</home/runner/work/Slides/Slides/Content/LanguageDesign/Calc/code/src/main/antlr/LabeledExpr.g4 NOT FOUND>>

Example: Calc

AST

Example: Calc

Interpretation

Example: Calc

EvalVisitor -- an interpreter

<</home/runner/work/Slides/Slides/Content/LanguageDesign/Calc/code/src/main/java/EvalVisitor.java NOT FOUND>>

Example: Calc

EvalVisitor -- assignment

<</home/runner/work/Slides/Slides/Content/LanguageDesign/Calc/code/src/main/java/EvalVisitor.java NOT FOUND>>

Example: Calc

EvalVisitor -- printing expressions

<</home/runner/work/Slides/Slides/Content/LanguageDesign/Calc/code/src/main/java/EvalVisitor.java NOT FOUND>>

Example: Calc

EvalVisitor -- constants and identifiers

<</home/runner/work/Slides/Slides/Content/LanguageDesign/Calc/code/src/main/java/EvalVisitor.java NOT FOUND>>

Example: Calc

EvalVisitor -- mulDiv and addSub nodes

<</home/runner/work/Slides/Slides/Content/LanguageDesign/Calc/code/src/main/java/EvalVisitor.java NOT FOUND>>

Example: Calc

EvalVisitor -- parentheses (nested expressions)

<</home/runner/work/Slides/Slides/Content/LanguageDesign/Calc/code/src/main/java/EvalVisitor.java NOT FOUND>>

Example: Calc

Lastly, a driver to control the whole pipeline

Summary

Resources

Books

Resources

Tools

Credentials

Who is this guy?