TinyCompiler is a fully functional educational compiler written in ~200 lines of JavaScript. It transforms LISP-like function call syntax into C-like function calls, demonstrating all major compiler phases in an easy-to-understand way.
- โ Complete compiler pipeline (Tokenizer โ Parser โ Transformer โ Code Generator)
- โ Educational & well-commented code
- โ Web playground with live preview
- โ REST API for compilation
- โ Fully tested with comprehensive test suite
- โ MIT Licensed
This project serves as a learning tool to understand how compilers work by breaking down complex compilation concepts into digestible, well-commented code.
- Name: TinyCompiler
- Version: 1.0.0
- Author: Ovi Shekh hi@ovishekh.com (ovishekh.com)
- License: MIT
- Repository: github.com/ovishkh/TinyCompiler
- Live Demo: TinyCompiler.ovishekh.com
- Main Entry Point:
./TinyCompiler.js
The compiler follows the standard three-phase architecture:
Converts raw input code into an Abstract Syntax Tree (AST)
- Lexical Analysis (Tokenization): Breaks code into tokens
- Syntactic Analysis: Builds AST from tokens
Converts the source AST to a target AST using the visitor pattern
- Traverses the AST depth-first
- Applies transformations via visitor methods
- Creates a new AST for target language
Converts the transformed AST back into code strings
- Recursively prints each node type
- Generates final output code
(add 2 (subtract 4 2))add(2, subtract(4, 2));Input String
โ
[Tokenizer] โ Tokens Array
โ
[Parser] โ Abstract Syntax Tree (LISP AST)
โ
[Traverser + Visitor] โ Transformed AST (C-like AST)
โ
[Code Generator] โ Output String
Purpose: Lexical analysis - breaks raw code into tokens
Input: String of code
"(add 2 (subtract 4 2))"Output: Array of tokens
[
{ type: 'paren', value: '(' },
{ type: 'name', value: 'add' },
{ type: 'number', value: '2' },
{ type: 'paren', value: '(' },
{ type: 'name', value: 'subtract' },
{ type: 'number', value: '4' },
{ type: 'number', value: '2' },
{ type: 'paren', value: ')' },
{ type: 'paren', value: ')' }
]Token Types Recognized:
paren: Parentheses(and)number: Numeric literals[0-9]+string: String literals enclosed in""name: Function names and identifiers[a-zA-Z]+
Key Algorithm:
- Uses cursor-based iteration (
currentpointer) - Handles multi-character tokens (numbers, names, strings)
- Skips whitespace
- Throws error on unrecognized characters
Purpose: Syntactic analysis - builds AST from tokens
Input: Array of tokens
Output: Abstract Syntax Tree
{
type: 'Program',
body: [{
type: 'CallExpression',
name: 'add',
params: [{
type: 'NumberLiteral',
value: '2'
}, {
type: 'CallExpression',
name: 'subtract',
params: [{
type: 'NumberLiteral',
value: '4'
}, {
type: 'NumberLiteral',
value: '2'
}]
}]
}]
}Node Types Generated:
Program: Root node containing body arrayCallExpression: Function invocation with name and paramsNumberLiteral: Numeric valuesStringLiteral: String values
Key Algorithm:
- Recursive descent parser using
walk()function - Maintains cursor (
current) position - Handles nested expressions recursively
- Program body supports multiple top-level expressions
Purpose: Depth-first tree traversal with visitor pattern
Key Features:
- Visits every node in the AST
- Supports both
enterandexitcallbacks - Maintains parent-child relationships
- Enables non-destructive AST analysis
Traversal Order (Depth-First):
โ Program (enter)
โ CallExpression (enter)
โ NumberLiteral (enter)
โ NumberLiteral (exit)
โ CallExpression (enter)
โ NumberLiteral (enter)
โ NumberLiteral (exit)
โ NumberLiteral (enter)
โ NumberLiteral (exit)
โ CallExpression (exit)
โ CallExpression (exit)
โ Program (exit)
Visitor Pattern Structure:
{
NodeType: {
enter(node, parent) { /* ... */ },
exit(node, parent) { /* ... */ }
}
}Purpose: Converts LISP AST to C-like AST
Transformation Rules:
| LISP AST | โ | C-like AST |
|---|---|---|
CallExpression with name |
โ | CallExpression with Identifier callee |
Top-level CallExpression |
โ | Wrapped in ExpressionStatement |
| Direct params array | โ | Arguments array |
NumberLiteral |
โ | NumberLiteral (unchanged) |
StringLiteral |
โ | StringLiteral (unchanged) |
Input AST (LISP-like):
{
type: 'CallExpression',
name: 'add',
params: [...]
}Output AST (C-like):
{
type: 'ExpressionStatement',
expression: {
type: 'CallExpression',
callee: {
type: 'Identifier',
name: 'add'
},
arguments: [...]
}
}Key Features:
- Uses
_contexthack for parent-to-child linking - Wraps top-level expressions in
ExpressionStatement - Preserves nested structure
- Non-destructive (creates new AST)
Purpose: Converts AST back into executable code strings
Output Generation:
| Node Type | Output Format |
|---|---|
Program |
Maps and joins body with newlines |
ExpressionStatement |
Expression + ; |
CallExpression |
callee(arg1, arg2, ...) |
Identifier |
Function name string |
NumberLiteral |
Number value |
StringLiteral |
"string value" |
Example:
// Input node:
{
type: 'CallExpression',
callee: { type: 'Identifier', name: 'add' },
arguments: [
{ type: 'NumberLiteral', value: '2' },
{ type: 'NumberLiteral', value: '3' }
]
}
// Output string:
// "add(2, 3)"Purpose: Main orchestration function linking all phases
Pipeline:
function compiler(input) {
let tokens = tokenizer(input); // Phase 1a: Lexical Analysis
let ast = parser(tokens); // Phase 1b: Syntactic Analysis
let newAst = transformer(ast); // Phase 2: Transformation
let output = codeGenerator(newAst); // Phase 3: Code Generation
return output;
}Usage:
const result = compiler('(add 2 (subtract 4 2))');
console.log(result); // "add(2, subtract(4, 2));"File: test.js
Test Coverage:
- โ Tokenizer test - Verifies token generation
- โ Parser test - Verifies AST structure
- โ Transformer test - Verifies AST transformation
- โ Code Generator test - Verifies output generation
- โ Compiler integration test - Verifies end-to-end compilation
Running Tests:
node test.jsExpected Output:
All Passed!
- Character-by-character scanning
- Multi-character token handling
- Whitespace handling
- Error detection
- Building AST from tokens
- Handling nested structures
- Recursive function calls
- Tree traversal abstraction
- Enter/exit callbacks
- Separation of concerns
- Pattern matching on node types
- Creating new AST structures
- Context propagation
- Recursive node processing
- String concatenation
- Output formatting
For beginners: Read through the code in this order:
- Tokenizer - Start here to understand lexical analysis
- Parser - Learn recursive descent parsing
- Traverser - Understand tree traversal patterns
- Transformer - See practical AST manipulation
- Code Generator - Learn output generation
- Compiler - See how it all ties together
Recommended approach:
- Read the extensive inline comments in
TinyCompiler.js - Trace through the example in
test.js - Try modifying the input to see how it transforms
- Add new token types or node types
The compiler can be extended to support:
- More operators: Add to tokenizer (e.g.,
+,-,*,/) - More token types: Update tokenizer regex patterns
- More node types: Add cases to parser, traverser, and generator
- Different target languages: Modify transformer and generator
- Type checking: Add validation in transformer
- Optimization: Add optimization passes before code generation
The concepts here are used in:
- Language compilers (JavaScript, Python, Rust, etc.)
- Build tools (Webpack, Babel, TypeScript)
- Template engines (Handlebars, EJS)
- Query languages (GraphQL, SQL parsers)
- Domain-specific languages (Configuration files, markup)
MIT License
Copyright (c) 2025 Ovi Shekh
This project is licensed under the MIT License - see the LICENSE file for details.
You are free to use, modify, and distribute this project with or without attribution (though attribution is appreciated).
Created by: Ovi Shekh (ovishekh.com)
Special Thanks:
- Md. Rashedul Alam - For guidance and support
- James Kyle - For compiler design inspiration
- Gates Smasher - For valuable contributions
- Corrado Bรถhm - For theoretical foundations
- Node.js v14 or higher
- npm or yarn
# Clone the repository
git clone https://github.com/ovishkh/TinyCompiler.git
cd TinyCompiler
# Install dependencies
npm install
# Run tests
npm test
# or
make testconst { compiler } = require('./TinyCompiler');
// Simple example
const input = '(add 2 2)';
const output = compiler(input);
console.log(output); // "add(2, 2);"
// Nested example
const complex = '(add 2 (subtract 4 2))';
console.log(compiler(complex)); // "add(2, subtract(4, 2));"Visit TinyCompiler.ovishekh.com for an interactive playground.
# Compile via HTTP API
curl "https://TinyCompiler.ovishekh.com/api/compiler?code=(add%201%202)"make install # Install dependencies
make test # Run tests
make run # Run with example
make dev # Development mode
make clean # Clean dependencies
make help # Show all commands# Build Docker image
docker build -t tinycompiler .
# Run container
docker run -p 3000:3000 tinycompilerThe project is configured for easy deployment to Vercel:
# Install Vercel CLI
npm i -g vercel
# Deploy
vercel
# Deploy to production
vercel --prodThe vercel.json file is pre-configured with:
- API routes at
/api/compiler - Static file serving
- CORS headers enabled
The project includes:
vercel.json- Vercel configurationapi/compiler.js- Serverless API endpointindex.html- Web playground UI
TinyCompiler/
โโโ TinyCompiler.js # Main compiler (all phases)
โโโ test.js # Test suite
โโโ index.html # Web playground UI
โโโ api/
โ โโโ compiler.js # REST API endpoint
โโโ vercel.json # Vercel config
โโโ Makefile # Build automation
โโโ package.json # Dependencies
โโโ LICENSE # MIT License
โโโ README.md # This file
โโโ CONTRIBUTING.md # Contribution guide
npm test # Run all tests
make test # Using Makefile- 2-space indentation
- Use
'use strict'; - Comprehensive inline comments
- Descriptive variable names
See CONTRIBUTING.md for detailed contribution guidelines.
Q: Why doesn't it support operators like +?
A: This compiler demonstrates core concepts with simple syntax. Operators would require additional tokenizer rules.
Q: Can I use this for real compilation? A: This is an educational tool. Production compilers have significantly more complexity for optimization, error recovery, and language features.
Q: How do I add new features? A: Follow this pattern: Update tokenizer โ Update parser โ Update transformer โ Update generator โ Add tests.
To deepen your compiler knowledge, explore:
- Lexical Analysis: Finite automata, regular expressions
- Parsing: Context-free grammars, parse trees
- Semantic Analysis: Symbol tables, type checking
- Optimization: Dead code elimination, constant folding
- Code Generation: Register allocation, instruction selection
- ๐ Website: ovishekh.com
- ๐ป GitHub: github.com/ovishkh/TinyCompiler
- ๐ฎ Live Demo: TinyCompiler.ovishekh.com
- ๐ API Docs: GitHub Wiki
- Crafting Interpreters - Interactive guide
- Engineering a Compiler - Advanced reading
- Lexical Analysis: Finite automata, regular expressions
- Parsing: Context-free grammars, parse trees, recursive descent
- Semantic Analysis: Symbol tables, type checking
- Optimization: Dead code elimination, constant folding
- Code Generation: Register allocation, instruction selection
This section contains the deep-dive explanation that was originally embedded in the source code.
We start by accepting an input string of code, and we're going to break it down into an array of tokens.
// (add 2 (subtract 4 2))
// ^^^
// Name tokenWe loop through the input string character by character:
- Parentheses: We check for
(and). - Whitespace: We check for spaces, but we don't store them as tokens. We just skip them.
- Numbers: We capture sequences of numbers
[0-9]as a single token. - Strings: We capture anything between
"quotes. - Names: We capture sequences of letters
[a-z]as identifiers (likeaddorsubtract).
Parsing converts the flat list of tokens into a tree structure (AST).
The Challenge: How do we handle nested calls like (add 2 (subtract 4 2))?
The Solution: Recursion.
One main function walk() does the heavy lifting:
- It looks at the current token.
- If it's a
number, it returns aNumberLiteralnode. - If it's a
string, it returns aStringLiteralnode. - If it's a
(, it knows it's aCallExpression.- It grabs the name (e.g.,
add). - It then calls
walk()recursively to find the arguments. - This recursion continues until it hits a
), at which point it finishes the node and returns it.
- It grabs the name (e.g.,
This is where the magic happens. We want to turn LISP-like code into C-like code.
- LISP:
(add 2 2)-> Function name inside. - C:
add(2, 2)-> Function name outside.
We use a Visitor Pattern to walk through the LISP AST. When we visit a CallExpression node, we build a new node for our C-AST:
- We create a
CallExpressionwith acalleeproperty (the identifieradd). - We put the parameters into an
argumentsarray. - Context Hack: We pass a "context" array down the tree so that children nodes can push themselves into their parent's
argumentslist.
The final step is effectively a big switch statement that prints strings.
Program: Iterate through each expression and add a newline.ExpressionStatement: Print the expression and add a;.CallExpression: Print thecallee, literal(, theargumentsjoined by comma, and).Identifier: Print the name.NumberLiteral: Print the value.
MIT License
Copyright (c) 2025 Ovi Shekh
This project is licensed under the MIT License - see the LICENSE file for details.
Made with โค๏ธ by Ovi Shekh