TinyCompiler - Educational Compiler Implementation


📚 Project Overview

TinyCompiler is a fully functional educational compiler written in ~200 lines of JavaScript. It transforms LISP-like function call syntax into C-like function calls, demonstrating all major compiler phases in an easy-to-understand way.

✨ Features

  • ✅ Complete compiler pipeline (Tokenizer → Parser → Transformer → Code Generator)
  • ✅ Educational & well-commented code
  • ✅ Web playground with live preview
  • ✅ REST API for compilation
  • ✅ Fully tested with comprehensive test suite
  • ✅ MIT Licensed

🎯 Project Goal

This project serves as a learning tool to understand how compilers work by breaking down complex compilation concepts into digestible, well-commented code.

🔄 Compiler Architecture

The compiler follows the standard three-phase architecture:

Phase 1: Parsing

Converts raw input code into an Abstract Syntax Tree (AST)

  • Lexical Analysis (Tokenization): Breaks code into tokens
  • Syntactic Analysis: Builds AST from tokens

Phase 2: Transformation

Converts the source AST to a target AST using the visitor pattern

  • Traverses the AST depth-first
  • Applies transformations via visitor methods
  • Creates a new AST for target language

Phase 3: Code Generation

Converts the transformed AST back into code strings

  • Recursively prints each node type
  • Generates final output code

๐Ÿ“ Example Transformation

Input (LISP Syntax)

(add 2 (subtract 4 2))

Output (C Syntax)

add(2, subtract(4, 2));

Visual Flow

Input String
    ↓
[Tokenizer] → Tokens Array
    ↓
[Parser] → Abstract Syntax Tree (LISP AST)
    ↓
[Traverser + Visitor] → Transformed AST (C-like AST)
    ↓
[Code Generator] → Output String

๐Ÿ” Detailed Component Breakdown

1. TOKENIZER (tokenizer(input))

Purpose: Lexical analysis - breaks raw code into tokens

Input: String of code

"(add 2 (subtract 4 2))"

Output: Array of tokens

[
  { type: 'paren',  value: '('        },
  { type: 'name',   value: 'add'      },
  { type: 'number', value: '2'        },
  { type: 'paren',  value: '('        },
  { type: 'name',   value: 'subtract' },
  { type: 'number', value: '4'        },
  { type: 'number', value: '2'        },
  { type: 'paren',  value: ')'        },
  { type: 'paren',  value: ')'        }
]

Token Types Recognized:

  • paren: Parentheses ( and )
  • number: Numeric literals [0-9]+
  • string: String literals enclosed in ""
  • name: Function names and identifiers [a-zA-Z]+

Key Algorithm:

  • Uses cursor-based iteration (current pointer)
  • Handles multi-character tokens (numbers, names, strings)
  • Skips whitespace
  • Throws error on unrecognized characters
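The algorithm above can be sketched as follows (a hypothetical re-implementation consistent with the description, not the repository's exact code):

```javascript
// Cursor-based tokenizer: scans the input character by character and emits
// the token types listed above (paren, number, string, name).
function tokenizer(input) {
  let current = 0;
  const tokens = [];
  while (current < input.length) {
    let char = input[current];
    if (char === '(' || char === ')') {            // single-character parens
      tokens.push({ type: 'paren', value: char });
      current++;
      continue;
    }
    if (/\s/.test(char)) {                         // skip whitespace
      current++;
      continue;
    }
    if (/[0-9]/.test(char)) {                      // multi-digit numbers
      let value = '';
      while (/[0-9]/.test(char)) { value += char; char = input[++current]; }
      tokens.push({ type: 'number', value });
      continue;
    }
    if (char === '"') {                            // string literals in ""
      let value = '';
      char = input[++current];
      while (char !== '"') { value += char; char = input[++current]; }
      current++;                                   // skip the closing quote
      tokens.push({ type: 'string', value });
      continue;
    }
    if (/[a-zA-Z]/.test(char)) {                   // names / identifiers
      let value = '';
      while (char && /[a-zA-Z]/.test(char)) { value += char; char = input[++current]; }
      tokens.push({ type: 'name', value });
      continue;
    }
    throw new TypeError('Unrecognized character: ' + char);
  }
  return tokens;
}

// tokenizer('(add 2 (subtract 4 2))') produces the 9-token array shown above.
```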

2. PARSER (parser(tokens))

Purpose: Syntactic analysis - builds AST from tokens

Input: Array of tokens

Output: Abstract Syntax Tree

{
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [{
      type: 'NumberLiteral',
      value: '2'
    }, {
      type: 'CallExpression',
      name: 'subtract',
      params: [{
        type: 'NumberLiteral',
        value: '4'
      }, {
        type: 'NumberLiteral',
        value: '2'
      }]
    }]
  }]
}

Node Types Generated:

  • Program: Root node containing body array
  • CallExpression: Function invocation with name and params
  • NumberLiteral: Numeric values
  • StringLiteral: String values

Key Algorithm:

  • Recursive descent parser using walk() function
  • Maintains cursor (current) position
  • Handles nested expressions recursively
  • Program body supports multiple top-level expressions
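A compact sketch of the recursive descent approach described above (hypothetical, assuming the token format from the tokenizer section):

```javascript
// walk() consumes one expression starting at the cursor; nested calls are
// handled by recursing until the matching ')' is reached.
function parser(tokens) {
  let current = 0;
  function walk() {
    let token = tokens[current];
    if (token.type === 'number') {
      current++;
      return { type: 'NumberLiteral', value: token.value };
    }
    if (token.type === 'string') {
      current++;
      return { type: 'StringLiteral', value: token.value };
    }
    if (token.type === 'paren' && token.value === '(') {
      token = tokens[++current];                   // skip '(' to the name
      const node = { type: 'CallExpression', name: token.value, params: [] };
      token = tokens[++current];                   // move past the name
      while (token.type !== 'paren' || token.value !== ')') {
        node.params.push(walk());                  // recurse into arguments
        token = tokens[current];
      }
      current++;                                   // skip ')'
      return node;
    }
    throw new TypeError(token.type);
  }
  const ast = { type: 'Program', body: [] };
  while (current < tokens.length) ast.body.push(walk());
  return ast;
}
```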

3. TRAVERSER (traverser(ast, visitor))

Purpose: Depth-first tree traversal with visitor pattern

Key Features:

  • Visits every node in the AST
  • Supports both enter and exit callbacks
  • Maintains parent-child relationships
  • Enables non-destructive AST analysis

Traversal Order (Depth-First):

→ Program (enter)
  → CallExpression (enter)
    → NumberLiteral (enter)
    ← NumberLiteral (exit)
    → CallExpression (enter)
      → NumberLiteral (enter)
      ← NumberLiteral (exit)
      → NumberLiteral (enter)
      ← NumberLiteral (exit)
    ← CallExpression (exit)
  ← CallExpression (exit)
← Program (exit)

Visitor Pattern Structure:

{
  NodeType: {
    enter(node, parent) { /* ... */ },
    exit(node, parent) { /* ... */ }
  }
}
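A minimal traverser sketch implementing the depth-first enter/exit walk described above (a hypothetical re-implementation):

```javascript
// Visits every node, calling visitor[type].enter on the way down and
// visitor[type].exit on the way back up; parents are passed to callbacks.
function traverser(ast, visitor) {
  function traverseArray(array, parent) {
    array.forEach(child => traverseNode(child, parent));
  }
  function traverseNode(node, parent) {
    const methods = visitor[node.type];
    if (methods && methods.enter) methods.enter(node, parent);
    switch (node.type) {
      case 'Program':        traverseArray(node.body, node);   break;
      case 'CallExpression': traverseArray(node.params, node); break;
      case 'NumberLiteral':
      case 'StringLiteral':  break;                // leaves: nothing to descend
      default: throw new TypeError(node.type);
    }
    if (methods && methods.exit) methods.exit(node, parent);
  }
  traverseNode(ast, null);
}
```

Node types without a visitor entry are still traversed; only the callbacks are optional.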

4. TRANSFORMER (transformer(ast))

Purpose: Converts LISP AST to C-like AST

Transformation Rules:

LISP AST → C-like AST

  • CallExpression with name → CallExpression with Identifier callee
  • Top-level CallExpression → wrapped in ExpressionStatement
  • params array → arguments array
  • NumberLiteral → NumberLiteral (unchanged)
  • StringLiteral → StringLiteral (unchanged)

Input AST (LISP-like):

{
  type: 'CallExpression',
  name: 'add',
  params: [...]
}

Output AST (C-like):

{
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: {
      type: 'Identifier',
      name: 'add'
    },
    arguments: [...]
  }
}

Key Features:

  • Uses _context hack for parent-to-child linking
  • Wraps top-level expressions in ExpressionStatement
  • Preserves nested structure
  • Non-destructive (creates new AST)
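The rules above, including the _context hack, can be sketched like this (a hypothetical re-implementation that inlines a tiny depth-first walk instead of reusing a separate traverser):

```javascript
// Builds a new C-like AST; each node's _context points at the array in the
// NEW tree where that node's children should push their transformed selves.
function transformer(ast) {
  const newAst = { type: 'Program', body: [] };
  ast._context = newAst.body;                    // the "context hack"

  function visit(node, parent) {
    if (node.type === 'NumberLiteral' || node.type === 'StringLiteral') {
      parent._context.push({ type: node.type, value: node.value });
    } else if (node.type === 'CallExpression') {
      const call = {
        type: 'CallExpression',
        callee: { type: 'Identifier', name: node.name },
        arguments: [],
      };
      node._context = call.arguments;            // children push themselves here
      // Only top-level calls (parent is not a CallExpression) get wrapped.
      const result = parent.type === 'CallExpression'
        ? call
        : { type: 'ExpressionStatement', expression: call };
      parent._context.push(result);
      node.params.forEach(child => visit(child, node));
    }
  }

  ast.body.forEach(node => visit(node, ast));
  return newAst;
}
```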

5. CODE GENERATOR (codeGenerator(node))

Purpose: Converts AST back into executable code strings

Output Generation:

  • Program → maps and joins body with newlines
  • ExpressionStatement → expression + ;
  • CallExpression → callee(arg1, arg2, ...)
  • Identifier → the function name string
  • NumberLiteral → the number value
  • StringLiteral → "string value"

Example:

// Input node:
{
  type: 'CallExpression',
  callee: { type: 'Identifier', name: 'add' },
  arguments: [
    { type: 'NumberLiteral', value: '2' },
    { type: 'NumberLiteral', value: '3' }
  ]
}

// Output string:
// "add(2, 3)"

6. COMPILER (compiler(input))

Purpose: Main orchestration function linking all phases

Pipeline:

function compiler(input) {
  let tokens = tokenizer(input);        // Phase 1a: Lexical Analysis
  let ast    = parser(tokens);          // Phase 1b: Syntactic Analysis
  let newAst = transformer(ast);        // Phase 2: Transformation
  let output = codeGenerator(newAst);   // Phase 3: Code Generation
  return output;
}

Usage:

const result = compiler('(add 2 (subtract 4 2))');
console.log(result); // "add(2, subtract(4, 2));"

📋 Test Suite

File: test.js

Test Coverage:

  1. ✅ Tokenizer test - Verifies token generation
  2. ✅ Parser test - Verifies AST structure
  3. ✅ Transformer test - Verifies AST transformation
  4. ✅ Code Generator test - Verifies output generation
  5. ✅ Compiler integration test - Verifies end-to-end compilation

Running Tests:

node test.js

Expected Output:

All Passed!

💡 Key Concepts Demonstrated

1. Tokenization

  • Character-by-character scanning
  • Multi-character token handling
  • Whitespace handling
  • Error detection

2. Recursive Descent Parsing

  • Building AST from tokens
  • Handling nested structures
  • Recursive function calls

3. Visitor Pattern

  • Tree traversal abstraction
  • Enter/exit callbacks
  • Separation of concerns

4. AST Transformation

  • Pattern matching on node types
  • Creating new AST structures
  • Context propagation

5. Code Generation

  • Recursive node processing
  • String concatenation
  • Output formatting

🎓 Learning Path

For beginners: Read through the code in this order:

  1. Tokenizer - Start here to understand lexical analysis
  2. Parser - Learn recursive descent parsing
  3. Traverser - Understand tree traversal patterns
  4. Transformer - See practical AST manipulation
  5. Code Generator - Learn output generation
  6. Compiler - See how it all ties together

Recommended approach:

  • Read the extensive inline comments in TinyCompiler.js
  • Trace through the example in test.js
  • Try modifying the input to see how it transforms
  • Add new token types or node types

🔧 Extensibility

The compiler can be extended to support:

  • More operators: Add to tokenizer (e.g., +, -, *, /)
  • More token types: Update tokenizer regex patterns
  • More node types: Add cases to parser, traverser, and generator
  • Different target languages: Modify transformer and generator
  • Type checking: Add validation in transformer
  • Optimization: Add optimization passes before code generation
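As one concrete example of the last point, a constant-folding pass could run between the transformer and the code generator. The pass below is purely illustrative: its name, the supported operations, and the two-argument assumption are not part of the original code.

```javascript
// Hypothetical optimization pass: collapses calls whose arguments are all
// number literals into a single NumberLiteral, bottom-up. Assumes the C-like
// AST shape produced by the transformer and exactly two arguments per call.
const FOLDABLE = { add: (a, b) => a + b, subtract: (a, b) => a - b };

function foldConstants(node) {
  if (node.type === 'Program') {
    node.body = node.body.map(foldConstants);
  } else if (node.type === 'ExpressionStatement') {
    node.expression = foldConstants(node.expression);
  } else if (node.type === 'CallExpression') {
    node.arguments = node.arguments.map(foldConstants); // fold children first
    const op = FOLDABLE[node.callee.name];
    if (op && node.arguments.every(a => a.type === 'NumberLiteral')) {
      const [a, b] = node.arguments.map(n => Number(n.value));
      return { type: 'NumberLiteral', value: String(op(a, b)) };
    }
  }
  return node;
}
```

With this pass in place, the AST for add(2, subtract(4, 2)) would collapse to the literal 4 before code generation.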

📚 Real-World Applications

The concepts here are used in:

  • Language compilers (JavaScript, Python, Rust, etc.)
  • Build tools (Webpack, Babel, TypeScript)
  • Template engines (Handlebars, EJS)
  • Query languages (GraphQL, SQL parsers)
  • Domain-specific languages (Configuration files, markup)

📄 License

MIT License

Copyright (c) 2025 Ovi Shekh

This project is licensed under the MIT License - see the LICENSE file for details.

You are free to use, modify, and distribute this project with or without attribution (though attribution is appreciated).


๐Ÿ™ Credits

Created by: Ovi Shekh (ovishekh.com)

🚀 Quick Start

Prerequisites

  • Node.js v14 or higher
  • npm or yarn

Installation

# Clone the repository
git clone https://github.com/ovishkh/TinyCompiler.git
cd TinyCompiler

# Install dependencies
npm install

# Run tests
npm test
# or
make test

Basic Usage

Command Line

const { compiler } = require('./TinyCompiler');

// Simple example
const input = '(add 2 2)';
const output = compiler(input);
console.log(output); // "add(2, 2);"

// Nested example
const complex = '(add 2 (subtract 4 2))';
console.log(compiler(complex)); // "add(2, subtract(4, 2));"

Web Playground

Visit TinyCompiler.ovishekh.com for an interactive playground.

REST API

# Compile via HTTP API
curl "https://TinyCompiler.ovishekh.com/api/compiler?code=(add%201%202)"

Makefile Commands

make install    # Install dependencies
make test       # Run tests
make run        # Run with example
make dev        # Development mode
make clean      # Clean dependencies
make help       # Show all commands

Docker (Optional)

# Build Docker image
docker build -t tinycompiler .

# Run container
docker run -p 3000:3000 tinycompiler

๐ŸŒ Deployment

Vercel Deployment

The project is configured for easy deployment to Vercel:

# Install Vercel CLI
npm i -g vercel

# Deploy
vercel

# Deploy to production
vercel --prod

The vercel.json file is pre-configured with:

  • API routes at /api/compiler
  • Static file serving
  • CORS headers enabled

Environment Setup

The project includes:

  • vercel.json - Vercel configuration
  • api/compiler.js - Serverless API endpoint
  • index.html - Web playground UI

🔧 Development

Project Structure

TinyCompiler/
├── TinyCompiler.js       # Main compiler (all phases)
├── test.js               # Test suite
├── index.html            # Web playground UI
├── api/
│   └── compiler.js       # REST API endpoint
├── vercel.json           # Vercel config
├── Makefile              # Build automation
├── package.json          # Dependencies
├── LICENSE               # MIT License
├── README.md             # This file
└── CONTRIBUTING.md       # Contribution guide

Running Tests

npm test          # Run all tests
make test         # Using Makefile

Code Style

  • 2-space indentation
  • Use 'use strict';
  • Comprehensive inline comments
  • Descriptive variable names

Contributing

See CONTRIBUTING.md for detailed contribution guidelines.



๐Ÿ› Common Issues and Solutions

Q: Why doesn't it support operators like +?
A: This compiler demonstrates core concepts with simple syntax. Operators would require additional tokenizer rules.

Q: Can I use this for real compilation?
A: This is an educational tool. Production compilers have significantly more complexity for optimization, error recovery, and language features.

Q: How do I add new features?
A: Follow this pattern: update tokenizer → update parser → update transformer → update generator → add tests.


📖 Further Reading

To deepen your compiler knowledge, explore:

  • Lexical Analysis: Finite automata, regular expressions
  • Parsing: Context-free grammars, parse trees
  • Semantic Analysis: Symbol tables, type checking
  • Optimization: Dead code elimination, constant folding
  • Code Generation: Register allocation, instruction selection


🧠 The "Tiny" Internals (How it works)

This section contains the deep-dive explanation that was originally embedded in the source code.

1. The Tokenizer (Lexical Analysis)

We start by accepting an input string of code, and we're going to break it down into an array of tokens.

// (add 2 (subtract 4 2))
//    ^^^
//    Name token

We loop through the input string character by character:

  1. Parentheses: We check for ( and ).
  2. Whitespace: We check for spaces, but we don't store them as tokens. We just skip them.
  3. Numbers: We capture sequences of numbers [0-9] as a single token.
  4. Strings: We capture anything between " quotes.
  5. Names: We capture sequences of letters [a-zA-Z] as identifiers (like add or subtract).

2. The Parser (Syntactic Analysis)

Parsing converts the flat list of tokens into a tree structure (AST).

The Challenge: How do we handle nested calls like (add 2 (subtract 4 2))?
The Solution: Recursion.

One main function walk() does the heavy lifting:

  • It looks at the current token.
  • If it's a number, it returns a NumberLiteral node.
  • If it's a string, it returns a StringLiteral node.
  • If it's a (, it knows it's a CallExpression.
    • It grabs the name (e.g., add).
    • It then calls walk() recursively to find the arguments.
    • This recursion continues until it hits a ), at which point it finishes the node and returns it.

3. The Transformer (AST Transformation)

This is where the magic happens. We want to turn LISP-like code into C-like code.

  • LISP: (add 2 2) -> Function name inside.
  • C: add(2, 2) -> Function name outside.

We use a Visitor Pattern to walk through the LISP AST. When we visit a CallExpression node, we build a new node for our C-AST:

  • We create a CallExpression with a callee property (the identifier add).
  • We put the parameters into an arguments array.
  • Context Hack: We pass a "context" array down the tree so that child nodes can push themselves into their parent's arguments list.

4. The Code Generator

The final step is effectively a big switch statement that prints strings.

  • Program: Iterate through each expression and add a newline.
  • ExpressionStatement: Print the expression and add a ;.
  • CallExpression: Print the callee, a literal (, the arguments joined by commas, and a literal ).
  • Identifier: Print the name.
  • NumberLiteral: Print the value.



Made with ❤️ by Ovi Shekh
