TinyCompiler - Educational Compiler Implementation


📚 Project Overview

TinyCompiler is a fully functional educational compiler written in ~200 lines of JavaScript. It transforms LISP-like function call syntax into C-like function calls, demonstrating all major compiler phases in an easy-to-understand way.

✨ Features

  • ✅ Complete compiler pipeline (Tokenizer → Parser → Transformer → Code Generator)
  • ✅ Educational & well-commented code
  • ✅ Web playground with live preview
  • ✅ REST API for compilation
  • ✅ Fully tested with comprehensive test suite
  • ✅ MIT Licensed

🎯 Project Goal

This project serves as a learning tool to understand how compilers work by breaking down complex compilation concepts into digestible, well-commented code.

🔄 Compiler Architecture

The compiler follows the standard three-phase architecture:

Phase 1: Parsing

Converts raw input code into an Abstract Syntax Tree (AST)

  • Lexical Analysis (Tokenization): Breaks code into tokens
  • Syntactic Analysis: Builds AST from tokens

Phase 2: Transformation

Converts the source AST to a target AST using the visitor pattern

  • Traverses the AST depth-first
  • Applies transformations via visitor methods
  • Creates a new AST for target language

Phase 3: Code Generation

Converts the transformed AST back into code strings

  • Recursively prints each node type
  • Generates final output code

๐Ÿ“ Example Transformation

Input (LISP Syntax)

(add 2 (subtract 4 2))

Output (C Syntax)

add(2, subtract(4, 2));

Visual Flow

Input String
    ↓
[Tokenizer] → Tokens Array
    ↓
[Parser] → Abstract Syntax Tree (LISP AST)
    ↓
[Traverser + Visitor] → Transformed AST (C-like AST)
    ↓
[Code Generator] → Output String

๐Ÿ” Detailed Component Breakdown

1. TOKENIZER (tokenizer(input))

Purpose: Lexical analysis - breaks raw code into tokens

Input: String of code

"(add 2 (subtract 4 2))"

Output: Array of tokens

[
  { type: 'paren',  value: '('        },
  { type: 'name',   value: 'add'      },
  { type: 'number', value: '2'        },
  { type: 'paren',  value: '('        },
  { type: 'name',   value: 'subtract' },
  { type: 'number', value: '4'        },
  { type: 'number', value: '2'        },
  { type: 'paren',  value: ')'        },
  { type: 'paren',  value: ')'        }
]

Token Types Recognized:

  • paren: Parentheses ( and )
  • number: Numeric literals [0-9]+
  • string: String literals enclosed in ""
  • name: Function names and identifiers [a-zA-Z]+

Key Algorithm:

  • Uses cursor-based iteration (current pointer)
  • Handles multi-character tokens (numbers, names, strings)
  • Skips whitespace
  • Throws error on unrecognized characters
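The algorithm above can be sketched as follows (a hypothetical re-implementation consistent with the description, not the repository's exact code):

```javascript
// Cursor-based tokenizer: scans the input character by character and emits
// the token types listed above (paren, number, string, name).
function tokenizer(input) {
  let current = 0;
  const tokens = [];
  while (current < input.length) {
    let char = input[current];
    if (char === '(' || char === ')') {            // single-character parens
      tokens.push({ type: 'paren', value: char });
      current++;
      continue;
    }
    if (/\s/.test(char)) {                         // skip whitespace
      current++;
      continue;
    }
    if (/[0-9]/.test(char)) {                      // multi-digit numbers
      let value = '';
      while (/[0-9]/.test(char)) { value += char; char = input[++current]; }
      tokens.push({ type: 'number', value });
      continue;
    }
    if (char === '"') {                            // string literals in ""
      let value = '';
      char = input[++current];
      while (char !== '"') { value += char; char = input[++current]; }
      current++;                                   // skip the closing quote
      tokens.push({ type: 'string', value });
      continue;
    }
    if (/[a-zA-Z]/.test(char)) {                   // names / identifiers
      let value = '';
      while (char && /[a-zA-Z]/.test(char)) { value += char; char = input[++current]; }
      tokens.push({ type: 'name', value });
      continue;
    }
    throw new TypeError('Unrecognized character: ' + char);
  }
  return tokens;
}

// tokenizer('(add 2 (subtract 4 2))') produces the 9-token array shown above.
```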

2. PARSER (parser(tokens))

Purpose: Syntactic analysis - builds AST from tokens

Input: Array of tokens

Output: Abstract Syntax Tree

{
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [{
      type: 'NumberLiteral',
      value: '2'
    }, {
      type: 'CallExpression',
      name: 'subtract',
      params: [{
        type: 'NumberLiteral',
        value: '4'
      }, {
        type: 'NumberLiteral',
        value: '2'
      }]
    }]
  }]
}

Node Types Generated:

  • Program: Root node containing body array
  • CallExpression: Function invocation with name and params
  • NumberLiteral: Numeric values
  • StringLiteral: String values

Key Algorithm:

  • Recursive descent parser using walk() function
  • Maintains cursor (current) position
  • Handles nested expressions recursively
  • Program body supports multiple top-level expressions
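A compact sketch of the recursive descent approach described above (hypothetical, assuming the token format from the tokenizer section):

```javascript
// walk() consumes one expression starting at the cursor; nested calls are
// handled by recursing until the matching ')' is reached.
function parser(tokens) {
  let current = 0;
  function walk() {
    let token = tokens[current];
    if (token.type === 'number') {
      current++;
      return { type: 'NumberLiteral', value: token.value };
    }
    if (token.type === 'string') {
      current++;
      return { type: 'StringLiteral', value: token.value };
    }
    if (token.type === 'paren' && token.value === '(') {
      token = tokens[++current];                   // skip '(' to the name
      const node = { type: 'CallExpression', name: token.value, params: [] };
      token = tokens[++current];                   // move past the name
      while (token.type !== 'paren' || token.value !== ')') {
        node.params.push(walk());                  // recurse into arguments
        token = tokens[current];
      }
      current++;                                   // skip ')'
      return node;
    }
    throw new TypeError(token.type);
  }
  const ast = { type: 'Program', body: [] };
  while (current < tokens.length) ast.body.push(walk());
  return ast;
}
```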

3. TRAVERSER (traverser(ast, visitor))

Purpose: Depth-first tree traversal with visitor pattern

Key Features:

  • Visits every node in the AST
  • Supports both enter and exit callbacks
  • Maintains parent-child relationships
  • Enables non-destructive AST analysis

Traversal Order (Depth-First):

→ Program (enter)
  → CallExpression (enter)
    → NumberLiteral (enter)
    ← NumberLiteral (exit)
    → CallExpression (enter)
      → NumberLiteral (enter)
      ← NumberLiteral (exit)
      → NumberLiteral (enter)
      ← NumberLiteral (exit)
    ← CallExpression (exit)
  ← CallExpression (exit)
← Program (exit)

Visitor Pattern Structure:

{
  NodeType: {
    enter(node, parent) { /* ... */ },
    exit(node, parent) { /* ... */ }
  }
}
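A minimal traverser sketch implementing the depth-first enter/exit walk described above (a hypothetical re-implementation):

```javascript
// Visits every node, calling visitor[type].enter on the way down and
// visitor[type].exit on the way back up; parents are passed to callbacks.
function traverser(ast, visitor) {
  function traverseArray(array, parent) {
    array.forEach(child => traverseNode(child, parent));
  }
  function traverseNode(node, parent) {
    const methods = visitor[node.type];
    if (methods && methods.enter) methods.enter(node, parent);
    switch (node.type) {
      case 'Program':        traverseArray(node.body, node);   break;
      case 'CallExpression': traverseArray(node.params, node); break;
      case 'NumberLiteral':
      case 'StringLiteral':  break;                // leaves: nothing to descend
      default: throw new TypeError(node.type);
    }
    if (methods && methods.exit) methods.exit(node, parent);
  }
  traverseNode(ast, null);
}
```

Node types without a visitor entry are still traversed; only the callbacks are optional.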

4. TRANSFORMER (transformer(ast))

Purpose: Converts LISP AST to C-like AST

Transformation Rules:

LISP AST → C-like AST

  • CallExpression with name → CallExpression with Identifier callee
  • Top-level CallExpression → wrapped in ExpressionStatement
  • params array → arguments array
  • NumberLiteral → NumberLiteral (unchanged)
  • StringLiteral → StringLiteral (unchanged)

Input AST (LISP-like):

{
  type: 'CallExpression',
  name: 'add',
  params: [...]
}

Output AST (C-like):

{
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: {
      type: 'Identifier',
      name: 'add'
    },
    arguments: [...]
  }
}

Key Features:

  • Uses _context hack for parent-to-child linking
  • Wraps top-level expressions in ExpressionStatement
  • Preserves nested structure
  • Non-destructive (creates new AST)
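The rules above, including the _context hack, can be sketched like this (a hypothetical re-implementation that inlines a tiny depth-first walk instead of reusing a separate traverser):

```javascript
// Builds a new C-like AST; each node's _context points at the array in the
// NEW tree where that node's children should push their transformed selves.
function transformer(ast) {
  const newAst = { type: 'Program', body: [] };
  ast._context = newAst.body;                    // the "context hack"

  function visit(node, parent) {
    if (node.type === 'NumberLiteral' || node.type === 'StringLiteral') {
      parent._context.push({ type: node.type, value: node.value });
    } else if (node.type === 'CallExpression') {
      const call = {
        type: 'CallExpression',
        callee: { type: 'Identifier', name: node.name },
        arguments: [],
      };
      node._context = call.arguments;            // children push themselves here
      // Only top-level calls (parent is not a CallExpression) get wrapped.
      const result = parent.type === 'CallExpression'
        ? call
        : { type: 'ExpressionStatement', expression: call };
      parent._context.push(result);
      node.params.forEach(child => visit(child, node));
    }
  }

  ast.body.forEach(node => visit(node, ast));
  return newAst;
}
```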

5. CODE GENERATOR (codeGenerator(node))

Purpose: Converts AST back into executable code strings

Output Generation:

  • Program → maps and joins body with newlines
  • ExpressionStatement → expression + ;
  • CallExpression → callee(arg1, arg2, ...)
  • Identifier → the function name string
  • NumberLiteral → the number value
  • StringLiteral → "string value"

Example:

// Input node:
{
  type: 'CallExpression',
  callee: { type: 'Identifier', name: 'add' },
  arguments: [
    { type: 'NumberLiteral', value: '2' },
    { type: 'NumberLiteral', value: '3' }
  ]
}

// Output string:
// "add(2, 3)"

6. COMPILER (compiler(input))

Purpose: Main orchestration function linking all phases

Pipeline:

function compiler(input) {
  let tokens = tokenizer(input);        // Phase 1a: Lexical Analysis
  let ast    = parser(tokens);          // Phase 1b: Syntactic Analysis
  let newAst = transformer(ast);        // Phase 2: Transformation
  let output = codeGenerator(newAst);   // Phase 3: Code Generation
  return output;
}

Usage:

const result = compiler('(add 2 (subtract 4 2))');
console.log(result); // "add(2, subtract(4, 2));"

📋 Test Suite

File: test.js

Test Coverage:

  1. ✅ Tokenizer test - Verifies token generation
  2. ✅ Parser test - Verifies AST structure
  3. ✅ Transformer test - Verifies AST transformation
  4. ✅ Code Generator test - Verifies output generation
  5. ✅ Compiler integration test - Verifies end-to-end compilation

Running Tests:

node test.js

Expected Output:

All Passed!

💡 Key Concepts Demonstrated

1. Tokenization

  • Character-by-character scanning
  • Multi-character token handling
  • Whitespace handling
  • Error detection

2. Recursive Descent Parsing

  • Building AST from tokens
  • Handling nested structures
  • Recursive function calls

3. Visitor Pattern

  • Tree traversal abstraction
  • Enter/exit callbacks
  • Separation of concerns

4. AST Transformation

  • Pattern matching on node types
  • Creating new AST structures
  • Context propagation

5. Code Generation

  • Recursive node processing
  • String concatenation
  • Output formatting

🎓 Learning Path

For beginners: Read through the code in this order:

  1. Tokenizer - Start here to understand lexical analysis
  2. Parser - Learn recursive descent parsing
  3. Traverser - Understand tree traversal patterns
  4. Transformer - See practical AST manipulation
  5. Code Generator - Learn output generation
  6. Compiler - See how it all ties together

Recommended approach:

  • Read the extensive inline comments in TinyCompiler.js
  • Trace through the example in test.js
  • Try modifying the input to see how it transforms
  • Add new token types or node types

🔧 Extensibility

The compiler can be extended to support:

  • More operators: Add to tokenizer (e.g., +, -, *, /)
  • More token types: Update tokenizer regex patterns
  • More node types: Add cases to parser, traverser, and generator
  • Different target languages: Modify transformer and generator
  • Type checking: Add validation in transformer
  • Optimization: Add optimization passes before code generation
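As one concrete example of the last point, a constant-folding pass could run between the transformer and the code generator. The pass below is purely illustrative: its name, the supported operations, and the two-argument assumption are not part of the original code.

```javascript
// Hypothetical optimization pass: collapses calls whose arguments are all
// number literals into a single NumberLiteral, bottom-up. Assumes the C-like
// AST shape produced by the transformer and exactly two arguments per call.
const FOLDABLE = { add: (a, b) => a + b, subtract: (a, b) => a - b };

function foldConstants(node) {
  if (node.type === 'Program') {
    node.body = node.body.map(foldConstants);
  } else if (node.type === 'ExpressionStatement') {
    node.expression = foldConstants(node.expression);
  } else if (node.type === 'CallExpression') {
    node.arguments = node.arguments.map(foldConstants); // fold children first
    const op = FOLDABLE[node.callee.name];
    if (op && node.arguments.every(a => a.type === 'NumberLiteral')) {
      const [a, b] = node.arguments.map(n => Number(n.value));
      return { type: 'NumberLiteral', value: String(op(a, b)) };
    }
  }
  return node;
}
```

With this pass in place, the AST for add(2, subtract(4, 2)) would collapse to the literal 4 before code generation.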

📚 Real-World Applications

The concepts here are used in:

  • Language compilers (JavaScript, Python, Rust, etc.)
  • Build tools (Webpack, Babel, TypeScript)
  • Template engines (Handlebars, EJS)
  • Query languages (GraphQL, SQL parsers)
  • Domain-specific languages (Configuration files, markup)

📄 License

MIT License

Copyright (c) 2025 Ovi Shekh

This project is licensed under the MIT License - see the LICENSE file for details.

You are free to use, modify, and distribute this project with or without attribution (though attribution is appreciated).


๐Ÿ™ Credits

Created by: Ovi Shekh (ovishekh.com)

🚀 Quick Start

Prerequisites

  • Node.js v14 or higher
  • npm or yarn

Installation

# Clone the repository
git clone https://github.com/ovishkh/TinyCompiler.git
cd TinyCompiler

# Install dependencies
npm install

# Run tests
npm test
# or
make test

Basic Usage

Command Line

const { compiler } = require('./TinyCompiler');

// Simple example
const input = '(add 2 2)';
const output = compiler(input);
console.log(output); // "add(2, 2);"

// Nested example
const complex = '(add 2 (subtract 4 2))';
console.log(compiler(complex)); // "add(2, subtract(4, 2));"

Web Playground

Visit TinyCompiler.ovishekh.com for an interactive playground.

REST API

# Compile via HTTP API
curl "https://TinyCompiler.ovishekh.com/api/compiler?code=(add%201%202)"

Makefile Commands

make install    # Install dependencies
make test       # Run tests
make run        # Run with example
make dev        # Development mode
make clean      # Clean dependencies
make help       # Show all commands

Docker (Optional)

# Build Docker image
docker build -t tinycompiler .

# Run container
docker run -p 3000:3000 tinycompiler

๐ŸŒ Deployment

Vercel Deployment

The project is configured for easy deployment to Vercel:

# Install Vercel CLI
npm i -g vercel

# Deploy
vercel

# Deploy to production
vercel --prod

The vercel.json file is pre-configured with:

  • API routes at /api/compiler
  • Static file serving
  • CORS headers enabled

Environment Setup

The project includes:

  • vercel.json - Vercel configuration
  • api/compiler.js - Serverless API endpoint
  • index.html - Web playground UI

🔧 Development

Project Structure

TinyCompiler/
├── TinyCompiler.js       # Main compiler (all phases)
├── test.js               # Test suite
├── index.html            # Web playground UI
├── api/
│   └── compiler.js       # REST API endpoint
├── vercel.json           # Vercel config
├── Makefile              # Build automation
├── package.json          # Dependencies
├── LICENSE               # MIT License
├── README.md             # This file
└── CONTRIBUTING.md       # Contribution guide

Running Tests

npm test          # Run all tests
make test         # Using Makefile

Code Style

  • 2-space indentation
  • Use 'use strict';
  • Comprehensive inline comments
  • Descriptive variable names

Contributing

See CONTRIBUTING.md for detailed contribution guidelines.



๐Ÿ› Common Issues and Solutions

Q: Why doesn't it support operators like +?
A: This compiler demonstrates core concepts with simple syntax. Operators would require additional tokenizer rules.

Q: Can I use this for real compilation?
A: This is an educational tool. Production compilers have significantly more complexity for optimization, error recovery, and language features.

Q: How do I add new features?
A: Follow this pattern: update tokenizer → update parser → update transformer → update generator → add tests.


📖 Further Reading

To deepen your compiler knowledge, explore:

  • Lexical Analysis: Finite automata, regular expressions
  • Parsing: Context-free grammars, parse trees
  • Semantic Analysis: Symbol tables, type checking
  • Optimization: Dead code elimination, constant folding
  • Code Generation: Register allocation, instruction selection


🧠 The "Tiny" Internals (How it works)

This section contains the deep-dive explanation that was originally embedded in the source code.

1. The Tokenizer (Lexical Analysis)

We start by accepting an input string of code, and we're going to break it down into an array of tokens.

// (add 2 (subtract 4 2))
//    ^^^
//    Name token

We loop through the input string character by character:

  1. Parentheses: We check for ( and ).
  2. Whitespace: We check for spaces, but we don't store them as tokens. We just skip them.
  3. Numbers: We capture sequences of numbers [0-9] as a single token.
  4. Strings: We capture anything between " quotes.
  5. Names: We capture sequences of letters [a-zA-Z] as identifiers (like add or subtract).

2. The Parser (Syntactic Analysis)

Parsing converts the flat list of tokens into a tree structure (AST).

The Challenge: How do we handle nested calls like (add 2 (subtract 4 2))?
The Solution: Recursion.

One main function walk() does the heavy lifting:

  • It looks at the current token.
  • If it's a number, it returns a NumberLiteral node.
  • If it's a string, it returns a StringLiteral node.
  • If it's a (, it knows it's a CallExpression.
    • It grabs the name (e.g., add).
    • It then calls walk() recursively to find the arguments.
    • This recursion continues until it hits a ), at which point it finishes the node and returns it.

3. The Transformer (AST Transformation)

This is where the magic happens. We want to turn LISP-like code into C-like code.

  • LISP: (add 2 2) -> Function name inside.
  • C: add(2, 2) -> Function name outside.

We use a Visitor Pattern to walk through the LISP AST. When we visit a CallExpression node, we build a new node for our C-AST:

  • We create a CallExpression with a callee property (the identifier add).
  • We put the parameters into an arguments array.
  • Context Hack: We pass a "context" array down the tree so that child nodes can push themselves into their parent's arguments list.

4. The Code Generator

The final step is effectively a big switch statement that prints strings.

  • Program: Iterate through each expression and add a newline.
  • ExpressionStatement: Print the expression and add a ;.
  • CallExpression: Print the callee, a literal (, the arguments joined by commas, and a literal ).
  • Identifier: Print the name.
  • NumberLiteral: Print the value.



Made with ❤️ by Ovi Shekh
