
This GitHub repository contains Python implementations of the top 25 machine learning algorithms from scratch using only the Python programming language and the NumPy library. The goal of this repository is to help beginners and aspiring data scientists understand the inner workings of these algorithms by providing transparent and readable code.

inboxpraveen/ML-Algorithms-from-scratch

Machine Learning Algorithms from Scratch

Python NumPy License: MIT

Learn machine learning by understanding the math and code behind the algorithms!

📚 About This Repository

This repository contains clear, educational implementations of essential machine learning algorithms built from scratch using only Python and NumPy. Each algorithm includes comprehensive documentation, mathematical explanations, and practical examples.

Perfect for:

  • 🎓 Students learning machine learning fundamentals
  • 👨‍💻 Developers wanting to understand algorithms deeply
  • 📊 Data scientists preparing for technical interviews
  • 🔬 Anyone curious about how ML algorithms actually work

🎯 Why Learn Algorithms from Scratch?

Machine learning libraries like scikit-learn are powerful, but they hide the inner workings. By implementing algorithms from scratch, you will:

  • Understand the Math: See how mathematical formulas translate into code
  • Debug with Confidence: Know what's happening under the hood when things go wrong
  • Optimize Better: Make informed decisions about hyperparameters and model selection
  • Interview Ready: Demonstrate deep understanding in technical interviews
  • Build Intuition: Develop a mental model of how algorithms behave

Learning vs. Production

⚠️ Important Note: These implementations prioritize clarity and education over performance. For production use, always use optimized libraries like scikit-learn, TensorFlow, or PyTorch.

✨ Key Features

  • 📖 Comprehensive Documentation: Each algorithm includes detailed markdown files explaining concepts, math, and implementation
  • 💡 Step-by-Step Examples: Real-world use cases with complete code examples
  • 🧮 Mathematical Foundations: Equations explained in plain language
  • 📊 Visual Learning: Code examples that can be easily visualized
  • 🔧 Production-Like Code: Clean, well-documented, reusable classes
  • 🎓 Educational Focus: Comments and explanations at every important step

📦 Repository Structure

ML-Algorithms-from-scratch/
│
├── 1. Linear Regression/
│   ├── _1_linear_regression.md      # Comprehensive guide
│   └── _1_linear_regressions.py     # Implementation
│
├── 2. Multiple Regression/
│   ├── _2_multiple_regression.md    # Comprehensive guide
│   └── _2_multiple_regression.py    # Implementation
│
├── 3. Ridge Regression/
│   ├── _3_ridge_regression.md       # Comprehensive guide
│   └── _3_ridge_regression.py       # Implementation
│
├── 4. Logistic Regression/
│   ├── _4_logistic_regression.md    # Comprehensive guide
│   └── _4_logistic_regression.py    # Implementation
│
├── 5. KNN/
│   ├── _5_knn.md                    # Comprehensive guide
│   └── _5_knn.py                    # Implementation
│
├── 6. Decision Trees/
│   ├── _6_decision_trees.md         # Comprehensive guide
│   └── _6_decision_trees.py         # Implementation
│
├── 7. Random Forests/
│   ├── _7_random_forests.md         # Comprehensive guide
│   └── _7_random_forests.py         # Implementation
│
├── 8. SVM/
│   ├── _8_svm.md                    # Comprehensive guide
│   └── _8_svm.py                    # Implementation
│
├── 9. Naive Bayes/
│   ├── _9_naive_bayes.md            # Comprehensive guide
│   └── _9_naive_bayes.py            # Implementation
│
├── 10. k-Means Clustering/
│   ├── _10_kmeans_clustering.md     # Comprehensive guide
│   └── _10_kmeans_clustering.py     # Implementation
│
└── README.md                         # You are here!

Each algorithm folder contains:

  • .py file: Clean, documented implementation with usage examples
  • .md file: Detailed explanation with theory, math, and walkthroughs

Algorithms Included

| #  | Algorithm                                           | Status         | Documentation |
|----|-----------------------------------------------------|----------------|---------------|
| 1  | Linear Regression                                   | ✅ Implemented | View Details  |
| 2  | Multiple Regression                                 | ✅ Implemented | View Details  |
| 3  | Ridge Regression                                    | ✅ Implemented | View Details  |
| 4  | Logistic Regression                                 | ✅ Implemented | View Details  |
| 5  | K-Nearest Neighbors (KNN)                           | ✅ Implemented | View Details  |
| 6  | Decision Trees                                      | ✅ Implemented | View Details  |
| 7  | Random Forests                                      | ✅ Implemented | View Details  |
| 8  | Support Vector Machines (SVM)                       | ✅ Implemented | View Details  |
| 9  | Naive Bayes                                         | ✅ Implemented | View Details  |
| 10 | k-Means Clustering                                  | ✅ Implemented | View Details  |
| 11 | Principal Component Analysis (PCA)                  | 🔜 Coming Soon | -             |
| 12 | Hierarchical Clustering                             | 🔜 Coming Soon | -             |
| 13 | Apriori Algorithm (Association Rule Mining)         | 🔜 Coming Soon | -             |
| 14 | t-Distributed Stochastic Neighbor Embedding (t-SNE) | 🔜 Coming Soon | -             |
| 15 | Decision Tree ID3 (Feature Selection)               | 🔜 Coming Soon | -             |
| 16 | AdaBoost                                            | 🔜 Coming Soon | -             |
| 17 | Gradient Boosting                                   | 🔜 Coming Soon | -             |
| 18 | Extreme Gradient Boosting (XGBoost)                 | 🔜 Coming Soon | -             |

🚀 Getting Started

Prerequisites

Before you begin, ensure you have:

  • Python 3.7 or higher installed on your system
  • NumPy library (for numerical computations)
  • Optional: matplotlib (for visualizations), scikit-learn (for comparison and datasets)

Installation

  1. Clone the repository:

     git clone https://github.com/inboxpraveen/ML-Algorithms-from-scratch.git
     cd ML-Algorithms-from-scratch

  2. Install the required dependencies:

     pip install numpy

     # Optional: Install these for running examples and visualizations
     pip install matplotlib scikit-learn

Quick Start

All algorithms in this repository follow a consistent, simple interface:

  1. Import the algorithm class from its folder
  2. Create an instance of the class
  3. Train the model using .fit(X_train, y_train)
  4. Predict on new data using .predict(X_test)
  5. Evaluate performance using .score(X_test, y_test) (where available)

Each algorithm folder contains complete code examples in both the .py and .md files showing exactly how to use that specific algorithm with real data.
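
As a rough illustration of the five steps above, here is a minimal sketch of a class with that interface. The class name `TinyLinearRegression`, the synthetic data, and the choice of R² as the score are assumptions made for this example, not the repository's actual code; check each algorithm folder for the real class names and import paths.

```python
import numpy as np


class TinyLinearRegression:
    """Hypothetical example class; not taken from the repository."""

    def fit(self, X, y):
        # Prepend a column of ones so the first weight acts as the intercept.
        Xb = np.column_stack([np.ones(len(X)), X])
        # Least-squares solution of Xb @ w = y.
        self.weights_, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return self

    def predict(self, X):
        Xb = np.column_stack([np.ones(len(X)), X])
        return Xb @ self.weights_

    def score(self, X, y):
        # Coefficient of determination (R^2).
        residual = np.sum((y - self.predict(X)) ** 2)
        total = np.sum((y - np.mean(y)) ** 2)
        return 1.0 - residual / total


# Synthetic data: y = 3x + 2 plus a little noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]

model = TinyLinearRegression()          # 2. create an instance
model.fit(X_train, y_train)             # 3. train
predictions = model.predict(X_test)     # 4. predict
r2 = model.score(X_test, y_test)        # 5. evaluate
print(f"R^2 on held-out data: {r2:.3f}")
```

Returning `self` from `fit` is a common convention that allows chaining such as `TinyLinearRegression().fit(X, y).predict(X)`; whether the repository's classes do the same is not stated here.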

How to Use This Repository

  1. Browse the Algorithms Table above to find an algorithm
  2. Read the Documentation (click "View Details") to understand the theory
  3. Study the Code in the .py file - it's heavily commented
  4. Run the Examples provided in the usage section of each file
  5. Experiment - modify parameters, try your own data!

Learning Path

Recommended order for beginners:

  1. Start with Linear Regression - Simplest algorithm, foundation for others
  2. Move to Multiple Regression - Understand multiple features
  3. Try Classification - Logistic Regression
  4. Explore Non-linear - Decision Trees, KNN

Each algorithm builds on concepts from previous ones!

🎓 What You'll Learn

For each algorithm, you'll understand:

  • The Problem It Solves: When and why to use this algorithm
  • Mathematical Foundation: The equations and theory behind it
  • Step-by-Step Implementation: How math translates to code
  • Practical Applications: Real-world use cases
  • Model Evaluation: How to measure performance
  • Advantages & Limitations: When to use (or not use) the algorithm
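
To make the "math translates to code" point concrete, consider ridge regression, whose closed-form solution is w = (XᵀX + λI)⁻¹Xᵀy. The sketch below is one way to write that formula in NumPy. The function name `ridge_closed_form` and the toy data are illustrative assumptions, the intercept is omitted for brevity, and the repository's own implementation may be organised differently.

```python
import numpy as np


def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for w (no intercept term)."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    b = X.T @ y
    # Solving the linear system is more stable than forming the inverse.
    return np.linalg.solve(A, b)


# Toy data generated from known weights (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(0, 0.05, size=200)

w_small = ridge_closed_form(X, y, lam=1e-3)   # close to ordinary least squares
w_large = ridge_closed_form(X, y, lam=1e3)    # heavily shrunk toward zero
print("lam=1e-3:", np.round(w_small, 3))
print("lam=1e3: ", np.round(w_large, 3))
```

Each symbol in the formula maps to one NumPy expression: `X.T @ X` is XᵀX, `lam * np.eye(n_features)` is λI, and `np.linalg.solve` takes the place of the explicit inverse.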

📖 Documentation Quality

Each algorithm includes:

  • Comprehensive Guide (.md file):

    • Intuitive explanations with real-world analogies
    • Mathematical formulas broken down step-by-step
    • Implementation details explained
    • Complete examples with output
    • Visualization suggestions
    • Links to further resources
  • Clean Implementation (.py file):

    • Class-based design for reusability
    • Detailed docstrings for all methods
    • Inline comments explaining key steps
    • Multiple usage examples
    • Type hints and parameter documentation

🤝 Contributing

Contributions are welcome and appreciated! Here's how you can help:

Ways to Contribute

  • 🐛 Report Bugs: Open an issue if you find a bug
  • 💡 Suggest Algorithms: Request algorithms you'd like to see
  • 📝 Improve Documentation: Fix typos, add clarity, include examples
  • 🔧 Enhance Code: Optimize implementations (while keeping clarity)
  • ✅ Add Tests: Help ensure correctness
  • 🎨 Create Visualizations: Add plots and diagrams

Contribution Guidelines

When contributing, please:

  1. Follow the existing code style (clean, well-documented, educational)
  2. Include comprehensive docstrings and comments
  3. Add usage examples in the code
  4. Update or create corresponding .md documentation
  5. Ensure code works with NumPy only (no additional ML libraries for core implementation)
  6. Test your implementation with example datasets

Note: The goal is education, not performance. Prioritize clarity over optimization.
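
One lightweight way to follow guideline 6 is to run an implementation on synthetic data where the right answer is known in advance. The sketch below does this for a bare-bones k-nearest-neighbours classifier. `knn_predict` and `make_blobs_2d` are hypothetical helpers written for this example, not functions from the repository.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    # Pairwise squared distances via broadcasting: shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training points for each query row.
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]
    # Labels are non-negative integers, so bincount gives per-class vote counts.
    return np.array([np.bincount(row).argmax() for row in votes])


def make_blobs_2d(n_per_class, rng):
    """Two well-separated Gaussian clusters with labels 0 and 1."""
    a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(n_per_class, 2))
    b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(n_per_class, 2))
    X = np.vstack([a, b])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


rng = np.random.default_rng(7)
X_train, y_train = make_blobs_2d(50, rng)
X_test, y_test = make_blobs_2d(20, rng)

accuracy = np.mean(knn_predict(X_train, y_train, X_test, k=3) == y_test)
# The clusters barely overlap, so a correct implementation should be near-perfect.
assert accuracy >= 0.95, f"unexpectedly low accuracy: {accuracy:.2f}"
print(f"accuracy: {accuracy:.2f}")
```

A check like this will not catch every bug, but it fails loudly on common mistakes such as sorting distances in the wrong direction or voting over the wrong axis.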

❓ Frequently Asked Questions

Q: Should I use this code in production?
A: No, these implementations prioritize learning over performance. Use scikit-learn, TensorFlow, or PyTorch for production.

Q: Do I need to know advanced math?
A: Basic knowledge helps, but each algorithm includes math explanations in plain language.

Q: Can I compare these with scikit-learn?
A: Absolutely! Many examples show how to use scikit-learn for comparison and validation.

Q: Why NumPy only?
A: To focus on fundamentals. Understanding NumPy operations helps you understand what libraries do internally.

Q: How long does it take to learn each algorithm?
A: With the documentation provided, expect 1-2 hours per algorithm for thorough understanding.
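
For the scikit-learn comparison mentioned in the answers above, it can be enough to fit both models on the same data and check that the learned parameters agree. This is a sketch under assumptions: `fit_least_squares` is a hypothetical stand-in for a from-scratch implementation, the data is synthetic, and the scikit-learn import is guarded because that package is optional for this repository.

```python
import numpy as np

try:
    from sklearn.linear_model import LinearRegression
except ImportError:  # scikit-learn is an optional dependency
    LinearRegression = None


def fit_least_squares(X, y):
    """Return (intercept, coefficients) from an ordinary least-squares fit."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w[0], w[1:]


# Synthetic data with known parameters (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 2))
y = 4.0 + X @ np.array([2.0, -1.0]) + rng.normal(0, 0.1, size=150)

intercept, coef = fit_least_squares(X, y)
print("from scratch:", round(float(intercept), 3), np.round(coef, 3))

if LinearRegression is not None:
    reference = LinearRegression().fit(X, y)
    # Both solve the same least-squares problem, so they should agree closely.
    print("parameters match:",
          np.allclose(coef, reference.coef_)
          and np.isclose(intercept, reference.intercept_))
else:
    print("scikit-learn not installed; skipping the comparison")
```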

📚 Additional Resources

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

This repository is built for the community of learners who believe in understanding fundamentals. Special thanks to all contributors and the open-source community.

⭐ Support This Project

If you find this repository helpful:

  • ⭐ Star this repository to help others discover it
  • 🔄 Share it with fellow learners
  • 🤝 Contribute an algorithm or improvement
  • 📢 Provide feedback through issues

💬 Final Thoughts

"Learning machine learning from scratch is like learning to cook from scratch - you could just buy premade meals (use libraries), but understanding ingredients and techniques (algorithms and math) makes you a better chef (data scientist)!"

Understanding the core concepts of machine learning algorithms is essential for anyone looking to excel in the field of data science and artificial intelligence. This repository aims to provide a comprehensive and accessible resource for learning and experimenting with various machine learning algorithms from scratch.

Happy Learning and Coding! 🚀📊🤖


Maintained by @inboxpraveen | Last Updated: December 2025
