Machine Learning Algorithms from Scratch

Learn machine learning by understanding the math and code behind the algorithms!

📚 About This Repository

This repository contains clear, educational implementations of essential machine learning algorithms built from scratch using only Python and NumPy. Each algorithm includes comprehensive documentation, mathematical explanations, and practical examples.

Perfect for:

🎓 Students learning machine learning fundamentals
👨‍💻 Developers wanting to understand algorithms deeply
📊 Data scientists preparing for technical interviews
🔬 Anyone curious about how ML algorithms actually work

🎯 Why Learn Algorithms from Scratch?

Machine learning libraries like scikit-learn are powerful, but they hide the inner workings. By implementing algorithms from scratch, you will:

Understand the Math: See how mathematical formulas translate into code
Debug with Confidence: Know what's happening under the hood when things go wrong
Optimize Better: Make informed decisions about hyperparameters and model selection
Be Interview-Ready: Demonstrate deep understanding in technical interviews
Build Intuition: Develop a mental model of how algorithms behave

Learning vs. Production

⚠️ Important Note: These implementations prioritize clarity and education over performance. For production use, always use optimized libraries like scikit-learn, TensorFlow, or PyTorch.

✨ Key Features

📖 Comprehensive Documentation: Each algorithm includes detailed markdown files explaining concepts, math, and implementation
💡 Step-by-Step Examples: Real-world use cases with complete code examples
🧮 Mathematical Foundations: Equations explained in plain language
📊 Visual Learning: Code examples that can be easily visualized
🔧 Production-Like Code: Clean, well-documented, reusable classes
🎓 Educational Focus: Comments and explanations at every important step

📦 Repository Structure

ML-Algorithms-from-scratch/
│
├── 1. Linear Regression/
│   ├── _1_linear_regression.md      # Comprehensive guide
│   └── _1_linear_regressions.py     # Implementation
│
├── 2. Multiple Regression/
│   ├── _2_multiple_regression.md    # Comprehensive guide
│   └── _2_multiple_regression.py    # Implementation
│
├── 3. Ridge Regression/
│   ├── _3_ridge_regression.md       # Comprehensive guide
│   └── _3_ridge_regression.py       # Implementation
│
├── 4. Logistic Regression/
│   ├── _4_logistic_regression.md    # Comprehensive guide
│   └── _4_logistic_regression.py    # Implementation
│
├── 5. KNN/
│   ├── _5_knn.md                    # Comprehensive guide
│   └── _5_knn.py                    # Implementation
│
├── 6. Decision Trees/
│   ├── _6_decision_trees.md         # Comprehensive guide
│   └── _6_decision_trees.py         # Implementation
│
├── 7. Random Forests/
│   ├── _7_random_forests.md         # Comprehensive guide
│   └── _7_random_forests.py         # Implementation
│
├── 8. SVM/
│   ├── _8_svm.md                    # Comprehensive guide
│   └── _8_svm.py                    # Implementation
│
├── 9. Naive Bayes/
│   ├── _9_naive_bayes.md            # Comprehensive guide
│   └── _9_naive_bayes.py            # Implementation
│
├── 10. k-Means Clustering/
│   ├── _10_kmeans_clustering.md     # Comprehensive guide
│   └── _10_kmeans_clustering.py     # Implementation
│
└── README.md                         # You are here!

Each algorithm folder contains:

.py file: Clean, documented implementation with usage examples
.md file: Detailed explanation with theory, math, and walkthroughs

Algorithms Included

#	Algorithm	Status	Documentation
1	Linear Regression	✅ Implemented	View Details
2	Multiple Regression	✅ Implemented	View Details
3	Ridge Regression	✅ Implemented	View Details
4	Logistic Regression	✅ Implemented	View Details
5	K-Nearest Neighbors (KNN)	✅ Implemented	View Details
6	Decision Trees	✅ Implemented	View Details
7	Random Forests	✅ Implemented	View Details
8	Support Vector Machines (SVM)	✅ Implemented	View Details
9	Naive Bayes	✅ Implemented	View Details
10	k-Means Clustering	✅ Implemented	View Details
11	Principal Component Analysis (PCA)	🔜 Coming Soon	-
12	Hierarchical Clustering	🔜 Coming Soon	-
13	Apriori Algorithm (Association Rule Mining)	🔜 Coming Soon	-
14	t-Distributed Stochastic Neighbor Embedding (t-SNE)	🔜 Coming Soon	-
15	Decision Tree ID3 (Feature Selection)	🔜 Coming Soon	-
16	AdaBoost	🔜 Coming Soon	-
17	Gradient Boosting	🔜 Coming Soon	-
18	Extreme Gradient Boosting (XGBoost)	🔜 Coming Soon	-

🚀 Getting Started

Prerequisites

Before you begin, ensure you have:

Python 3.7 or higher installed on your system
NumPy library (for numerical computations)
Optional: matplotlib (for visualizations), scikit-learn (for comparison and datasets)

Installation

Clone the repository:

git clone https://github.com/inboxpraveen/ML-Algorithms-from-scratch.git
cd ML-Algorithms-from-scratch

Install required dependencies:

pip install numpy

# Optional: Install these for running examples and visualizations
pip install matplotlib scikit-learn
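You can confirm the installation with a short script that reports the interpreter and library versions. This is a sketch and is not part of the repository. The helper name `check_environment` is hypothetical. The script only checks whether the optional packages can be found; it does not import them.

```python
import importlib.util
import sys

import numpy as np


def check_environment():
    """Hypothetical helper: report Python/NumPy versions and optional packages."""
    # The prerequisites above require Python 3.7 or newer.
    if sys.version_info < (3, 7):
        raise RuntimeError("Python 3.7 or higher is required")

    report = {
        "python": sys.version.split()[0],
        "numpy": np.__version__,
    }
    # find_spec returns None when a package is not installed, without importing it.
    for name in ("matplotlib", "sklearn"):
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```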

Quick Start

All algorithms in this repository follow a consistent, simple interface:

Import the algorithm class from its folder
Create an instance of the class
Train the model using .fit(X_train, y_train)
Predict on new data using .predict(X_test)
Evaluate performance using .score(X_test, y_test) (where available)

Each algorithm folder contains complete code examples in both the .py and .md files showing exactly how to use that specific algorithm with real data.
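Here is the workflow above sketched end to end on a made-up toy dataset. The sketch defines a small stand-in class, `TinyKNN`, that follows the same fit/predict/score pattern. `TinyKNN` is hypothetical and is not the repository's KNN implementation, whose class name and parameters may differ. In real use, step 1 would import the class from its folder instead.

```python
import numpy as np


class TinyKNN:
    """Hypothetical k-nearest-neighbors classifier following fit/predict/score."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X_train, y_train):
        # KNN is a "lazy" learner: training just stores the data.
        self.X_train = np.asarray(X_train, dtype=float)
        self.y_train = np.asarray(y_train)
        return self

    def predict(self, X_test):
        X_test = np.asarray(X_test, dtype=float)
        predictions = []
        for x in X_test:
            # Euclidean distance from x to every training point.
            distances = np.sqrt(np.sum((self.X_train - x) ** 2, axis=1))
            nearest = np.argsort(distances)[: self.k]
            # Majority vote among the labels of the k nearest points.
            labels, counts = np.unique(self.y_train[nearest], return_counts=True)
            predictions.append(labels[np.argmax(counts)])
        return np.array(predictions)

    def score(self, X_test, y_test):
        # Classification accuracy: fraction of correct predictions.
        return float(np.mean(self.predict(X_test) == np.asarray(y_test)))


# Toy data: two well-separated clusters (made up for illustration).
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[1.2, 1.4], [8.7, 8.1]])
y_test = np.array([0, 1])

model = TinyKNN(k=3)                # 2. create an instance
model.fit(X_train, y_train)         # 3. train
print(model.predict(X_test))        # 4. predict -> [0 1]
print(model.score(X_test, y_test))  # 5. evaluate -> 1.0
```

`fit` returns `self`, so calls can also be chained, as in `TinyKNN(k=3).fit(X_train, y_train).predict(X_test)`.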

How to Use This Repository

Browse the Algorithms Included table above to find an algorithm
Read the Documentation (click "View Details") to understand the theory
Study the Code in the .py file - it's heavily commented
Run the Examples provided in the usage section of each file
Experiment - modify parameters, try your own data!

Learning Path

Recommended order for beginners:

Start with Linear Regression - Simplest algorithm, foundation for others
Move to Multiple Regression - Understand multiple features
Try Classification - Logistic Regression
Explore Non-linear Models - Decision Trees, KNN

Each algorithm builds on concepts from previous ones!
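The first two steps of the path connect directly. Simple linear regression has a closed-form slope and intercept. Multiple regression generalizes it by solving a least-squares problem over a matrix of features. The sketch below uses standard ordinary-least-squares formulas and made-up data. The repository's own files may fit these models differently, for example with gradient descent.

```python
import numpy as np


def simple_linear_regression(x, y):
    """Closed-form OLS for one feature: y ≈ intercept + slope * x."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope


def multiple_linear_regression(X, y):
    """OLS for any number of features via least squares.

    A column of ones is prepended so the first coefficient is the intercept.
    np.linalg.lstsq is used instead of inverting X^T X for numerical stability.
    """
    X_design = np.column_stack([np.ones(len(X)), X])
    coefficients, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return coefficients


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

print(simple_linear_regression(x, y))
# With a single feature, multiple regression reduces to simple regression.
print(multiple_linear_regression(x.reshape(-1, 1), y))
```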

🎓 What You'll Learn

For each algorithm, you'll understand:

The Problem It Solves: When and why to use this algorithm
Mathematical Foundation: The equations and theory behind it
Step-by-Step Implementation: How math translates to code
Practical Applications: Real-world use cases
Model Evaluation: How to measure performance
Advantages & Limitations: When to use (or not use) the algorithm
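For the "Model Evaluation" point, here are a few common metrics written directly from their formulas. The function names are illustrative and are not the repository's API.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Average of squared residuals: (1/n) * sum((y - y_hat)^2)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


print(mean_squared_error([3.0, 5.0, 7.0], [2.5, 5.0, 8.0]))  # about 0.4167
print(r2_score([3.0, 5.0, 7.0], [2.5, 5.0, 8.0]))            # 0.84375
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))                  # 0.75
```

Use MSE and R² for regression models such as Linear, Multiple, and Ridge Regression. Use accuracy for classifiers such as Logistic Regression, KNN, and Naive Bayes.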

📖 Documentation Quality

Each algorithm includes:

Comprehensive Guide (.md file):
- Intuitive explanations with real-world analogies
- Mathematical formulas broken down step-by-step
- Implementation details explained
- Complete examples with output
- Visualization suggestions
- Links to further resources
Clean Implementation (.py file):
- Class-based design for reusability
- Detailed docstrings for all methods
- Inline comments explaining key steps
- Multiple usage examples
- Type hints and parameter documentation

🤝 Contributing

Contributions are welcome and appreciated! Here's how you can help:

Ways to Contribute

🐛 Report Bugs: Open an issue if you find a bug
💡 Suggest Algorithms: Request algorithms you'd like to see
📝 Improve Documentation: Fix typos, add clarity, include examples
🔧 Enhance Code: Optimize implementations (while keeping clarity)
✅ Add Tests: Help ensure correctness
🎨 Create Visualizations: Add plots and diagrams

Contribution Guidelines

When contributing, please:

Follow the existing code style (clean, well-documented, educational)
Include comprehensive docstrings and comments
Add usage examples in the code
Update or create corresponding .md documentation
Ensure code works with NumPy only (no additional ML libraries for core implementation)
Test your implementation with example datasets

Note: The goal is education, not performance. Prioritize clarity over optimization.
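A new contribution might look roughly like the template below. It is a NumPy-only class with docstrings, type hints, and the shared fit/predict/score interface, followed by a small check on a toy dataset. This is a hypothetical sketch that uses batch gradient descent for logistic regression. The class name, defaults, and test are illustrative, and existing files in the repository may be organized differently.

```python
import numpy as np


class LogisticRegressionGD:
    """Binary logistic regression trained with batch gradient descent.

    Parameters
    ----------
    learning_rate : float
        Step size for each gradient descent update.
    n_iterations : int
        Number of passes over the full training set.
    """

    def __init__(self, learning_rate: float = 0.1, n_iterations: int = 1000) -> None:
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = 0.0

    @staticmethod
    def _sigmoid(z: np.ndarray) -> np.ndarray:
        """Map real values to probabilities in (0, 1)."""
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X: np.ndarray, y: np.ndarray) -> "LogisticRegressionGD":
        """Learn weights and bias by minimizing the average log loss."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0.0
        for _ in range(self.n_iterations):
            probabilities = self._sigmoid(X @ self.weights + self.bias)
            error = probabilities - y
            # Gradients of the average log loss with respect to weights and bias.
            self.weights -= self.learning_rate * (X.T @ error) / n_samples
            self.bias -= self.learning_rate * np.mean(error)
        return self

    def predict_proba(self, X: np.ndarray) -> np.ndarray:
        """Return P(y = 1 | x) for each row of X."""
        return self._sigmoid(np.asarray(X, dtype=float) @ self.weights + self.bias)

    def predict(self, X: np.ndarray) -> np.ndarray:
        """Return class labels (0 or 1) using a 0.5 probability threshold."""
        return (self.predict_proba(X) >= 0.5).astype(int)

    def score(self, X: np.ndarray, y: np.ndarray) -> float:
        """Return classification accuracy."""
        return float(np.mean(self.predict(X) == np.asarray(y)))


def test_separable_data() -> None:
    """A toy, linearly separable dataset should be classified perfectly."""
    X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
    y = np.array([0, 0, 0, 1, 1, 1])
    model = LogisticRegressionGD(learning_rate=0.1, n_iterations=1000).fit(X, y)
    assert model.score(X, y) == 1.0


if __name__ == "__main__":
    test_separable_data()
    print("all checks passed")
```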

❓ Frequently Asked Questions

Q: Should I use this code in production?
A: No, these implementations prioritize learning over performance. Use scikit-learn, TensorFlow, or PyTorch for production.

Q: Do I need to know advanced math?
A: Basic knowledge helps, but each algorithm includes math explanations in plain language.

Q: Can I compare these with scikit-learn?
A: Absolutely! Many examples show how to use scikit-learn for comparison and validation.
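A comparison can be as simple as fitting the same data both ways and checking that the parameters agree. This sketch uses the closed-form ridge solution w = (X^T X + alpha * I)^(-1) X^T y. It compares that against scikit-learn's `Ridge` with `fit_intercept=False`, which minimizes the same objective, so the coefficients should match up to numerical tolerance. The data is random, and the helper name `ridge_closed_form` is hypothetical. If scikit-learn is not installed, the comparison step is skipped.

```python
import numpy as np


def ridge_closed_form(X, y, alpha=1.0):
    """Ridge regression without an intercept: solve (X^T X + alpha*I) w = X^T y."""
    n_features = X.shape[1]
    # np.linalg.solve is preferred over explicitly inverting the matrix.
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

w_scratch = ridge_closed_form(X, y, alpha=1.0)
print("from scratch:", w_scratch)

try:
    from sklearn.linear_model import Ridge
except ImportError:
    print("scikit-learn not installed; skipping comparison")
else:
    # fit_intercept=False so both versions minimize ||y - Xw||^2 + alpha*||w||^2.
    w_sklearn = Ridge(alpha=1.0, fit_intercept=False).fit(X, y).coef_
    print("scikit-learn:", w_sklearn)
    print("match:", np.allclose(w_scratch, w_sklearn, atol=1e-6))
```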

Q: Why NumPy only?
A: To focus on fundamentals. Understanding NumPy operations helps you understand what libraries do internally.
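For example, the gradient of the mean squared error for linear regression can be written as explicit Python loops or as a single NumPy expression. Both compute (2/n) X^T (Xw - y). The loop version is for illustration only, and the data is random.

```python
import numpy as np


def mse_gradient_loops(X, y, w):
    """Gradient of (1/n) * sum((x_i . w - y_i)^2) using explicit loops."""
    n_samples, n_features = X.shape
    gradient = [0.0] * n_features
    for i in range(n_samples):
        prediction = sum(X[i, j] * w[j] for j in range(n_features))
        residual = prediction - y[i]
        for j in range(n_features):
            gradient[j] += 2.0 * residual * X[i, j] / n_samples
    return np.array(gradient)


def mse_gradient_vectorized(X, y, w):
    """Same gradient as one matrix expression: (2/n) * X^T (Xw - y)."""
    return (2.0 / X.shape[0]) * (X.T @ (X @ w - y))


rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = rng.normal(size=100)
w = rng.normal(size=4)

print(np.allclose(mse_gradient_loops(X, y, w), mse_gradient_vectorized(X, y, w)))  # True
```

The vectorized line is close to a direct transcription of the formula. It also runs much faster than the loops because NumPy performs the inner arithmetic in compiled code.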

Q: How long does it take to learn each algorithm?
A: With the documentation provided, expect 1-2 hours per algorithm for thorough understanding.

📚 Additional Resources

NumPy Documentation
Scikit-learn Documentation
Machine Learning Coursera (Andrew Ng)
Deep Learning Book
StatQuest YouTube Channel - Great visual explanations

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

This repository is built for the community of learners who believe in understanding fundamentals. Special thanks to all contributors and the open-source community.

⭐ Support This Project

If you find this repository helpful:

⭐ Star this repository to help others discover it
🔄 Share it with fellow learners
🤝 Contribute an algorithm or improvement
📢 Provide feedback through issues

💬 Final Thoughts

"Learning machine learning from scratch is like learning to cook from scratch - you could just buy premade meals (use libraries), but understanding ingredients and techniques (algorithms and math) makes you a better chef (data scientist)!"

Understanding the core concepts of machine learning algorithms is essential for anyone looking to excel in the field of data science and artificial intelligence. This repository aims to provide a comprehensive and accessible resource for learning and experimenting with various machine learning algorithms from scratch.

Happy Learning and Coding! 🚀📊🤖

Maintained by @inboxpraveen | Last Updated: December 2025
