Learn machine learning by understanding the math and code behind the algorithms!
This repository contains clear, educational implementations of essential machine learning algorithms built from scratch using only Python and NumPy. Each algorithm includes comprehensive documentation, mathematical explanations, and practical examples.
Perfect for:
- Students learning machine learning fundamentals
- Developers wanting to understand algorithms deeply
- Data scientists preparing for technical interviews
- Anyone curious about how ML algorithms actually work
Machine learning libraries like scikit-learn are powerful, but they hide the inner workings. By implementing algorithms from scratch, you will:
- Understand the Math: See how mathematical formulas translate into code
- Debug with Confidence: Know what's happening under the hood when things go wrong
- Optimize Better: Make informed decisions about hyperparameters and model selection
- Interview Ready: Demonstrate deep understanding in technical interviews
- Build Intuition: Develop a mental model of how algorithms behave
- Comprehensive Documentation: Each algorithm includes detailed markdown files explaining concepts, math, and implementation
- Step-by-Step Examples: Real-world use cases with complete code examples
- Mathematical Foundations: Equations explained in plain language
- Visual Learning: Code examples that can be easily visualized
- Production-Like Code: Clean, well-documented, reusable classes
- Educational Focus: Comments and explanations at every important step
ML-Algorithms-from-scratch/
│
├── 1. Linear Regression/
│   ├── _1_linear_regression.md       # Comprehensive guide
│   └── _1_linear_regressions.py      # Implementation
│
├── 2. Multiple Regression/
│   ├── _2_multiple_regression.md     # Comprehensive guide
│   └── _2_multiple_regression.py     # Implementation
│
├── 3. Ridge Regression/
│   ├── _3_ridge_regression.md        # Comprehensive guide
│   └── _3_ridge_regression.py        # Implementation
│
├── 4. Logistic Regression/
│   ├── _4_logistic_regression.md     # Comprehensive guide
│   └── _4_logistic_regression.py     # Implementation
│
├── 5. KNN/
│   ├── _5_knn.md                     # Comprehensive guide
│   └── _5_knn.py                     # Implementation
│
├── 6. Decision Trees/
│   ├── _6_decision_trees.md          # Comprehensive guide
│   └── _6_decision_trees.py          # Implementation
│
├── 7. Random Forests/
│   ├── _7_random_forests.md          # Comprehensive guide
│   └── _7_random_forests.py          # Implementation
│
├── 8. SVM/
│   ├── _8_svm.md                     # Comprehensive guide
│   └── _8_svm.py                     # Implementation
│
├── 9. Naive Bayes/
│   ├── _9_naive_bayes.md             # Comprehensive guide
│   └── _9_naive_bayes.py             # Implementation
│
├── 10. k-Means Clustering/
│   ├── _10_kmeans_clustering.md      # Comprehensive guide
│   └── _10_kmeans_clustering.py      # Implementation
│
└── README.md                         # You are here!
Each algorithm folder contains:
- .py file: Clean, documented implementation with usage examples
- .md file: Detailed explanation with theory, math, and walkthroughs
| # | Algorithm | Status | Documentation |
|---|---|---|---|
| 1 | Linear Regression | Implemented | View Details |
| 2 | Multiple Regression | Implemented | View Details |
| 3 | Ridge Regression | Implemented | View Details |
| 4 | Logistic Regression | Implemented | View Details |
| 5 | K-Nearest Neighbors (KNN) | Implemented | View Details |
| 6 | Decision Trees | Implemented | View Details |
| 7 | Random Forests | Implemented | View Details |
| 8 | Support Vector Machines (SVM) | Implemented | View Details |
| 9 | Naive Bayes | Implemented | View Details |
| 10 | k-Means Clustering | Implemented | View Details |
| 11 | Principal Component Analysis (PCA) | Coming Soon | - |
| 12 | Hierarchical Clustering | Coming Soon | - |
| 13 | Apriori Algorithm (Association Rule Mining) | Coming Soon | - |
| 14 | t-Distributed Stochastic Neighbor Embedding (t-SNE) | Coming Soon | - |
| 15 | Decision Tree ID3 (Feature Selection) | Coming Soon | - |
| 16 | AdaBoost | Coming Soon | - |
| 17 | Gradient Boosting | Coming Soon | - |
| 18 | eXtreme Gradient Boosting (XGBoost) | Coming Soon | - |
Before you begin, ensure you have:
- Python 3.7 or higher installed on your system
- NumPy library (for numerical computations)
- Optional: matplotlib (for visualizations), scikit-learn (for comparison and datasets)
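To confirm the prerequisites above before going further, a small check along these lines may help. This is only a sketch: the function name `check_environment` is hypothetical and not part of the repository, and it only reports whether each package can be found, not which version is installed.

```python
import importlib.util
import sys


def check_environment():
    """Report whether the listed prerequisites appear to be available."""
    return {
        # The README asks for Python 3.7 or higher.
        "python_ok": sys.version_info >= (3, 7),
        # find_spec looks a module up without importing it.
        "numpy": importlib.util.find_spec("numpy") is not None,
        # Optional packages, used only for plots and comparisons.
        "matplotlib": importlib.util.find_spec("matplotlib") is not None,
        "sklearn": importlib.util.find_spec("sklearn") is not None,
    }


status = check_environment()
for name, available in status.items():
    print(f"{name}: {'ok' if available else 'missing'}")
```

A "missing" entry for matplotlib or sklearn is fine if you only want to run the core implementations.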
- Clone the repository:
    git clone https://github.com/inboxpraveen/ML-Algorithms-from-scratch.git
    cd ML-Algorithms-from-scratch
- Install required dependencies:
    pip install numpy
    # Optional: Install these for running examples and visualizations
    pip install matplotlib scikit-learn

All algorithms in this repository follow a consistent, simple interface:
- Import the algorithm class from its folder
- Create an instance of the class
- Train the model using .fit(X_train, y_train)
- Predict on new data using .predict(X_test)
- Evaluate performance using .score(X_test, y_test) (where available)
Each algorithm folder contains complete code examples in both the .py and .md files showing exactly how to use that specific algorithm with real data.
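As a rough illustration of the fit / predict / score pattern described above, here is a minimal one-feature least-squares model. The class name `SimpleLinearRegression` and its attributes are assumptions made for this sketch; the repository's actual classes may be named and structured differently, so treat each algorithm's own files as the reference.

```python
import numpy as np


class SimpleLinearRegression:
    """Hypothetical one-feature least-squares model (not the repository's class)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float).ravel()
        y = np.asarray(y, dtype=float).ravel()
        x_mean, y_mean = X.mean(), y.mean()
        # Slope = sum of co-deviations / sum of squared x deviations.
        self.slope_ = np.sum((X - x_mean) * (y - y_mean)) / np.sum((X - x_mean) ** 2)
        # The fitted line passes through the point of means.
        self.intercept_ = y_mean - self.slope_ * x_mean
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float).ravel()
        return self.slope_ * X + self.intercept_

    def score(self, X, y):
        # Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
        y = np.asarray(y, dtype=float).ravel()
        residual = np.sum((y - self.predict(X)) ** 2)
        total = np.sum((y - y.mean()) ** 2)
        return 1.0 - residual / total


X_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = 2.0 * X_train + 1.0  # noiseless line y = 2x + 1

model = SimpleLinearRegression().fit(X_train, y_train)
predictions = model.predict(np.array([5.0, 6.0]))
r2 = model.score(X_train, y_train)
```

Returning `self` from `fit` is a design choice borrowed from scikit-learn's convention so that construction and training can be chained on one line.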
- Browse the Algorithms Table above to find an algorithm
- Read the Documentation (click "View Details") to understand the theory
- Study the Code in the .py file, which is heavily commented
- Run the Examples provided in the usage section of each file
- Experiment: modify parameters and try your own data!
Recommended order for beginners:
- Start with Linear Regression - Simplest algorithm, foundation for others
- Move to Multiple Regression - Understand multiple features
- Try Classification - Logistic Regression
- Explore Non-linear Models - Decision Trees, KNN
Each algorithm builds on concepts from previous ones!
For each algorithm, you'll understand:
- The Problem It Solves: When and why to use this algorithm
- Mathematical Foundation: The equations and theory behind it
- Step-by-Step Implementation: How math translates to code
- Practical Applications: Real-world use cases
- Model Evaluation: How to measure performance
- Advantages & Limitations: When to use (or not use) the algorithm
Each algorithm includes:
- Comprehensive Guide (.md file):
  - Intuitive explanations with real-world analogies
  - Mathematical formulas broken down step-by-step
  - Implementation details explained
  - Complete examples with output
  - Visualization suggestions
  - Links to further resources
- Clean Implementation (.py file):
  - Class-based design for reusability
  - Detailed docstrings for all methods
  - Inline comments explaining key steps
  - Multiple usage examples
  - Type hints and parameter documentation
Contributions are welcome and appreciated! Here's how you can help:
- Report Bugs: Open an issue if you find a bug
- Suggest Algorithms: Request algorithms you'd like to see
- Improve Documentation: Fix typos, add clarity, include examples
- Enhance Code: Optimize implementations (while keeping clarity)
- Add Tests: Help ensure correctness
- Create Visualizations: Add plots and diagrams
When contributing, please:
- Follow the existing code style (clean, well-documented, educational)
- Include comprehensive docstrings and comments
- Add usage examples in the code
- Update or create corresponding .md documentation
- Ensure code works with NumPy only (no additional ML libraries for core implementation)
- Test your implementation with example datasets
Note: The goal is education, not performance. Prioritize clarity over optimization.
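One possible way to test a contribution with an example dataset, as the guidelines above ask, is to generate data from known weights and check that the model recovers them. The sketch below uses a closed-form ridge solver as a stand-in for a contributed model; `ridge_fit` and its `alpha` parameter are hypothetical names chosen for illustration, not the repository's API.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Hypothetical stand-in for a contributed model: closed-form ridge weights, no intercept."""
    n_features = X.shape[1]
    # Solve (X^T X + alpha * I) w = X^T y for w.
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# Synthetic dataset with known weights and no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w_unregularized = ridge_fit(X, y, alpha=0.0)
w_regularized = ridge_fit(X, y, alpha=50.0)

# With alpha = 0 and noiseless targets, the true weights should be recovered.
assert np.allclose(w_unregularized, true_w)
# A positive penalty should shrink the weight vector's norm.
assert np.linalg.norm(w_regularized) < np.linalg.norm(w_unregularized)
```

Checks like these use only NumPy, so they stay within the NumPy-only rule for core code.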
Q: Should I use this code in production?
A: No, these implementations prioritize learning over performance. Use scikit-learn, TensorFlow, or PyTorch for production.
Q: Do I need to know advanced math?
A: Basic knowledge helps, but each algorithm includes math explanations in plain language.
Q: Can I compare these with scikit-learn?
A: Absolutely! Many examples show how to use scikit-learn for comparison and validation.
Q: Why NumPy only?
A: To focus on fundamentals. Understanding NumPy operations helps you understand what libraries do internally.
Q: How long does it take to learn each algorithm?
A: With the documentation provided, expect 1-2 hours per algorithm for thorough understanding.
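To make the "Why NumPy only?" answer above more concrete, here is a sketch of how a k-nearest-neighbors prediction can reduce to a few array operations. The function `knn_predict` is a hypothetical illustration, not the code in the KNN folder, and it assumes class labels are non-negative integers.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical k-NN classifier; assumes non-negative integer labels."""
    # Pairwise squared Euclidean distances via broadcasting: shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    distances = np.sum(diffs ** 2, axis=2)
    # Indices of the k closest training points for each query row.
    nearest = np.argsort(distances, axis=1)[:, :k]
    # Majority vote over the neighbors' labels.
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.05, 0.05], [5.0, 5.1]])

labels = knn_predict(X_train, y_train, X_query, k=3)
```

Squared distances are used because taking the square root does not change which neighbors are closest.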
- NumPy Documentation
- Scikit-learn Documentation
- Machine Learning course on Coursera (Andrew Ng)
- Deep Learning Book
- StatQuest YouTube Channel - Great visual explanations
This project is licensed under the MIT License - see the LICENSE file for details.
This repository is built for the community of learners who believe in understanding fundamentals. Special thanks to all contributors and the open-source community.
If you find this repository helpful:
- Star this repository to help others discover it
- Share it with fellow learners
- Contribute an algorithm or improvement
- Provide feedback through issues
"Learning machine learning from scratch is like learning to cook from scratch - you could just buy premade meals (use libraries), but understanding ingredients and techniques (algorithms and math) makes you a better chef (data scientist)!"
Understanding the core concepts of machine learning algorithms is essential for anyone looking to excel in the field of data science and artificial intelligence. This repository aims to provide a comprehensive and accessible resource for learning and experimenting with various machine learning algorithms from scratch.
Happy Learning and Coding!
Maintained by @inboxpraveen | Last Updated: December 2025