Machine Learning Basics
A high-level overview of ML concepts, algorithms, and the Scikit-Learn workflow
What You'll Learn
- What is Machine Learning?
- The ML Workflow
- Key Algorithms (Decision Trees, Random Forests, K-Means)
- Model Evaluation
- Bias and Variance
What is Machine Learning?
Machine Learning (ML) is the field of study that gives computers the ability to learn without being explicitly programmed.
Traditional Programming: Input + Rules -> Output
Machine Learning: Input + Output -> Rules
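To make the contrast concrete, here is a minimal sketch using a temperature conversion as a toy problem. The Celsius/Fahrenheit task, the sample values, and the function name `celsius_to_fahrenheit` are illustrative assumptions, not part of the original lesson. In the first half a person writes the rule; in the second half scikit-learn's `LinearRegression` estimates the rule from input/output pairs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Traditional programming: a human writes the rule (Input + Rules -> Output).
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# Machine learning: the rule is estimated from examples (Input + Output -> Rules).
# These five pairs are made-up illustration data.
celsius = np.array([[-10.0], [0.0], [15.0], [30.0], [100.0]])  # inputs
fahrenheit = np.array([14.0, 32.0, 59.0, 86.0, 212.0])         # known outputs

model = LinearRegression()
model.fit(celsius, fahrenheit)

# The learned "rule" is a slope and an intercept.
learned_slope = model.coef_[0]
learned_intercept = model.intercept_
print(f"learned rule: F = {learned_slope:.2f} * C + {learned_intercept:.2f}")
```

Because the example data follows the conversion formula exactly, the learned slope and intercept should land very close to 1.8 and 32. Real data is noisier, so learned rules are usually approximations.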
The ML Workflow
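- Data Collection: Gather the raw data relevant to the problem.
- Data Preparation: Clean and format the data, then split it into training and test sets.
- Model Training: Fit the algorithm to the training data.
- Model Evaluation: Measure how well the model performs on data it did not see during training.
- Tuning: Adjust settings (hyperparameters) to improve performance.
- Prediction: Use the trained model on new data.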
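The steps above can be sketched end to end with scikit-learn. This is one possible arrangement, not the only one: the built-in Iris dataset, the decision tree model, the `max_depth` grid, and the sample flower measurements are all assumptions chosen to keep the example small. Tuning is done here with cross-validation on the training split, so the test set is only used once for the final evaluation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Data collection: load a small dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# 2. Data preparation: hold out 25% of the rows for the final test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 3 and 5. Training and tuning: try several tree depths with
# 5-fold cross-validation on the training split only.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, 4, 5]},
    cv=5,
)
search.fit(X_train, y_train)

# 4. Evaluation: accuracy of the best model on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print("best max_depth:", search.best_params_["max_depth"])
print(f"test accuracy: {test_accuracy:.2f}")

# 6. Prediction: classify a new, unseen measurement (illustrative values).
new_flower = [[5.1, 3.5, 1.4, 0.2]]
predicted_class = search.predict(new_flower)[0]
print("predicted class index:", predicted_class)
```

Keeping the test set out of the tuning loop matters: if hyperparameters are chosen by looking at test scores, the final evaluation tends to be too optimistic.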
Key Algorithms
1. Decision Trees (Supervised):
- Like a flowchart of "if-then" rules.
- Easy to interpret.
- Prone to overfitting.
2. Random Forests (Supervised):
- A collection (ensemble) of many decision trees.
- Usually more accurate and more robust than a single tree.
- "Wisdom of the crowd": the trees' predictions are combined by voting or averaging.
3. K-Means Clustering (Unsupervised):
- Groups data points into K clusters based on similarity, where K is chosen in advance.
- Commonly used for tasks such as customer segmentation.
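The three algorithms above can be tried side by side in a few lines. The sketch below uses synthetic data from `make_classification` and `make_blobs`, and every dataset size and parameter value is an assumption picked for illustration. The first part compares a single decision tree with a random forest on labeled data; the second part runs K-Means on unlabeled points.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# --- Supervised: decision tree vs. random forest on labeled data ---
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# An unrestricted tree can memorize the training set, which is the
# overfitting risk mentioned above; compare train and test accuracy.
tree_train_acc = tree.score(X_train, y_train)
tree_test_acc = tree.score(X_test, y_test)
forest_test_acc = forest.score(X_test, y_test)
print(f"tree   train/test accuracy: {tree_train_acc:.2f} / {tree_test_acc:.2f}")
print(f"forest test accuracy:       {forest_test_acc:.2f}")

# --- Unsupervised: K-Means on unlabeled points ---
# The true blob labels are discarded; K-Means only sees the coordinates.
points, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(points)
print("cluster centers:")
print(kmeans.cluster_centers_)
```

On data like this the forest will often score higher than the single tree on the test set, but that is a tendency rather than a guarantee. Note also that K-Means needs the number of clusters up front; here 3 is known only because the data was generated that way.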
Model Evaluation
How do we know if our model is good?
For Regression (Numbers):
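- MAE (Mean Absolute Error): Average of the absolute differences between predicted and actual values.
- RMSE (Root Mean Squared Error): Square root of the average squared error. Penalizes large errors more than MAE does.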
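A small worked example shows how the two regression metrics differ. The four true and predicted values below are made up for illustration. RMSE is computed here as the square root of `mean_squared_error`, which works across scikit-learn versions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up values; the errors (true - predicted) are 1, 0, -2, and 4.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 3.0])

# MAE: (1 + 0 + 2 + 4) / 4 = 1.75
mae = mean_absolute_error(y_true, y_pred)

# RMSE: sqrt((1 + 0 + 4 + 16) / 4) = sqrt(5.25), about 2.291
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
```

RMSE comes out larger than MAE because squaring gives the single error of 4 much more weight than the smaller errors.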
For Classification (Categories):
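- Accuracy: Percentage of all predictions that are correct. Can be misleading when one class is much more common than the other.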
- Precision: Of those predicted positive, how many were actually positive?
- Recall: Of those actually positive, how many did we find?
- F1 Score: Harmonic mean of Precision and Recall.
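The classification metrics can be checked by hand on a tiny example. The ten labels below are invented for illustration, with 1 as the positive class. They give 2 true positives, 1 false positive, 2 false negatives, and 5 true negatives.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Invented labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Accuracy: 7 correct out of 10 = 0.7
accuracy = accuracy_score(y_true, y_pred)

# Precision: 2 true positives out of 3 predicted positives, about 0.667
precision = precision_score(y_true, y_pred)

# Recall: 2 true positives out of 4 actual positives = 0.5
recall = recall_score(y_true, y_pred)

# F1: 2 * precision * recall / (precision + recall) = 4/7, about 0.571
f1 = f1_score(y_true, y_pred)

print(f"accuracy:  {accuracy:.3f}")
print(f"precision: {precision:.3f}")
print(f"recall:    {recall:.3f}")
print(f"f1:        {f1:.3f}")
```

Accuracy looks reasonable at 0.7, yet the model finds only half of the actual positives. This is why precision and recall are worth reporting alongside accuracy.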
Bias vs Variance Trade-off
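- Bias: Error from overly simple or incorrect assumptions in the model. High bias leads to underfitting.
- Variance: Error from sensitivity to small fluctuations in the training set. High variance leads to overfitting.
- Goal: Find the sweet spot where total error is lowest. Reducing one usually increases the other, so the aim is a balance between the two.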
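One way to see the trade-off is to vary model complexity and compare training and test scores. The sketch below is an illustration under assumed settings: noisy sine-wave data, a `DecisionTreeRegressor`, and three depths (1, 4, and unlimited) chosen to stand in for a simple, a moderate, and a very flexible model.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: a sine curve plus random noise (illustrative assumption).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Depth 1 is a very simple model, depth 4 is moderate, and
# None lets the tree grow until it fits every training point.
scores = {}
for depth in (1, 4, None):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    train_r2 = model.score(X_train, y_train)  # R^2 on data the model saw
    test_r2 = model.score(X_test, y_test)     # R^2 on held-out data
    scores[depth] = (train_r2, test_r2)
    print(f"max_depth={depth}: train R^2={train_r2:.2f}, test R^2={test_r2:.2f}")
```

The training score rises as the tree gets deeper, and the unlimited tree fits the training data almost perfectly. A large gap between its training and test scores is the typical sign of high variance, while a low score on both sets for the shallow tree points to high bias.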
Next Steps
Ready for the final challenge? Let's apply everything you've learned in the Mini-Project!