5 min read
Introduction to Machine Learning
Learn what Machine Learning is and how it works
What is Machine Learning?
Machine Learning (ML) is teaching computers to learn from data.
Instead of programming explicit rules, we show examples and the computer figures out patterns.
Traditional Programming vs ML
Traditional:
Rules + Data → Answer
Machine Learning:
Data + Answers → Rules (Model)
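To make the contrast concrete, here is a minimal sketch using temperature conversion as a stand-in problem (the problem and the sample values are chosen for illustration and are not from a real dataset). The first part hard-codes the rule; the second part gives a model only data and answers and lets it recover the rule.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Traditional programming: we write the rule ourselves.
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# Machine learning: we supply data and answers, and the model infers the rule.
celsius = np.array([[-10.0], [0.0], [15.0], [30.0], [100.0]])  # Data
fahrenheit = np.array([14.0, 32.0, 59.0, 86.0, 212.0])         # Answers

model = LinearRegression()
model.fit(celsius, fahrenheit)

# The learned "rules" are the fitted slope and intercept.
learned_slope = model.coef_[0]
learned_intercept = model.intercept_
print(f"Learned rule: F = {learned_slope:.2f} * C + {learned_intercept:.2f}")
print(f"Hand-written rule at 25 C: {celsius_to_fahrenheit(25)}")
print(f"Learned rule at 25 C: {model.predict(np.array([[25.0]]))[0]:.1f}")
```

Because the sample answers follow the formula exactly, the fitted slope and intercept should match the hand-written rule. Real data is noisy, so a learned rule is usually only an approximation.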
Types of Machine Learning
1. Supervised Learning
Learn from labeled data (we know the answers):
- Classification: Predict categories
  - Is this email spam or not?
  - Is this tumor benign or malignant?
- Regression: Predict numbers
  - What will the house price be?
  - How many sales next month?
2. Unsupervised Learning
Find patterns in unlabeled data:
- Clustering: Group similar items
  - Customer segments
  - Similar documents
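As a rough sketch of clustering, the example below groups a handful of made-up customers with scikit-learn's `KMeans`. The feature choice (annual spend and visits per month) and the values are assumptions for illustration only. Note that no labels are passed to the model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up customers: [annual spend in thousands, visits per month]
customers = np.array([
    [2.0, 1.0], [3.0, 2.0], [2.5, 1.0],       # low spend, few visits
    [20.0, 8.0], [22.0, 9.0], [21.0, 10.0],   # high spend, frequent visits
])

# No labels are passed: the algorithm only sees the features.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)

print(segments)  # one cluster id per customer
```

The cluster ids (0 or 1) are arbitrary names. What matters is which customers end up sharing an id.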
3. Reinforcement Learning
Learn by trial and error:
- Game playing AI
- Self-driving cars
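Trial and error can be sketched with a toy problem: an agent repeatedly picks one of two slot machines and learns which one pays out more often. This is an epsilon-greedy strategy, one simple approach among many; the win probabilities and settings below are hypothetical.

```python
import random

random.seed(0)

# Hypothetical environment: two slot machines with hidden win probabilities.
win_probability = [0.3, 0.8]

estimates = [0.0, 0.0]  # the agent's running estimate of each machine's reward
pulls = [0, 0]          # how often each machine has been tried
epsilon = 0.1           # fraction of the time the agent explores at random

for step in range(2000):
    if random.random() < epsilon:
        action = random.randrange(2)              # explore: try a random machine
    else:
        action = estimates.index(max(estimates))  # exploit: use the best estimate so far
    reward = 1.0 if random.random() < win_probability[action] else 0.0
    pulls[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / pulls[action]

print(f"Estimated rewards: {estimates}")
print(f"Pulls per machine: {pulls}")
```

The agent is never told which machine is better. It finds out from the rewards it receives, which is the core idea behind game-playing agents as well.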
The ML Workflow
1. Collect Data
2. Prepare Data (clean, transform)
3. Split Data (train/test)
4. Choose Model
5. Train Model
6. Evaluate Model
7. Improve & Repeat
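The steps above can be sketched in a few lines of scikit-learn. This is one possible arrangement, not the only one: the dataset (`load_breast_cancer`, bundled with scikit-learn), the scaler, and the model are choices made for this illustration. The split is done before scaling so that the test set does not influence data preparation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Collect data (here: a dataset bundled with scikit-learn)
X, y = load_breast_cancer(return_X_y=True)

# 3. Split data first, so the test set stays unseen during preparation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 2. Prepare data: scale features using statistics from the training set only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 4. Choose model
model = LogisticRegression(max_iter=1000)

# 5. Train model
model.fit(X_train_scaled, y_train)

# 6. Evaluate model
test_accuracy = model.score(X_test_scaled, y_test)
print(f"Test accuracy: {test_accuracy:.1%}")

# 7. Improve & repeat: try another model or setting and compare scores
```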
Scikit-Learn Basics
Scikit-learn is one of the most widely used ML libraries for Python:
code.py
# Install: pip install scikit-learn
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np
# Sample data
X = np.array([[1], [2], [3], [4], [5]]) # Features
y = np.array([2, 4, 6, 8, 10]) # Target
# Split data (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict
predictions = model.predict(X_test)
print(predictions)
Key ML Concepts
Features (X)
The input data used to make predictions:
- Age, income, location (for loan approval)
- Pixels (for image classification)
- Words (for text classification)
Target (y)
What we want to predict:
- Loan approved/rejected
- Cat/dog
- Spam/not spam
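In code, features and targets are usually stored as arrays: `X` has one row per example and one column per feature, and `y` has one value per example. A minimal sketch for the loan example, with hypothetical field names and made-up values:

```python
import numpy as np

# Hypothetical loan applications; all values are made up for illustration.
applications = [
    {"age": 25, "income": 40, "approved": 0},
    {"age": 47, "income": 85, "approved": 1},
    {"age": 33, "income": 62, "approved": 1},
]

# Features (X): one row per example, one column per input
X = np.array([[a["age"], a["income"]] for a in applications])

# Target (y): one value per example
y = np.array([a["approved"] for a in applications])

print(X.shape)  # (3, 2): 3 examples, 2 features
print(y.shape)  # (3,): 3 targets
```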
Training
Showing the model examples so it learns patterns:
code.py
model.fit(X_train, y_train)
Prediction
Using the trained model on new data:
code.py
predictions = model.predict(X_new)
Simple Classification Example
code.py
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np
# Simple dataset: predict whether someone buys a product
# Features: age, income (in thousands)
X = np.array([
    [25, 40], [30, 50], [35, 60], [40, 70],
    [45, 80], [50, 90], [22, 30], [28, 35],
    [55, 95], [60, 100]
])
# Target: 1 = bought, 0 = didn't buy
y = np.array([0, 0, 1, 1, 1, 1, 0, 0, 1, 1])
# Split (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
# Predict
predictions = model.predict(X_test)
print(f"Predictions: {predictions}")
print(f"Actual: {y_test}")
# Accuracy
accuracy = model.score(X_test, y_test)
print(f"Accuracy: {accuracy:.0%}")
Common ML Algorithms
| Algorithm | Type | Use Case |
|---|---|---|
| Linear Regression | Regression | Price prediction |
| Logistic Regression | Classification | Yes/No decisions |
| Decision Tree | Both | Easy to interpret |
| Random Forest | Both | High accuracy |
| KNN | Both | Simple, no explicit training phase |
| SVM | Both | Complex boundaries |
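In scikit-learn, these algorithms share the same `fit` / `score` interface, so trying several of them is mostly a matter of swapping the model object. A rough sketch on the iris dataset, using the classifier variants; the hyperparameters are illustrative choices, not tuned recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Illustrative settings only; real projects would tune these.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(),
}

# Same two calls for every algorithm: fit on training data, score on test data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.0%}")
```

Iris is a small, easy dataset, so the scores will likely be close to each other. On harder problems the differences between algorithms tend to be larger.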
Overfitting vs Underfitting
Overfitting
- Model learns training data too well
- Memorizes instead of generalizing
- Poor on new data
Underfitting
- Model is too simple
- Doesn't capture patterns
- Poor on all data
Goal: Find the right balance!
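One way to see both problems is to compare training and test accuracy as a model gets more flexible. The sketch below uses a decision tree at three depths on synthetic, deliberately noisy data from `make_classification`; the dataset settings are assumptions chosen to make the effect visible.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns about 20% of labels at random (noise).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=3, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for depth in [1, 3, None]:  # None lets the tree grow until it fits every training point
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    test_acc = tree.score(X_test, y_test)
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.0%}, test={test_acc:.0%}")
```

A very shallow tree tends to score modestly on both sets (underfitting). The unlimited tree memorizes the training set, noise included, so its training accuracy is perfect while its test accuracy is noticeably lower (overfitting). A large gap between the two numbers is the usual warning sign.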
Complete Example
code.py
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# Load famous iris dataset
iris = load_iris()
X = iris.data
y = iris.target
print(f"Features: {iris.feature_names}")
print(f"Classes: {iris.target_names}")
print(f"Data shape: {X.shape}")
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Train model
model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)
# Evaluate
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"\nTraining accuracy: {train_acc:.0%}")
print(f"Test accuracy: {test_acc:.0%}")
Key Points
- ML learns patterns from data
- Supervised: Has labels (classification, regression)
- Unsupervised: No labels (clustering)
- Split data into train and test sets
- Use scikit-learn for ML in Python
- Watch out for overfitting
What's Next?
Learn how to properly split data for training and testing.