Machine Learning — A Visual Guide with Examples & Code

What is Machine Learning?

In normal programming you write the rules and the computer produces the answer. In machine learning you hand the computer many examples — data plus the correct answers — and it discovers the rules by itself.

Programming gives the computer rules; ML gives it examples and lets it find the rules.

Example To predict house prices, you don't write a price formula. You show the model thousands of houses (size, location, rooms) with their real prices, and it learns the relationship — then prices a brand-new house.

# install once
pip install scikit-learn pandas numpy matplotlib

Types of ML

Three families, split by whether your data already contains the correct answers.

Type	Has answers?	What it does	Example
Supervised	Yes	Predicts a value or category	House price · spam filter
Unsupervised	No	Finds hidden groups / patterns	Customer segments
Reinforcement	Reward / penalty	Learns by trial and error	Game-playing · robotics

Inside Supervised there are two jobs: Regression (the answer is a number, like a price) and Classification (the answer is a category, like spam / not spam).

Regression fits a trend to predict numbers; classification finds a boundary to separate categories.

How every model works: `fit` & `predict`

The single most useful thing here: every supervised scikit-learn model exposes the same three methods, fit, predict, and score. Learn them once and you can drive all of them.

The universal scikit-learn workflow — true for linear models, trees, SVMs, everything.

model.fit(X_train, y_train)   # 1) learn from data
model.predict(X_test)         # 2) make predictions
model.score(X_test, y_test)   # 3) check how good it is

X = the inputs (features that describe the thing). y = the answer you want to predict (the target).
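As a rough illustration of that shared interface, the sketch below runs the same three calls on two unrelated models. The iris dataset, the two model choices, and the `results` dictionary are assumptions picked for the demo, not part of the workflow itself.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Demo data (an assumption for this sketch): 150 flowers, 4 features each
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

results = {}
for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=42)):
    model.fit(X_train, y_train)          # 1) learn from data
    preds = model.predict(X_test)        # 2) make predictions
    acc = model.score(X_test, y_test)    # 3) mean accuracy on the test set
    results[type(model).__name__] = (preds.shape, acc)
    print(type(model).__name__, "accuracy:", round(acc, 3))
```

Only the line that constructs the model changes; the fit, predict, and score calls stay identical.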

Train / Test split

You must test the model on data it has never seen, to prove it actually learned instead of just memorizing. So we hold out a slice.

Most data trains the model; the held-out test set is the “exam” on unseen questions.

Example Like exam prep: you practice on some questions (training), but the real exam uses different questions (testing) — that's the only way to prove you understood, not memorized.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,      # 20% held out for testing
    random_state=42,    # makes the split repeatable
    stratify=y          # keep class balance (classification)
)

Preparing data

Scaling — put every column on the same footing

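If one column is in the thousands and another in single digits, models that rely on distances or penalized weights (KNN, SVM, regularized linear models) are fooled into treating the big-number column as more important. Scaling fixes that. Tree-based models split one column at a time, so they are largely unaffected.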

Income (≈50,000) would dwarf age (≈35) until scaling brings them onto one comparable range.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean/std from TRAIN only
X_test_scaled  = scaler.transform(X_test)       # just applies it

Golden rule The scaler learns from training data only (fit_transform), then merely applies (transform) to the test data. Fit it on everything and you've secretly cheated.
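To see what the scaler actually does, here is a minimal sketch on a few made-up income/age rows (the numbers are hypothetical). After scaling, each training column has mean 0 and standard deviation 1, and the test row is shifted using the training statistics only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical rows: [income, age]
X_train = np.array([[50_000, 35], [62_000, 41], [38_000, 29], [75_000, 52]], dtype=float)
X_test = np.array([[55_000, 30]], dtype=float)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # mean/std come from the training rows only
X_test_scaled = scaler.transform(X_test)         # reuses those training statistics

print("train means:", X_train_scaled.mean(axis=0).round(6))
print("train stds: ", X_train_scaled.std(axis=0).round(6))
print("test row:   ", X_test_scaled.round(3))
```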

Encoding — turn text into numbers

Models only understand numbers, so a city column of "Cairo / Giza / Alex" becomes three 0/1 columns.

from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(handle_unknown="ignore")

Supervised models

Eight workhorses. Each card: the idea, a real example, a diagram, and code.

6.1 · Linear Regression

Regression

The idea. Draws the best straight line through your data to describe a relationship and predict numbers.

Example Bigger house → higher price. The line learns "for every extra square meter, add about X to the price."

The fitted line lets you read off the predicted price for any new house size.

from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[50],[60],[80],[100],[120],[150]])  # size m²
y = np.array([500,600,780,1000,1150,1450])         # price (k)

model = LinearRegression().fit(X, y)
print("130 m² →", model.predict([[130]])[0])
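The "add about X per extra square meter" statement can be read straight off the fitted model. This sketch reuses the toy data from the example above and shows that a prediction is just the line equation; the variable names `slope`, `intercept`, and `manual` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy data as the example above
X = np.array([[50], [60], [80], [100], [120], [150]])   # size in m²
y = np.array([500, 600, 780, 1000, 1150, 1450])          # price (k)

model = LinearRegression().fit(X, y)

slope = model.coef_[0]          # price change per extra m²
intercept = model.intercept_    # where the line crosses the price axis (size = 0)

print(f"each extra m² adds about {slope:.2f}k")

# predict() is just the line equation: intercept + slope * size
manual = intercept + slope * 130
print("130 m² →", manual, "vs", model.predict([[130]])[0])
```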

6.2 · Logistic Regression

Classification

The idea. Despite the name, it's for classification. It outputs the probability of belonging to a class, using an S-shaped curve.

Example Will a customer buy? It outputs "80% likely" → predicts "buy." Anything above the 0.5 line tips to yes.

The sigmoid maps any input to a probability between 0 and 1; cross the threshold to decide the class.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=200).fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))
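For the two-class case, the S-shaped curve and the 0.5 threshold can be sketched in a few lines of NumPy. The raw scores below are hypothetical; in a fitted model they would come from a weighted sum of the features. With more than two classes, as in the iris example, scikit-learn generalizes the same idea to several classes.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])   # hypothetical raw model scores
probs = sigmoid(scores)
labels = (probs >= 0.5).astype(int)               # 0.5 threshold -> class 0 or 1

for z, p, label in zip(scores, probs, labels):
    print(f"score {z:+.1f} -> probability {p:.3f} -> class {label}")
```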

6.3 · K-Nearest Neighbors

Classification

The idea. "Tell me who your neighbors are, I'll tell you who you are." It looks at the closest K points and takes a majority vote.

Example To classify a new flower, it finds the 3 most similar known flowers and goes with the majority type.

The new point inherits the majority label among its K closest neighbors.

from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
print("KNN:", model.score(X_te, y_te))   # scale features first — KNN uses distance

6.4 · Decision Tree

Both

The idea. Asks a series of yes/no questions until it reaches a decision. Easy to read and explain.

Example Loan approval: "Income > 50k? → Age > 30? → Approve." Exactly like a flowchart.

Each node splits the data with one question; leaves give the final answer.

from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_tr, y_tr)
# max_depth limits how many questions → stops it from memorizing
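Because a tree is a flowchart, you can print the questions it learned. The sketch below fits the same kind of depth-3 tree on the full iris dataset (an assumption made to keep the block self-contained) and renders it with `export_text`.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(iris.data, iris.target)

# export_text renders the fitted tree as nested if/else rules
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
print("depth actually used:", tree.get_depth())
```

Each indented line is one yes/no question on a single feature, and the lines starting with "class:" are the leaves.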

6.5 · Random Forest

Strong default

The idea. Builds many decision trees (often hundreds) and lets them vote. One tree can be wrong; a whole forest is wrong far less often.

Example Instead of trusting one doctor, you poll 100 doctors and take the majority diagnosis — far more reliable.

Many decorrelated trees vote; averaging cancels out individual mistakes.

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
print("forest:", model.score(X_te, y_te))
print(model.feature_importances_)   # which features mattered most

6.6 · Support Vector Machine

Classification

The idea. Draws the boundary that separates classes with the widest possible gap (margin) between them.

Example Separating cats from dogs with the cleanest line that leaves the biggest safety buffer on both sides.

The solid line is the boundary; dashed lines mark the widest margin, set by the support vectors.

from sklearn.svm import SVC
model = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)   # rbf handles curved boundaries; scale first

6.7 · Naive Bayes

Classification

The idea. Uses probability to classify. Very fast and great for text.

Example Spam detection — emails with "free", "win", "prize" lean spam. It multiplies those word probabilities to decide.

It compares the probability of each class given the words, and picks the larger one.

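# GaussianNB suits numeric features (the iris measurements here); word counts usually call for MultinomialNB
from sklearn.naive_bayes import GaussianNB
model = GaussianNB().fit(X_tr, y_tr)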
print("NB:", model.score(X_te, y_te))
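The spam example is about word counts rather than numeric measurements, so a closer match is `MultinomialNB` fed by `CountVectorizer`. The four emails and their labels below are made up for illustration; a real filter would need far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus, for illustration only
emails = [
    "win a free prize now",
    "free prize waiting claim now",
    "meeting agenda for monday",
    "lunch with the team on monday",
]
labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer turns each email into word counts; MultinomialNB models those counts
text_model = make_pipeline(CountVectorizer(), MultinomialNB())
text_model.fit(emails, labels)

new_emails = ["claim your free prize", "agenda for the team meeting"]
predictions = text_model.predict(new_emails)
print(list(zip(new_emails, predictions)))
```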

6.8 · Gradient Boosting

Best on tables

The idea. Builds trees one after another, where each new tree fixes the mistakes of the previous ones. Usually the strongest model for tabular data.

Example Like studying smart — each round you focus on the questions you got wrong last time, improving step by step.

Each tree shaves off some of the leftover error, so the ensemble's fit improves with every round; too many rounds can start to overfit.

from sklearn.ensemble import HistGradientBoostingClassifier
model = HistGradientBoostingClassifier(learning_rate=0.1, max_iter=300).fit(X_tr, y_tr)
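To make "each new tree fixes the mistakes of the previous ones" concrete, here is a hand-rolled sketch for regression with squared error: every round fits a small tree to the current residuals and adds a fraction of its prediction. This is a teaching simplification on synthetic data, not how `HistGradientBoostingClassifier` is implemented internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)   # synthetic inputs
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)      # noisy sine target

learning_rate = 0.1
prediction = np.full_like(y, y.mean())    # round 0: predict the average everywhere
errors = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    residual = y - prediction                              # what is still wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)          # small correction step
    errors.append(np.mean((y - prediction) ** 2))

print("training MSE at start:", round(errors[0], 4))
print("training MSE after 50 trees:", round(errors[-1], 4))
```

The error tracked here is training error, which only goes down; judging when to stop requires held-out data.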

Model comparison — when to reach for which

Model	Best for	Needs scaling?	Easy to read?	Speed
Linear / Logistic	Simple, linear relationships	Yes (when regularized)	Yes	Fast
KNN	Small data, clear groups	Yes	Medium	Slow at predict
Decision Tree	Interpretable rules	No	Very	Fast
Random Forest	Strong all-round default	No	Low	Medium
SVM	Clear margins, mid-size data	Yes	Low	Slow on big data
Naive Bayes	Text / many features	No	Medium	Very fast
Gradient Boosting	Top accuracy on tables	No	Low	Medium

Measuring how good your model is

Accuracy alone can lie. A cancer test that always says "healthy" is 99% accurate if only 1% are sick — yet useless. That's why we also read precision and recall from the confusion matrix.

Precision = TP / (TP+FP) · Recall = TP / (TP+FN). The off-diagonal cells are the two kinds of mistake.

Metric	Question it answers	Care most when…
Accuracy	Overall % correct	Classes are balanced
Precision	Of my "yes" predictions, how many were right?	False alarms are costly (spam filter)
Recall	Of all real positives, how many did I catch?	Misses are costly (cancer screening)
F1	Balance of precision & recall	You need one number under imbalance
R² / RMSE	How close are numeric predictions?	Regression problems

from sklearn.metrics import classification_report, confusion_matrix
pred = model.predict(X_te)
print(confusion_matrix(y_te, pred))        # rows = true class, columns = predicted class
print(classification_report(y_te, pred))   # precision / recall / f1 per class
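The "always says healthy" test can be checked numerically. This sketch builds 100 hypothetical patients, one of them sick, and scores a predictor that never raises an alarm. Passing `zero_division=0` is a choice made here so that the undefined 0/0 precision is reported as 0.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# 100 hypothetical patients: 1 sick (label 1), 99 healthy (label 0)
y_true = np.array([1] + [0] * 99)
y_pred = np.zeros(100, dtype=int)          # a "model" that always says healthy

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred, zero_division=0)         # TP / (TP + FN)
precision = precision_score(y_true, y_pred, zero_division=0)   # TP / (TP + FP); 0/0 reported as 0

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("accuracy:", accuracy, "recall:", recall, "precision:", precision)
```

Accuracy looks excellent while recall shows that the one sick patient was missed.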

Overfitting & Underfitting

The most important idea in practice. A model can be too simple (underfit), just right, or so complex it memorizes noise (overfit).

The middle model generalizes; the right one chases every point and fails on new data.

Problem	Meaning	Sign	Fix
Underfitting	Model too simple	Bad on train and test	Stronger model, more features
Overfitting	Memorized the data	Great on train, bad on test	More data, simpler model, regularization

Example A student who memorizes practice answers (overfit) fails a slightly different exam. One who barely studied (underfit) fails everything. You want the one who understood the concepts.

from sklearn.linear_model import Ridge
model = Ridge(alpha=1.0)   # a penalty that stops the model overreacting; bigger alpha = simpler
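The "great on train, bad on test" sign from the table is easy to check: score the same model on both sets and compare. The sketch below does this for decision trees of increasing depth; the breast cancer dataset and the depth values are assumptions for the demo, and the exact numbers depend on the split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scores = {}
for depth in (1, 3, None):                 # None = keep splitting until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42).fit(X_tr, y_tr)
    train_acc = tree.score(X_tr, y_tr)
    test_acc = tree.score(X_te, y_te)
    scores[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}  test={test_acc:.3f}")
```

A train score that keeps climbing while the test score stalls or drops is the gap to watch for.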

Cross-Validation

Instead of testing once, split the data into 5 parts and test 5 times — each part takes a turn as the test set. Average the scores for a far more trustworthy estimate.

Every fold serves as the test set exactly once; the final score is the average across all five.

from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)   # 5 rounds
print("avg:", scores.mean(), "±", scores.std())
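What `cross_val_score` does can be approximated by hand with `KFold`. This is a simplified sketch: the iris data, the logistic regression model, and the shuffled folds are assumptions, and for classifiers `cross_val_score` uses stratified folds by default rather than plain `KFold`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
times_tested = np.zeros(len(X), dtype=int)

for train_idx, test_idx in kfold.split(X):
    model = LogisticRegression(max_iter=1000)          # fresh model for every fold
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))
    times_tested[test_idx] += 1                        # track which rows were tested

print("fold scores:", np.round(fold_scores, 3))
print("average:", np.mean(fold_scores).round(3))
print("every row tested exactly once:", bool((times_tested == 1).all()))
```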

Hyperparameter tuning

Every model has settings (like n_neighbors in KNN or max_depth in a tree). GridSearchCV tries all combinations for you and keeps the best.

Example Tuning a recipe — automatically trying different oven temperatures and baking times, then keeping the combo that tastes best.

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

params = {"n_estimators":[50,100,200], "max_depth":[3,5,10,None]}
grid = GridSearchCV(RandomForestClassifier(random_state=42), params, cv=5)
grid.fit(X_tr, y_tr)
print("best:", grid.best_params_)
best_model = grid.best_estimator_   # ready to use

Unsupervised models

No answers given — the model discovers structure on its own.

11.1 · K-Means

Clustering

The idea. Automatically groups similar items together, without being told the groups in advance.

Example A shop discovers customer groups — big spenders, bargain hunters, occasional buyers — purely from behavior. Nobody labeled them first.

Points snap to the nearest center; the centers shift until the groups settle.

from sklearn.cluster import KMeans
model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(model.labels_)   # which group each point belongs to

11.2 · PCA

Dimensionality reduction

The idea. Compresses many columns into a few while keeping most of the information — great for speed and for plotting data in 2D.

Example Summarizing a 300-page book into a 2-page summary that still captures the main story.

PCA finds the directions of greatest variation and projects the data onto them. The new columns are combinations of the original ones, and the low-variation directions are dropped.

from sklearn.decomposition import PCA
pca = PCA(n_components=2)          # compress to 2 columns
X_2d = pca.fit_transform(X)
print("info kept:", pca.explained_variance_ratio_.sum())

Pipelines

Instead of doing scaling and the model in separate steps (and forgetting one), bundle them into a single clean object that runs in order.

One object handles every step in order — like a coffee machine: grind, brew, pour, one button.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([
    ("scaler", StandardScaler()),   # step 1
    ("model", SVC()),               # step 2
])
pipe.fit(X_tr, y_tr)                   # runs both, in order
print(pipe.score(X_te, y_te))

Bonus Pipelines also prevent a classic mistake: scaling before splitting and accidentally leaking test info into training.

Full project, start to finish

Everything above, working together on a real dataset. Read it line by line.

# 1) imports
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# 2) data (tumor diagnosis: benign vs malignant)
data = load_breast_cancer()
X, y = data.data, data.target

# 3) split
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# 4) pipeline: scaling + model (a forest doesn't need scaling, but the step is harmless and matters if you swap in SVM or KNN)
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])

# 5) train
pipe.fit(X_tr, y_tr)

# 6) evaluate
print("test accuracy:", pipe.score(X_te, y_te))
cv = cross_val_score(pipe, X, y, cv=5)
print("cross-val:", cv.mean().round(3), "±", cv.std().round(3))
print(classification_report(y_te, pipe.predict(X_te), target_names=data.target_names))

# 7) use it on a new case
print("prediction:", data.target_names[pipe.predict(X_te[0].reshape(1, -1))[0]])

Safety The tumor example is for learning only — real medical decisions are made by doctors, not models.

Cheat sheet & next steps

Which model should I start with?

Your task	Good first move
Predict a number	`LinearRegression` → then `HistGradientBoostingRegressor`
Predict a category	`LogisticRegression` → then `RandomForestClassifier`
Best score on a table	`HistGradientBoostingClassifier`
Group data (no labels)	`KMeans`
Too many columns	`PCA`

The one mental model to keep

# Master this and every estimator clicks into place
1. Prepare data → split, scale, encode (inside a Pipeline)
2. model.fit(X_train, y_train) → learn
3. model.predict(X_test) → guess
4. Evaluate honestly → cross-validation, the right metric
5. Tune & repeat → GridSearchCV, watch for overfitting

Habits that matter

Do	Why
Run the code & change the numbers	You learn by seeing what breaks
Start simple, then go complex	A simple baseline tells you if complexity helps
Spend time on the data	Much of real ML work (often quoted as ~80%) is cleaning & preparing data
Always keep a held-out test set	It's your only honest measure of generalization

After this guide

1 · Build a project on real data from Kaggle. 2 · Learn pandas for data cleaning. 3 · Then move to deep learning (TensorFlow / PyTorch).
