Advanced Machine Learning — Classification Algorithms

Published: November 12, 2025 • Language: python • Chapter: 15 • Sub: 1 • Level: beginner

Chapter 15: Advanced Machine Learning — Classification Algorithms

🧠 Introduction to Classification

Classification is a core machine learning task that involves predicting categorical outcomes — determining which class or category a given sample belongs to based on its features.

Unlike regression (which predicts continuous values), classification predicts discrete labels, such as:

  • ✅ Spam or Not Spam
  • ❤️ Disease or No Disease
  • 🐶 Dog, 🐱 Cat, 🐦 Bird

It’s one of the most widely used ML paradigms across industries — from fraud detection to sentiment analysis and image recognition.


⚙️ 1. How Classification Works

The model learns a decision boundary from labeled training examples. New samples are then assigned to a class according to which side of that boundary they fall on.

Problem Type | Example | Target Output
Binary Classification | Spam filtering | 0 or 1
Multi-Class Classification | Iris flower species | Setosa / Versicolor / Virginica
Multi-Label Classification | Movie genres | [Action, Comedy]
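To make the three problem types concrete, here is a minimal sketch of what the target array looks like in each case. The sample labels and the `movie_genres` list are made up for illustration; `MultiLabelBinarizer` is scikit-learn's helper for turning label sets into a 0/1 indicator matrix.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Binary: one label per sample, two possible values
y_binary = np.array([0, 1, 1, 0])

# Multi-class: one label per sample, three or more possible values
y_multiclass = np.array(["setosa", "versicolor", "virginica", "setosa"])

# Multi-label: each sample may carry several labels at once (or none)
movie_genres = [["Action", "Comedy"], ["Drama"], ["Action"], []]  # illustrative data
mlb = MultiLabelBinarizer()
y_multilabel = mlb.fit_transform(movie_genres)

print(mlb.classes_)   # column order of the indicator matrix
print(y_multilabel)   # one row per sample, one column per genre
```

In the multi-label case each row can contain several 1s, which is what separates it from multi-class classification, where exactly one class applies per sample.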

🧩 2. Common Classification Algorithms

Algorithm | Type | When to Use | Key Strength
Logistic Regression | Linear | Simple, interpretable models | Probability outputs
Support Vector Machine (SVM) | Linear or non-linear (via kernels) | Small, clean datasets | Robust with clear margins
Decision Tree | Non-linear | Explainable models | Human-interpretable decisions
Random Forest | Ensemble | Complex problems | Handles non-linearity well
K-Nearest Neighbors (KNN) | Non-parametric | Small datasets | No explicit training phase
Naïve Bayes | Probabilistic | Text classification | Fast, low resource usage
Neural Networks | Deep Learning | Large datasets | High accuracy potential
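One practical way to choose among these algorithms is to run several of them on the same data and compare cross-validated accuracy. The sketch below does this on the Iris dataset; the hyperparameter values (`max_iter=1000`, `n_neighbors=5`, and so on) are illustrative assumptions rather than recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate models; settings are illustrative, not tuned
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Mean accuracy over 5 cross-validation folds for each model
scores = {}
for name, clf in candidates.items():
    scores[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:22s} mean CV accuracy: {scores[name]:.3f}")
```

On a small, clean dataset like Iris the scores tend to be close together, so interpretability and training cost are often the deciding factors.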

🌸 3. Example — Iris Classification with SVM

Let's train a Support Vector Machine (SVM) classifier on the Iris dataset that ships with scikit-learn.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import seaborn as sns
import matplotlib.pyplot as plt

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train model
model = SVC(kernel='rbf', gamma='auto', C=1)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Visualize Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
# fmt='d' prints the cell counts as plain integers
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix — Iris Classification")
plt.show()

🧠 Key Observations

  • Accuracy measures overall correctness.
  • Precision / Recall / F1 show per‑class quality.
  • The confusion matrix visually identifies where misclassifications occur.
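The per-class metrics in the report can be read straight off the confusion matrix, which may help clarify what they mean. Below is a small sketch with hand-made labels (`y_true` and `y_pred` are invented for illustration): precision divides each diagonal entry by its column total, recall divides it by its row total.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hand-made labels for three classes (illustrative only)
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2, 1]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

correct = np.diag(cm)                 # correctly classified samples per class
precision = correct / cm.sum(axis=0)  # divide by column totals (predicted counts)
recall = correct / cm.sum(axis=1)     # divide by row totals (actual counts)
accuracy = correct.sum() / cm.sum()

print(cm)
print("Precision per class:", precision)
print("Recall per class:   ", recall)
print("Accuracy:", accuracy)

# The manual values agree with scikit-learn's own functions
assert np.allclose(precision, precision_score(y_true, y_pred, average=None))
assert np.allclose(recall, recall_score(y_true, y_pred, average=None))
```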

🔍 4. Decision Boundary Visualization (Optional)

For datasets with only two features, you can visualize the learned boundaries.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Generate sample 2D dataset
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2, random_state=3, n_clusters_per_class=1)

model = SVC(kernel='linear')
model.fit(X, y)

# Plot decision regions
plt.figure(figsize=(6,5))
x_min, x_max = X[:,0].min() - 1, X[:,0].max() + 1
y_min, y_max = X[:,1].min() - 1, X[:,1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')
sns.scatterplot(x=X[:,0], y=X[:,1], hue=y, edgecolor='k', palette='deep')
plt.title("SVM Decision Boundary")
plt.show()

Each shaded region shows the class the SVM predicts for points in that area; the line where the two colors meet is the decision boundary.
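For a linear kernel, the boundary is the set of points where `w · x + b = 0`. As a rough sketch, the weights and intercept can be read from the fitted model's `coef_` and `intercept_` attributes and the predictions reproduced by hand. The dataset settings match the example above; the variable names `w`, `b`, and `manual_pred` are just illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           random_state=3, n_clusters_per_class=1)
clf = SVC(kernel="linear").fit(X, y)

# For a binary linear SVM, coef_ has shape (1, n_features)
w = clf.coef_[0]
b = clf.intercept_[0]

# Signed score for every sample: positive means class 1, negative means class 0
manual_scores = X @ w + b
manual_pred = (manual_scores > 0).astype(int)

print("w =", w, " b =", b)
print("Matches decision_function:", np.allclose(manual_scores, clf.decision_function(X)))
print("Matches predict:          ", np.array_equal(manual_pred, clf.predict(X)))
```

This shortcut only applies to linear kernels; with `rbf` or `poly` the model has no `coef_` attribute, and the boundary is defined through the support vectors instead.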


🧮 5. Evaluating Classification Models

Metric | Description | Function
Accuracy | Share of all predictions that are correct | accuracy_score()
Precision | Share of predicted positives that are truly positive | precision_score()
Recall (Sensitivity) | Share of actual positives that are correctly identified | recall_score()
F1-Score | Harmonic mean of precision and recall | f1_score()
ROC-AUC | Area under the ROC curve: how well predicted scores rank positives above negatives | roc_auc_score()

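A classification report (via classification_report()) summarizes precision, recall, F1-score, and support for each class, along with overall accuracy. ROC-AUC is not part of the report; compute it separately from predicted scores or probabilities.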
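Here is a small worked example of these metrics on a binary problem. The labels and scores are invented for illustration, and the 0.5 threshold is an assumption. Note that `roc_auc_score()` takes the continuous scores, not the thresholded predictions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Illustrative ground truth and model scores (e.g. predicted probabilities)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05])

# Turn scores into hard labels with an assumed threshold of 0.5
y_pred = (y_score >= 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)    # (3 TP + 5 TN) / 10 samples
precision = precision_score(y_true, y_pred)  # 3 TP / (3 TP + 1 FP)
recall = recall_score(y_true, y_pred)        # 3 TP / (3 TP + 1 FN)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)         # uses scores, not y_pred

print(f"Accuracy:  {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1-score:  {f1:.3f}")
print(f"ROC-AUC:   {auc:.3f}")
```

ROC-AUC comes out higher than the threshold-based metrics here because it measures ranking quality across all thresholds: only one of the 24 positive/negative pairs is ordered incorrectly.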


🎯 6. Hyperparameter Tuning with GridSearchCV

Optimizing hyperparameters can significantly improve model performance.

from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': ['scale', 'auto']
}

grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best Cross‑Val Score:", grid.best_score_)

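GridSearchCV fits the model for every combination in param_grid (here 3 × 3 × 2 = 18 combinations, each evaluated with 5-fold cross-validation) and keeps the combination with the best mean validation score. By default it then refits that combination on the full training set.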
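The cross-validation score is computed on the training data only, so it is worth confirming the selected model on the held-out test set as well. The sketch below also wraps the SVM in a `Pipeline` with `StandardScaler`, which is an addition not used in the example above; it keeps the scaling step inside each cross-validation fold. The step names `scale` and `svc` are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Scaling happens inside the pipeline, so it is re-fit on each training fold
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Parameters of a pipeline step are addressed as <step name>__<parameter>
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
    "svc__gamma": ["scale", "auto"],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# score() uses the best pipeline, refit on the whole training set
test_accuracy = search.score(X_test, y_test)

print("Best parameters:", search.best_params_)
print("Best cross-validation score:", round(search.best_score_, 3))
print("Held-out test accuracy:", round(test_accuracy, 3))
```

A large gap between the cross-validation score and the test accuracy would suggest that the search has overfit to the training folds.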


⚖️ 7. Understanding Bias–Variance Trade‑off

Concept | Description | Risk
High Bias (Underfitting) | Model too simple → misses patterns | Low accuracy
High Variance (Overfitting) | Model too complex → memorizes training data | Poor generalization

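Regularization helps balance bias and variance. In scikit-learn, both SVC and LogisticRegression control it through C, the inverse of regularization strength: a smaller C means stronger regularization (more bias, less variance), and a larger C means weaker regularization.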
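One way to see the trade-off is to sweep the regularization parameter and compare training and validation scores side by side. This sketch uses scikit-learn's `validation_curve` on the Iris data; the list of `C_values` is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative range from very strong to very weak regularization
C_values = [0.001, 0.01, 0.1, 1, 10, 100]

# Each result has shape (len(C_values), number of folds)
train_scores, val_scores = validation_curve(
    SVC(kernel="rbf", gamma="scale"), X, y,
    param_name="C", param_range=C_values, cv=5)

train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)

for C, tr, va in zip(C_values, train_mean, val_mean):
    print(f"C={C:<7} train accuracy={tr:.3f}  validation accuracy={va:.3f}")
```

As a rule of thumb, low scores on both sets point to underfitting, while a training score well above the validation score points to overfitting. How pronounced either effect is depends on the dataset.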


🚀 8. Takeaways

  • Classification predicts discrete categories, unlike regression.
  • Always inspect both quantitative metrics and visual results.
  • Tune hyperparameters (kernel, C, gamma) for better performance.
  • Use ensemble methods (e.g., Random Forest, XGBoost) for complex datasets.
  • Address class imbalance with techniques like SMOTE so the model is not biased toward the majority class.
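SMOTE lives in the separate imbalanced-learn package, so the sketch below swaps in a different technique that ships with scikit-learn: `class_weight="balanced"`, which reweights the loss instead of resampling the data. The synthetic dataset and its 95/5 class split are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 95% class 0, 5% class 1
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Same model with and without class reweighting
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
weighted = LogisticRegression(max_iter=1000,
                              class_weight="balanced").fit(X_train, y_train)

# Recall on the minority class (label 1) is where imbalance hurts most
recall_plain = recall_score(y_test, plain.predict(X_test))
recall_weighted = recall_score(y_test, weighted.predict(X_test))

print(f"Minority-class recall, unweighted: {recall_plain:.3f}")
print(f"Minority-class recall, balanced:   {recall_weighted:.3f}")
```

Reweighting usually raises minority-class recall at some cost in precision, so check both metrics before settling on an approach.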

🧭 Conclusion

Classification algorithms form the foundation of intelligent decision systems — from email filters to diagnostic AI.
By mastering Scikit‑Learn’s tools for classification, evaluation, and tuning, you’ll be well‑equipped to tackle real‑world predictive challenges.

“Accuracy is important, but understanding why the model made a decision matters even more.”