
Distance-Based Classification: Questions & Full Solutions

📋 The Question

Given the following training dataset (three classes, each class has 5 samples, each sample has two features):

| Sample | W1 X1 | W1 X2 | W2 X1 | W2 X2 | W3 X1 | W3 X2 |
|---|---|---|---|---|---|---|
| Sample 1 | 2.491 | 2.167 | 4.218 | -2.075 | -2.520 | 0.483 |
| Sample 2 | 1.053 | 0.667 | -1.156 | -2.992 | -12.163 | 3.161 |
| Sample 3 | 5.792 | 3.425 | -4.425 | 1.408 | -13.438 | 2.414 |
| Sample 4 | 2.045 | -1.467 | -1.467 | -2.838 | -4.467 | 2.298 |
| Sample 5 | 0.550 | 4.020 | -2.137 | -2.473 | -3.711 | 4.364 |

Predict the class label for the following test samples using:

  1. Euclidean Distance
  2. City Block (Manhattan) Distance
  3. Mahalanobis Distance
| Test Sample | X1 | X2 |
|---|---|---|
| T1 | 2.543 | 0.046 |
| T2 | -2.799 | 0.746 |
| T3 | -7.429 | 2.329 |

🧮 Step 0: Compute Class Mean Vectors

The classifier compares each test sample to the mean (centroid) of each class.

Formula:

$$\mu_k = \frac{1}{n}\sum_{i=1}^{n} x_i^{(k)}$$

Class W1 Mean (μ₁)

$$\mu_{1,X1} = \frac{2.491 + 1.053 + 5.792 + 2.045 + 0.550}{5} = \frac{11.931}{5} = 2.386$$

$$\mu_{1,X2} = \frac{2.167 + 0.667 + 3.425 + (-1.467) + 4.020}{5} = \frac{8.812}{5} = 1.762$$

$$\boxed{\mu_1 = (2.386,\ 1.762)}$$

Class W2 Mean (μ₂)

$$\mu_{2,X1} = \frac{4.218 + (-1.156) + (-4.425) + (-1.467) + (-2.137)}{5} = \frac{-4.967}{5} = -0.993$$

$$\mu_{2,X2} = \frac{(-2.075) + (-2.992) + 1.408 + (-2.838) + (-2.473)}{5} = \frac{-8.970}{5} = -1.794$$

$$\boxed{\mu_2 = (-0.993,\ -1.794)}$$

Class W3 Mean (μ₃)

$$\mu_{3,X1} = \frac{(-2.520) + (-12.163) + (-13.438) + (-4.467) + (-3.711)}{5} = \frac{-36.299}{5} = -7.260$$

$$\mu_{3,X2} = \frac{0.483 + 3.161 + 2.414 + 2.298 + 4.364}{5} = \frac{12.720}{5} = 2.544$$

$$\boxed{\mu_3 = (-7.260,\ 2.544)}$$
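As a quick sketch (arrays transcribed from the question table; NumPy assumed available), the three mean vectors can be reproduced in a few lines:

```python
# Sketch: reproducing the Step 0 class means with NumPy.
# Rows are the five training samples of each class; columns are (X1, X2).
import numpy as np

W1 = np.array([[2.491, 2.167], [1.053, 0.667], [5.792, 3.425],
               [2.045, -1.467], [0.550, 4.020]])
W2 = np.array([[4.218, -2.075], [-1.156, -2.992], [-4.425, 1.408],
               [-1.467, -2.838], [-2.137, -2.473]])
W3 = np.array([[-2.520, 0.483], [-12.163, 3.161], [-13.438, 2.414],
               [-4.467, 2.298], [-3.711, 4.364]])

mu1, mu2, mu3 = (W.mean(axis=0) for W in (W1, W2, W3))
# Approximately (2.386, 1.762), (-0.993, -1.794), (-7.260, 2.544).
print(mu1, mu2, mu3)
```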

๐Ÿ“ Part 1 โ€” Euclidean Distanceโ€‹

Formula (distance from test point x to class mean μ):

$$d_E(x,\ \mu) = \sqrt{(x_1 - \mu_1)^2 + (x_2 - \mu_2)^2}$$

The test sample is assigned to the class with the smallest Euclidean distance.


Test Sample T1 = (2.543, 0.046)

vs W1:

$$d_E = \sqrt{(2.543 - 2.386)^2 + (0.046 - 1.762)^2} = \sqrt{(0.157)^2 + (-1.716)^2} = \sqrt{0.025 + 2.945} = \sqrt{2.970} \approx 1.72\ \text{(smallest)}$$

vs W2:

$$d_E = \sqrt{(2.543 - (-0.993))^2 + (0.046 - (-1.794))^2} = \sqrt{(3.536)^2 + (1.840)^2} = \sqrt{12.503 + 3.386} = \sqrt{15.889} \approx 3.99$$

vs W3:

$$d_E = \sqrt{(2.543 - (-7.260))^2 + (0.046 - 2.544)^2} = \sqrt{(9.803)^2 + (-2.498)^2} = \sqrt{96.099 + 6.240} = \sqrt{102.339} \approx 10.12$$

→ T1 is classified as Class W1 (d = 1.72, smallest)


Test Sample T2 = (-2.799, 0.746)

vs W1:

$$d_E = \sqrt{(-2.799 - 2.386)^2 + (0.746 - 1.762)^2} = \sqrt{(-5.185)^2 + (-1.016)^2} = \sqrt{26.884 + 1.032} = \sqrt{27.916} \approx 5.28$$

vs W2:

$$d_E = \sqrt{(-2.799 - (-0.993))^2 + (0.746 - (-1.794))^2} = \sqrt{(-1.806)^2 + (2.540)^2} = \sqrt{3.262 + 6.452} = \sqrt{9.714} \approx 3.12\ \text{(smallest)}$$

vs W3:

$$d_E = \sqrt{(-2.799 - (-7.260))^2 + (0.746 - 2.544)^2} = \sqrt{(4.461)^2 + (-1.798)^2} = \sqrt{19.900 + 3.233} = \sqrt{23.133} \approx 4.81$$

→ T2 is classified as Class W2 (d = 3.12, smallest)


Test Sample T3 = (-7.429, 2.329)

vs W1:

$$d_E = \sqrt{(-7.429 - 2.386)^2 + (2.329 - 1.762)^2} = \sqrt{(-9.815)^2 + (0.567)^2} = \sqrt{96.334 + 0.321} = \sqrt{96.655} \approx 9.83$$

vs W2:

$$d_E = \sqrt{(-7.429 - (-0.993))^2 + (2.329 - (-1.794))^2} = \sqrt{(-6.436)^2 + (4.123)^2} = \sqrt{41.422 + 16.999} = \sqrt{58.421} \approx 7.64$$

vs W3:

$$d_E = \sqrt{(-7.429 - (-7.260))^2 + (2.329 - 2.544)^2} = \sqrt{(-0.169)^2 + (-0.215)^2} = \sqrt{0.029 + 0.046} = \sqrt{0.075} \approx 0.27\ \text{(smallest)}$$

→ T3 is classified as Class W3 (d = 0.27, smallest)
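The nearest-mean rule worked through above is easy to mechanize. A minimal sketch (class means hard-coded from Step 0; the function name is my own):

```python
# Sketch: nearest-mean classification with Euclidean (L2) distance.
import numpy as np

MEANS = {"W1": np.array([2.386, 1.762]),
         "W2": np.array([-0.993, -1.794]),
         "W3": np.array([-7.260, 2.544])}

def classify_euclidean(x):
    """Return (winning class, all distances) for a test point x."""
    x = np.asarray(x, dtype=float)
    dists = {c: float(np.linalg.norm(x - mu)) for c, mu in MEANS.items()}
    return min(dists, key=dists.get), dists

label, dists = classify_euclidean([2.543, 0.046])   # T1
print(label, round(dists[label], 2))                # W1 1.72
```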


๐Ÿ™๏ธ Part 2 โ€” City Block (Manhattan) Distanceโ€‹

Formula:

$$d_{CB}(x,\ \mu) = |x_1 - \mu_1| + |x_2 - \mu_2|$$

Sometimes called the L1 norm or Manhattan distance: it sums the absolute differences along each axis (like city blocks on a grid).


Test Sample T1 = (2.543, 0.046)

vs W1:

$$d_{CB} = |2.543 - 2.386| + |0.046 - 1.762| = 0.157 + 1.716 = 1.873\ \text{(smallest)}$$

vs W2:

$$d_{CB} = |2.543 - (-0.993)| + |0.046 - (-1.794)| = 3.536 + 1.840 = 5.376$$

vs W3:

$$d_{CB} = |2.543 - (-7.260)| + |0.046 - 2.544| = 9.803 + 2.498 = 12.301$$

→ T1 is classified as Class W1 (d = 1.873, smallest)


Test Sample T2 = (-2.799, 0.746)

vs W1:

$$d_{CB} = |-2.799 - 2.386| + |0.746 - 1.762| = 5.185 + 1.016 = 6.201$$

vs W2:

$$d_{CB} = |-2.799 - (-0.993)| + |0.746 - (-1.794)| = 1.806 + 2.540 = 4.346\ \text{(smallest)}$$

vs W3:

$$d_{CB} = |-2.799 - (-7.260)| + |0.746 - 2.544| = 4.461 + 1.798 = 6.259$$

→ T2 is classified as Class W2 (d = 4.346, smallest)


Test Sample T3 = (-7.429, 2.329)

vs W1:

$$d_{CB} = |-7.429 - 2.386| + |2.329 - 1.762| = 9.815 + 0.567 = 10.382$$

vs W2:

$$d_{CB} = |-7.429 - (-0.993)| + |2.329 - (-1.794)| = 6.436 + 4.123 = 10.559$$

vs W3:

$$d_{CB} = |-7.429 - (-7.260)| + |2.329 - 2.544| = 0.169 + 0.215 = 0.384\ \text{(smallest)}$$

→ T3 is classified as Class W3 (d = 0.384, smallest)
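The same nearest-mean loop works for the L1 metric; only the distance computation changes. A sketch (means hard-coded from Step 0; names are my own):

```python
# Sketch: nearest-mean classification with City Block (L1) distance.
import numpy as np

MEANS = {"W1": np.array([2.386, 1.762]),
         "W2": np.array([-0.993, -1.794]),
         "W3": np.array([-7.260, 2.544])}

def classify_cityblock(x):
    """Return (winning class, all distances) using sums of absolute differences."""
    x = np.asarray(x, dtype=float)
    dists = {c: float(np.abs(x - mu).sum()) for c, mu in MEANS.items()}
    return min(dists, key=dists.get), dists

for name, t in [("T1", [2.543, 0.046]), ("T2", [-2.799, 0.746]),
                ("T3", [-7.429, 2.329])]:
    label, dists = classify_cityblock(t)
    print(name, "->", label)   # T1 -> W1, T2 -> W2, T3 -> W3
```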


๐Ÿ“ Part 3 โ€” Mahalanobis Distanceโ€‹

Formula:

$$d_{Md}(x,\ \mu) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$$

Since the features are uncorrelated (ρ = 0), the covariance matrix is diagonal:

$$\Sigma = \begin{pmatrix} \sigma_{X1}^2 & 0 \\ 0 & \sigma_{X2}^2 \end{pmatrix} \Rightarrow \Sigma^{-1} = \begin{pmatrix} 1/\sigma_{X1}^2 & 0 \\ 0 & 1/\sigma_{X2}^2 \end{pmatrix}$$

This simplifies the Mahalanobis distance to:

$$d_{Md} = \sqrt{\frac{(x_1 - \mu_1)^2}{\sigma_{X1}^2} + \frac{(x_2 - \mu_2)^2}{\sigma_{X2}^2}}$$

The key difference from Euclidean distance: each dimension is normalized by that class's variance, so a class with a large spread along one axis is not penalized unfairly for gaps along that axis.


Computing Covariance Matrices

Sample standard deviation formula:

$$\sigma = \sqrt{\frac{1}{n-1}\sum_{k=1}^{n}(x_k - \bar{x})^2}$$

From the lecture notes, the computed variances are:

$$\Sigma_1 = \begin{pmatrix} 4.22 & 0 \\ 0 & 4.90 \end{pmatrix}, \quad \Sigma_2 = \begin{pmatrix} 10.13 & 0 \\ 0 & 2.77 \end{pmatrix}, \quad \Sigma_3 = \begin{pmatrix} 26.26 & 0 \\ 0 & 2.00 \end{pmatrix}$$
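As a quick numerical check (a sketch; `ddof=1` gives the n-1 denominator of the formula above), the variances can be recomputed from the training table. Most agree with the quoted matrices, though W2's X2 variance comes out closer to 3.33 than to the quoted 2.77; the worked solutions below keep the lecture values.

```python
# Sketch: recomputing per-class sample variances from the training data.
import numpy as np

W1 = np.array([[2.491, 2.167], [1.053, 0.667], [5.792, 3.425],
               [2.045, -1.467], [0.550, 4.020]])
W2 = np.array([[4.218, -2.075], [-1.156, -2.992], [-4.425, 1.408],
               [-1.467, -2.838], [-2.137, -2.473]])
W3 = np.array([[-2.520, 0.483], [-12.163, 3.161], [-13.438, 2.414],
               [-4.467, 2.298], [-3.711, 4.364]])

for name, W in [("W1", W1), ("W2", W2), ("W3", W3)]:
    # ddof=1 -> unbiased sample variance (divide by n-1).
    print(name, W.var(axis=0, ddof=1).round(2))
```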

Test Sample T1 = (2.543, 0.046)

vs W1 (σ²_X1 = 4.22, σ²_X2 = 4.90):

$$d_{Md} = \sqrt{\frac{(2.543 - 2.386)^2}{4.22} + \frac{(0.046 - 1.762)^2}{4.90}} = \sqrt{\frac{0.025}{4.22} + \frac{2.945}{4.90}} = \sqrt{0.006 + 0.601} = \sqrt{0.607} \approx 0.779\ \text{(smallest)}$$

vs W2 (σ²_X1 = 10.13, σ²_X2 = 2.77):

$$d_{Md} = \sqrt{\frac{(2.543 + 0.993)^2}{10.13} + \frac{(0.046 + 1.794)^2}{2.77}} = \sqrt{\frac{12.503}{10.13} + \frac{3.386}{2.77}} = \sqrt{1.234 + 1.222} = \sqrt{2.456} \approx 1.567$$

vs W3 (σ²_X1 = 26.26, σ²_X2 = 2.00):

$$d_{Md} = \sqrt{\frac{(2.543 + 7.260)^2}{26.26} + \frac{(0.046 - 2.544)^2}{2.00}} = \sqrt{\frac{96.099}{26.26} + \frac{6.240}{2.00}} = \sqrt{3.660 + 3.120} = \sqrt{6.780} \approx 2.604$$

→ T1 is classified as Class W1 (d = 0.779, smallest)


Test Sample T2 = (-2.799, 0.746)

vs W1 (σ²_X1 = 4.22, σ²_X2 = 4.90):

$$d_{Md} = \sqrt{\frac{(-2.799 - 2.386)^2}{4.22} + \frac{(0.746 - 1.762)^2}{4.90}} = \sqrt{\frac{26.884}{4.22} + \frac{1.032}{4.90}} = \sqrt{6.370 + 0.211} = \sqrt{6.581} \approx 2.565$$

vs W2 (σ²_X1 = 10.13, σ²_X2 = 2.77):

$$d_{Md} = \sqrt{\frac{(-2.799 + 0.993)^2}{10.13} + \frac{(0.746 + 1.794)^2}{2.77}} = \sqrt{\frac{3.262}{10.13} + \frac{6.452}{2.77}} = \sqrt{0.322 + 2.329} = \sqrt{2.651} \approx 1.628$$

vs W3 (σ²_X1 = 26.26, σ²_X2 = 2.00):

$$d_{Md} = \sqrt{\frac{(-2.799 + 7.260)^2}{26.26} + \frac{(0.746 - 2.544)^2}{2.00}} = \sqrt{\frac{19.900}{26.26} + \frac{3.233}{2.00}} = \sqrt{0.758 + 1.617} = \sqrt{2.375} \approx 1.541\ \text{(smallest)}$$

→ T2 is classified as Class W3 (d = 1.541, smallest)

⚠️ Note: Euclidean and City Block both assigned T2 → W2, but Mahalanobis assigns T2 → W3. This is because W3 has a very large variance along X1 (σ² = 26.26), making the large X1 gap of 4.461 less significant after normalization.


Test Sample T3 = (-7.429, 2.329)

vs W1 (σ²_X1 = 4.22, σ²_X2 = 4.90):

$$d_{Md} = \sqrt{\frac{(-7.429 - 2.386)^2}{4.22} + \frac{(2.329 - 1.762)^2}{4.90}} = \sqrt{\frac{96.334}{4.22} + \frac{0.321}{4.90}} = \sqrt{22.829 + 0.066} = \sqrt{22.895} \approx 4.785$$

vs W2 (σ²_X1 = 10.13, σ²_X2 = 2.77):

$$d_{Md} = \sqrt{\frac{(-7.429 + 0.993)^2}{10.13} + \frac{(2.329 + 1.794)^2}{2.77}} = \sqrt{\frac{41.422}{10.13} + \frac{16.999}{2.77}} = \sqrt{4.089 + 6.137} = \sqrt{10.226} \approx 3.198$$

vs W3 (σ²_X1 = 26.26, σ²_X2 = 2.00):

$$d_{Md} = \sqrt{\frac{(-7.429 + 7.260)^2}{26.26} + \frac{(2.329 - 2.544)^2}{2.00}} = \sqrt{\frac{0.029}{26.26} + \frac{0.046}{2.00}} = \sqrt{0.001 + 0.023} = \sqrt{0.024} \approx 0.156\ \text{(smallest)}$$

→ T3 is classified as Class W3 (d = 0.156, smallest)
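Putting Part 3 together, here is a sketch of the diagonal-covariance Mahalanobis classifier (means from Step 0 and the lecture-note variances hard-coded; names are my own):

```python
# Sketch: nearest-mean classification with diagonal-covariance Mahalanobis distance.
import numpy as np

MEANS = {"W1": np.array([2.386, 1.762]),
         "W2": np.array([-0.993, -1.794]),
         "W3": np.array([-7.260, 2.544])}
VARS = {"W1": np.array([4.22, 4.90]),    # diagonal of Sigma_1
        "W2": np.array([10.13, 2.77]),   # diagonal of Sigma_2
        "W3": np.array([26.26, 2.00])}   # diagonal of Sigma_3

def classify_mahalanobis(x):
    """Each squared difference is divided by that class's per-axis variance."""
    x = np.asarray(x, dtype=float)
    dists = {c: float(np.sqrt((((x - MEANS[c]) ** 2) / VARS[c]).sum()))
             for c in MEANS}
    return min(dists, key=dists.get), dists

for name, t in [("T1", [2.543, 0.046]), ("T2", [-2.799, 0.746]),
                ("T3", [-7.429, 2.329])]:
    label, dists = classify_mahalanobis(t)
    print(name, "->", label)   # T1 -> W1, T2 -> W3, T3 -> W3
```

Note how T2 flips to W3 relative to the Euclidean classifier, purely because of W3's large X1 variance.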


📊 Final Results Summary

| Test Sample | Euclidean | City Block | Mahalanobis |
|---|---|---|---|
| T1 = (2.543, 0.046) | W1 (1.72) | W1 (1.873) | W1 (0.779) |
| T2 = (-2.799, 0.746) | W2 (3.12) | W2 (4.346) | W3 (1.541) |
| T3 = (-7.429, 2.329) | W3 (0.27) | W3 (0.384) | W3 (0.156) |

Values in parentheses are the winning (minimum) distances.


💡 Key Concepts & Explanations

Why Minimum Distance to Means?

Each class is summarized by its mean vector. An unknown point is classified to whichever class centroid is nearest. This is simple, fast, and effective when classes are roughly spherical and well-separated.

Euclidean vs City Block

- Euclidean (L2) measures the "straight-line" diagonal distance; it is rotationally invariant.
- City Block (L1) adds absolute differences along each axis, like walking along a grid. It is not rotationally invariant but is more robust to outliers.

Why Mahalanobis is Different

Euclidean distance treats all dimensions equally, so if one feature has a much higher variance (spread), it dominates the distance. Mahalanobis normalizes each dimension by its variance, effectively placing all features on equal footing. This is why T2 changes classification from W2 to W3: W3 has an enormous variance along X1 (σ² = 26.26), so the large gap in X1 matters less.

When Does Mahalanobis Match Euclidean?

When every class shares the same variance in every dimension (Σ = σ²I), the Mahalanobis distance is just the Euclidean distance scaled by the constant 1/σ, so it ranks the classes identically and produces the same classification.
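To see this, substitute a shared isotropic covariance Σ = σ²I into the Mahalanobis definition:

```latex
d_{Md}(x,\ \mu)
  = \sqrt{(x - \mu)^T (\sigma^2 I)^{-1} (x - \mu)}
  = \sqrt{\tfrac{1}{\sigma^2} (x - \mu)^T (x - \mu)}
  = \frac{1}{\sigma}\, d_E(x,\ \mu)
```

Every distance is scaled by the same factor 1/σ, so the nearest class is unchanged; this requires all classes to share that single σ.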

