
Distance-Based Classification: Questions & Full Solutions

📋 The Question

Given the following training dataset (three classes, each class has 5 samples, each sample has two features):

| Sample | W1 X1 | W1 X2 | W2 X1 | W2 X2 | W3 X1 | W3 X2 |
|---|---|---|---|---|---|---|
| Sample 1 | 2.491 | 2.167 | 4.218 | -2.075 | -2.520 | 0.483 |
| Sample 2 | 1.053 | 0.667 | -1.156 | -2.992 | -12.163 | 3.161 |
| Sample 3 | 5.792 | 3.425 | -4.425 | 1.408 | -13.438 | 2.414 |
| Sample 4 | 2.045 | -1.467 | -1.467 | -2.838 | -4.467 | 2.298 |
| Sample 5 | 0.550 | 4.020 | -2.137 | -2.473 | -3.711 | 4.364 |

Predict the class label for the following test samples using:

  1. Euclidean Distance
  2. City Block (Manhattan) Distance
  3. Mahalanobis Distance
| Test Sample | X1 | X2 |
|---|---|---|
| T1 | 2.543 | 0.046 |
| T2 | -2.799 | 0.746 |
| T3 | -7.429 | 2.329 |

🧮 Step 0: Compute Class Mean Vectors

The classifier compares each test sample to the mean (centroid) of each class.

Formula:

$$\mu_k = \frac{1}{n}\sum_{i=1}^{n} x_i^{(k)}$$

Class W1 Mean (μ₁)

$$\mu_{1,X1} = \frac{2.491 + 1.053 + 5.792 + 2.045 + 0.550}{5} = \frac{11.931}{5} = 2.386$$

$$\mu_{1,X2} = \frac{2.167 + 0.667 + 3.425 + (-1.467) + 4.020}{5} = \frac{8.812}{5} = 1.762$$

$$\boxed{\mu_1 = (2.386,\ 1.762)}$$

Class W2 Mean (μ₂)

$$\mu_{2,X1} = \frac{4.218 + (-1.156) + (-4.425) + (-1.467) + (-2.137)}{5} = \frac{-4.967}{5} = -0.993$$

$$\mu_{2,X2} = \frac{(-2.075) + (-2.992) + 1.408 + (-2.838) + (-2.473)}{5} = \frac{-8.970}{5} = -1.794$$

$$\boxed{\mu_2 = (-0.993,\ -1.794)}$$

Class W3 Mean (μ₃)

$$\mu_{3,X1} = \frac{(-2.520) + (-12.163) + (-13.438) + (-4.467) + (-3.711)}{5} = \frac{-36.299}{5} = -7.260$$

$$\mu_{3,X2} = \frac{0.483 + 3.161 + 2.414 + 2.298 + 4.364}{5} = \frac{12.720}{5} = 2.544$$

$$\boxed{\mu_3 = (-7.260,\ 2.544)}$$
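As a quick sketch (arrays transcribed from the question table; NumPy assumed available), the three mean vectors can be reproduced in a few lines:

```python
# Sketch: reproducing the Step 0 class means with NumPy.
# Rows are the five training samples of each class; columns are (X1, X2).
import numpy as np

W1 = np.array([[2.491, 2.167], [1.053, 0.667], [5.792, 3.425],
               [2.045, -1.467], [0.550, 4.020]])
W2 = np.array([[4.218, -2.075], [-1.156, -2.992], [-4.425, 1.408],
               [-1.467, -2.838], [-2.137, -2.473]])
W3 = np.array([[-2.520, 0.483], [-12.163, 3.161], [-13.438, 2.414],
               [-4.467, 2.298], [-3.711, 4.364]])

mu1, mu2, mu3 = (W.mean(axis=0) for W in (W1, W2, W3))
# Approximately (2.386, 1.762), (-0.993, -1.794), (-7.260, 2.544).
print(mu1, mu2, mu3)
```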

๐Ÿ“ Part 1 โ€” Euclidean Distanceโ€‹

Formula (distance from test point x to class mean μ):

$$d_E(x,\ \mu) = \sqrt{(x_1 - \mu_1)^2 + (x_2 - \mu_2)^2}$$

The test sample is assigned to the class with the smallest Euclidean distance.


Test Sample T1 = (2.543, 0.046)

vs W1:

$$d_E = \sqrt{(2.543 - 2.386)^2 + (0.046 - 1.762)^2} = \sqrt{(0.157)^2 + (-1.716)^2} = \sqrt{0.025 + 2.945} = \sqrt{2.970} \approx 1.72\ \text{(smallest)}$$

vs W2:

$$d_E = \sqrt{(2.543 - (-0.993))^2 + (0.046 - (-1.794))^2} = \sqrt{(3.536)^2 + (1.840)^2} = \sqrt{12.503 + 3.386} = \sqrt{15.889} \approx 3.99$$

vs W3:

$$d_E = \sqrt{(2.543 - (-7.260))^2 + (0.046 - 2.544)^2} = \sqrt{(9.803)^2 + (-2.498)^2} = \sqrt{96.099 + 6.240} = \sqrt{102.339} \approx 10.12$$

→ T1 is classified as Class W1 (d = 1.72, smallest)


Test Sample T2 = (-2.799, 0.746)

vs W1:

$$d_E = \sqrt{(-2.799 - 2.386)^2 + (0.746 - 1.762)^2} = \sqrt{(-5.185)^2 + (-1.016)^2} = \sqrt{26.884 + 1.032} = \sqrt{27.916} \approx 5.28$$

vs W2:

$$d_E = \sqrt{(-2.799 - (-0.993))^2 + (0.746 - (-1.794))^2} = \sqrt{(-1.806)^2 + (2.540)^2} = \sqrt{3.262 + 6.452} = \sqrt{9.714} \approx 3.12\ \text{(smallest)}$$

vs W3:

$$d_E = \sqrt{(-2.799 - (-7.260))^2 + (0.746 - 2.544)^2} = \sqrt{(4.461)^2 + (-1.798)^2} = \sqrt{19.900 + 3.233} = \sqrt{23.133} \approx 4.81$$

→ T2 is classified as Class W2 (d = 3.12, smallest)


Test Sample T3 = (-7.429, 2.329)

vs W1:

$$d_E = \sqrt{(-7.429 - 2.386)^2 + (2.329 - 1.762)^2} = \sqrt{(-9.815)^2 + (0.567)^2} = \sqrt{96.334 + 0.321} = \sqrt{96.655} \approx 9.83$$

vs W2:

$$d_E = \sqrt{(-7.429 - (-0.993))^2 + (2.329 - (-1.794))^2} = \sqrt{(-6.436)^2 + (4.123)^2} = \sqrt{41.422 + 16.999} = \sqrt{58.421} \approx 7.64$$

vs W3:

$$d_E = \sqrt{(-7.429 - (-7.260))^2 + (2.329 - 2.544)^2} = \sqrt{(-0.169)^2 + (-0.215)^2} = \sqrt{0.029 + 0.046} = \sqrt{0.075} \approx 0.27\ \text{(smallest)}$$

→ T3 is classified as Class W3 (d = 0.27, smallest)
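The nearest-mean rule worked through above is easy to mechanize. A minimal sketch (class means hard-coded from Step 0; the function name is my own):

```python
# Sketch: nearest-mean classification with Euclidean (L2) distance.
import numpy as np

MEANS = {"W1": np.array([2.386, 1.762]),
         "W2": np.array([-0.993, -1.794]),
         "W3": np.array([-7.260, 2.544])}

def classify_euclidean(x):
    """Return (winning class, all distances) for a test point x."""
    x = np.asarray(x, dtype=float)
    dists = {c: float(np.linalg.norm(x - mu)) for c, mu in MEANS.items()}
    return min(dists, key=dists.get), dists

label, dists = classify_euclidean([2.543, 0.046])   # T1
print(label, round(dists[label], 2))                # W1 1.72
```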


๐Ÿ™๏ธ Part 2 โ€” City Block (Manhattan) Distanceโ€‹

Formula:

$$d_{CB}(x,\ \mu) = |x_1 - \mu_1| + |x_2 - \mu_2|$$

Sometimes called the L1 norm or Manhattan distance: it sums the absolute differences along each axis (like city blocks on a grid).


Test Sample T1 = (2.543, 0.046)

vs W1:

$$d_{CB} = |2.543 - 2.386| + |0.046 - 1.762| = 0.157 + 1.716 = 1.873\ \text{(smallest)}$$

vs W2:

$$d_{CB} = |2.543 - (-0.993)| + |0.046 - (-1.794)| = 3.536 + 1.840 = 5.376$$

vs W3:

$$d_{CB} = |2.543 - (-7.260)| + |0.046 - 2.544| = 9.803 + 2.498 = 12.301$$

→ T1 is classified as Class W1 (d = 1.873, smallest)


Test Sample T2 = (-2.799, 0.746)

vs W1:

$$d_{CB} = |-2.799 - 2.386| + |0.746 - 1.762| = 5.185 + 1.016 = 6.201$$

vs W2:

$$d_{CB} = |-2.799 - (-0.993)| + |0.746 - (-1.794)| = 1.806 + 2.540 = 4.346\ \text{(smallest)}$$

vs W3:

$$d_{CB} = |-2.799 - (-7.260)| + |0.746 - 2.544| = 4.461 + 1.798 = 6.259$$

→ T2 is classified as Class W2 (d = 4.346, smallest)


Test Sample T3 = (-7.429, 2.329)

vs W1:

$$d_{CB} = |-7.429 - 2.386| + |2.329 - 1.762| = 9.815 + 0.567 = 10.382$$

vs W2:

$$d_{CB} = |-7.429 - (-0.993)| + |2.329 - (-1.794)| = 6.436 + 4.123 = 10.559$$

vs W3:

$$d_{CB} = |-7.429 - (-7.260)| + |2.329 - 2.544| = 0.169 + 0.215 = 0.384\ \text{(smallest)}$$

→ T3 is classified as Class W3 (d = 0.384, smallest)
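The same nearest-mean loop works for the L1 metric; only the distance computation changes. A sketch (means hard-coded from Step 0; names are my own):

```python
# Sketch: nearest-mean classification with City Block (L1) distance.
import numpy as np

MEANS = {"W1": np.array([2.386, 1.762]),
         "W2": np.array([-0.993, -1.794]),
         "W3": np.array([-7.260, 2.544])}

def classify_cityblock(x):
    """Return (winning class, all distances) using sums of absolute differences."""
    x = np.asarray(x, dtype=float)
    dists = {c: float(np.abs(x - mu).sum()) for c, mu in MEANS.items()}
    return min(dists, key=dists.get), dists

for name, t in [("T1", [2.543, 0.046]), ("T2", [-2.799, 0.746]),
                ("T3", [-7.429, 2.329])]:
    label, dists = classify_cityblock(t)
    print(name, "->", label)   # T1 -> W1, T2 -> W2, T3 -> W3
```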


๐Ÿ“ Part 3 โ€” Mahalanobis Distanceโ€‹

Formula:

$$d_{Md}(x,\ \mu) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$$

Since the features are uncorrelated (ρ = 0), the covariance matrix is diagonal:

$$\Sigma = \begin{pmatrix} \sigma_{X1}^2 & 0 \\ 0 & \sigma_{X2}^2 \end{pmatrix} \Rightarrow \Sigma^{-1} = \begin{pmatrix} 1/\sigma_{X1}^2 & 0 \\ 0 & 1/\sigma_{X2}^2 \end{pmatrix}$$

This simplifies the Mahalanobis distance to:

$$d_{Md} = \sqrt{\frac{(x_1 - \mu_1)^2}{\sigma_{X1}^2} + \frac{(x_2 - \mu_2)^2}{\sigma_{X2}^2}}$$

The key difference from Euclidean distance: each dimension is normalized by that class's variance, so a class with a large spread along one axis is not penalized unfairly for gaps along that axis.


Computing Covariance Matrices

Sample standard deviation formula:

$$\sigma = \sqrt{\frac{1}{n-1}\sum_{k=1}^{n}(x_k - \bar{x})^2}$$

From the lecture notes, the computed variances are:

$$\Sigma_1 = \begin{pmatrix} 4.22 & 0 \\ 0 & 4.90 \end{pmatrix}, \quad \Sigma_2 = \begin{pmatrix} 10.13 & 0 \\ 0 & 2.77 \end{pmatrix}, \quad \Sigma_3 = \begin{pmatrix} 26.26 & 0 \\ 0 & 2.00 \end{pmatrix}$$
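As a quick numerical check (a sketch; `ddof=1` gives the n-1 denominator of the formula above), the variances can be recomputed from the training table. Most agree with the quoted matrices, though W2's X2 variance comes out closer to 3.33 than to the quoted 2.77; the worked solutions below keep the lecture values.

```python
# Sketch: recomputing per-class sample variances from the training data.
import numpy as np

W1 = np.array([[2.491, 2.167], [1.053, 0.667], [5.792, 3.425],
               [2.045, -1.467], [0.550, 4.020]])
W2 = np.array([[4.218, -2.075], [-1.156, -2.992], [-4.425, 1.408],
               [-1.467, -2.838], [-2.137, -2.473]])
W3 = np.array([[-2.520, 0.483], [-12.163, 3.161], [-13.438, 2.414],
               [-4.467, 2.298], [-3.711, 4.364]])

for name, W in [("W1", W1), ("W2", W2), ("W3", W3)]:
    # ddof=1 -> unbiased sample variance (divide by n-1).
    print(name, W.var(axis=0, ddof=1).round(2))
```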

Test Sample T1 = (2.543, 0.046)

vs W1 (σ²_X1 = 4.22, σ²_X2 = 4.90):

$$d_{Md} = \sqrt{\frac{(2.543 - 2.386)^2}{4.22} + \frac{(0.046 - 1.762)^2}{4.90}} = \sqrt{\frac{0.025}{4.22} + \frac{2.945}{4.90}} = \sqrt{0.006 + 0.601} = \sqrt{0.607} \approx 0.779\ \text{(smallest)}$$

vs W2 (σ²_X1 = 10.13, σ²_X2 = 2.77):

$$d_{Md} = \sqrt{\frac{(2.543 + 0.993)^2}{10.13} + \frac{(0.046 + 1.794)^2}{2.77}} = \sqrt{\frac{12.503}{10.13} + \frac{3.386}{2.77}} = \sqrt{1.234 + 1.222} = \sqrt{2.456} \approx 1.567$$

vs W3 (σ²_X1 = 26.26, σ²_X2 = 2.00):

$$d_{Md} = \sqrt{\frac{(2.543 + 7.260)^2}{26.26} + \frac{(0.046 - 2.544)^2}{2.00}} = \sqrt{\frac{96.099}{26.26} + \frac{6.240}{2.00}} = \sqrt{3.660 + 3.120} = \sqrt{6.780} \approx 2.604$$

→ T1 is classified as Class W1 (d = 0.779, smallest)


Test Sample T2 = (-2.799, 0.746)

vs W1 (σ²_X1 = 4.22, σ²_X2 = 4.90):

$$d_{Md} = \sqrt{\frac{(-2.799 - 2.386)^2}{4.22} + \frac{(0.746 - 1.762)^2}{4.90}} = \sqrt{\frac{26.884}{4.22} + \frac{1.032}{4.90}} = \sqrt{6.370 + 0.211} = \sqrt{6.581} \approx 2.565$$

vs W2 (σ²_X1 = 10.13, σ²_X2 = 2.77):

$$d_{Md} = \sqrt{\frac{(-2.799 + 0.993)^2}{10.13} + \frac{(0.746 + 1.794)^2}{2.77}} = \sqrt{\frac{3.262}{10.13} + \frac{6.452}{2.77}} = \sqrt{0.322 + 2.329} = \sqrt{2.651} \approx 1.628$$

vs W3 (σ²_X1 = 26.26, σ²_X2 = 2.00):

$$d_{Md} = \sqrt{\frac{(-2.799 + 7.260)^2}{26.26} + \frac{(0.746 - 2.544)^2}{2.00}} = \sqrt{\frac{19.900}{26.26} + \frac{3.233}{2.00}} = \sqrt{0.758 + 1.617} = \sqrt{2.375} \approx 1.541\ \text{(smallest)}$$

→ T2 is classified as Class W3 (d = 1.541, smallest)

⚠️ Note: Euclidean and City Block both assigned T2 → W2, but Mahalanobis assigns T2 → W3. This is because W3 has a very large variance along X1 (σ² = 26.26), making the large X1 gap of 4.461 less significant after normalization.


Test Sample T3 = (-7.429, 2.329)

vs W1 (σ²_X1 = 4.22, σ²_X2 = 4.90):

$$d_{Md} = \sqrt{\frac{(-7.429 - 2.386)^2}{4.22} + \frac{(2.329 - 1.762)^2}{4.90}} = \sqrt{\frac{96.334}{4.22} + \frac{0.321}{4.90}} = \sqrt{22.829 + 0.066} = \sqrt{22.895} \approx 4.785$$

vs W2 (σ²_X1 = 10.13, σ²_X2 = 2.77):

$$d_{Md} = \sqrt{\frac{(-7.429 + 0.993)^2}{10.13} + \frac{(2.329 + 1.794)^2}{2.77}} = \sqrt{\frac{41.422}{10.13} + \frac{16.999}{2.77}} = \sqrt{4.089 + 6.137} = \sqrt{10.226} \approx 3.198$$

vs W3 (σ²_X1 = 26.26, σ²_X2 = 2.00):

$$d_{Md} = \sqrt{\frac{(-7.429 + 7.260)^2}{26.26} + \frac{(2.329 - 2.544)^2}{2.00}} = \sqrt{\frac{0.029}{26.26} + \frac{0.046}{2.00}} = \sqrt{0.001 + 0.023} = \sqrt{0.024} \approx 0.156\ \text{(smallest)}$$

→ T3 is classified as Class W3 (d = 0.156, smallest)
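Putting Part 3 together, here is a sketch of the diagonal-covariance Mahalanobis classifier (means from Step 0 and the lecture-note variances hard-coded; names are my own):

```python
# Sketch: nearest-mean classification with diagonal-covariance Mahalanobis distance.
import numpy as np

MEANS = {"W1": np.array([2.386, 1.762]),
         "W2": np.array([-0.993, -1.794]),
         "W3": np.array([-7.260, 2.544])}
VARS = {"W1": np.array([4.22, 4.90]),    # diagonal of Sigma_1
        "W2": np.array([10.13, 2.77]),   # diagonal of Sigma_2
        "W3": np.array([26.26, 2.00])}   # diagonal of Sigma_3

def classify_mahalanobis(x):
    """Each squared difference is divided by that class's per-axis variance."""
    x = np.asarray(x, dtype=float)
    dists = {c: float(np.sqrt((((x - MEANS[c]) ** 2) / VARS[c]).sum()))
             for c in MEANS}
    return min(dists, key=dists.get), dists

for name, t in [("T1", [2.543, 0.046]), ("T2", [-2.799, 0.746]),
                ("T3", [-7.429, 2.329])]:
    label, dists = classify_mahalanobis(t)
    print(name, "->", label)   # T1 -> W1, T2 -> W3, T3 -> W3
```

Note how T2 flips to W3 relative to the Euclidean classifier, purely because of W3's large X1 variance.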


📊 Final Results Summary

| Test Sample | Euclidean | City Block | Mahalanobis |
|---|---|---|---|
| T1 = (2.543, 0.046) | W1 (1.72) | W1 (1.873) | W1 (0.779) |
| T2 = (-2.799, 0.746) | W2 (3.12) | W2 (4.346) | W3 (1.541) |
| T3 = (-7.429, 2.329) | W3 (0.27) | W3 (0.384) | W3 (0.156) |

Values in parentheses are the winning (minimum) distances.


💡 Key Concepts & Explanations

Why Minimum Distance to Means?

Each class is summarized by its mean vector. An unknown point is classified to whichever class centroid is nearest. This is simple, fast, and effective when classes are roughly spherical and well-separated.

Euclidean vs City Block

- Euclidean (L2) measures the "straight-line" diagonal distance; it is rotationally invariant.
- City Block (L1) adds absolute differences along each axis, like walking along a grid. It is not rotationally invariant but is more robust to outliers.

Why Mahalanobis is Different

Euclidean distance treats all dimensions equally, so if one feature has a much higher variance (spread), it dominates the distance. Mahalanobis normalizes each dimension by its variance, effectively placing all features on equal footing. This is why T2 changes classification from W2 to W3: W3 has an enormous variance along X1 (σ² = 26.26), so the large gap in X1 matters less.

When Does Mahalanobis Match Euclidean?

When every class shares the same variance in every dimension (Σ = σ²I), the Mahalanobis distance is just the Euclidean distance scaled by the constant 1/σ, so it ranks the classes identically and produces the same classification.
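To see this, substitute a shared isotropic covariance Σ = σ²I into the Mahalanobis definition:

```latex
d_{Md}(x,\ \mu)
  = \sqrt{(x - \mu)^T (\sigma^2 I)^{-1} (x - \mu)}
  = \sqrt{\tfrac{1}{\sigma^2} (x - \mu)^T (x - \mu)}
  = \frac{1}{\sigma}\, d_E(x,\ \mu)
```

Every distance is scaled by the same factor 1/σ, so the nearest class is unchanged; this requires all classes to share that single σ.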

