To classify an instance, find the most similar labelled instances and give it the same class as the most similar one.
The feature space is \(N\)-dimensional, where \(N\) is the number of descriptive features. Each instance is mapped to a point, and the descriptive features become the axes.
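As a concrete sketch of the idea, here is a minimal 1-nearest-neighbour classifier in Python. The toy dataset, point values, and function names are illustrative assumptions, and the Euclidean distance used as the similarity metric is defined further down:

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points in the feature space.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_1nn(query, dataset):
    # dataset: list of (feature_vector, label) pairs.
    # The query gets the class of its single most similar (nearest) instance.
    _, label = min(dataset, key=lambda pair: euclidean(query, pair[0]))
    return label

# Toy 2-d feature space: each instance is a point, each feature an axis.
dataset = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B")]
print(classify_1nn((0.9, 1.1), dataset))  # -> "A"
```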
The Similarity Metric
Mathematically, a similarity metric must conform to these four criteria:
- Non-negative: \(f(a, b) \ge 0\)
- Identity: \( f(a, b) = 0 \iff a = b \)
- Symmetry: \( f(a, b) = f(b, a) \)
- Triangle inequality: \( f(a, b) \le f(a, c) + f(c, b) \)
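These criteria can be spot-checked numerically; the sketch below is a sanity check on a handful of arbitrary points, not a proof. It uses the Manhattan distance (defined later in this section) as the candidate metric:

```python
import itertools

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

points = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.5)]

for a, b in itertools.product(points, repeat=2):
    assert manhattan(a, b) >= 0                # non-negativity
    assert (manhattan(a, b) == 0) == (a == b)  # identity
    assert manhattan(a, b) == manhattan(b, a)  # symmetry

for a, b, c in itertools.product(points, repeat=3):
    # triangle inequality: a detour through c is never shorter
    assert manhattan(a, b) <= manhattan(a, c) + manhattan(c, b)

print("all four criteria hold on these points")
```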
The Minkowski Distance
Here \(a\) and \(b\) are the instances being compared, and \(m\) is the number of descriptive features:
$$ f(a, b) = \left( \sum_{i=1}^{m} \left| a[i] - b[i] \right|^{p} \right)^{1/p} $$
Because the differences are raised to the \(p\)-th power, higher values of \(p\) put more emphasis on features with big differences.
As \( p \to \infty \), we are left with only the biggest absolute difference, since it dominates all the other terms in the sum. The \( p = \infty \) case is referred to as the Chebyshev (or Chessboard) Distance.
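A minimal sketch in Python, with illustrative sample points, showing how the distance shrinks towards the Chebyshev limit as \(p\) grows:

```python
def minkowski(a, b, p):
    # (sum over the m descriptive features of |a[i] - b[i]|^p)^(1/p)
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def chebyshev(a, b):
    # The p -> infinity limit: the largest absolute difference dominates.
    return max(abs(x - y) for x, y in zip(a, b))

a, b = (0.0, 0.0), (3.0, 4.0)
for p in (1, 2, 4, 16, 64):
    print(p, minkowski(a, b, p))  # 7.0, 5.0, 4.28..., 4.00..., 4.00...
print("inf", chebyshev(a, b))     # 4.0
```

Note that \(p = 1\) and \(p = 2\) above reproduce the Manhattan and Euclidean distances covered next.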
The Manhattan (or Taxi-Cab) Distance (\( p = 1\))
$$ f(a, b) = \sum_{i=1}^{m} |a[i] - b[i]| $$
The Euclidean Distance (\( p = 2\))
It is interesting to note that “Euclidean Distance” is somewhat of a misnomer. Given that Minkowski distances satisfy the metric criteria above, they seem to assume a Euclidean (flat) geometry. In that sense, the Manhattan Distance is not, strictly speaking, a non-Euclidean distance.
Euclidean space is the de-facto geometry in ML out of convenience and tradition rather than by design, and this limits us. For example, hyperbolic space offers a better model for embedding a tree in a way that preserves the distances between nodes: its volume grows exponentially with radius, matching the exponential growth of a tree’s node count with depth.
$$ f(a, b) = \sqrt{ \sum_{i=1}^{m} (a[i] - b[i])^2 } $$
Euclidean is the default choice, unless one is constrained by computational resources (Manhattan is cheaper, avoiding the squaring and the square root).
Note that the Euclidean distance is more influenced by a single large difference in one feature than by many small differences spread across several features. The opposite is true of the Manhattan Distance.
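To see this concretely, compare two candidates that are equally far from the origin under the Manhattan distance: one with a single large difference, one with the same total spread across four features (toy points, assuming the definitions above):

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

origin     = (0, 0, 0, 0)
one_big    = (4, 0, 0, 0)  # single large difference in one feature
many_small = (1, 1, 1, 1)  # same total difference, spread out

print(manhattan(origin, one_big), manhattan(origin, many_small))  # 4 4 (tied)
print(euclidean(origin, one_big), euclidean(origin, many_small))  # 4.0 2.0
```

Manhattan considers the two candidates equally distant, while Euclidean places the single large difference twice as far away.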
Why are non-negativity and the triangle inequality important? Because they let us think of the similarity metric as a measure of “distance”: the distance between two instances can never be negative, and the direct route between them can never be longer than a detour through a third instance.