Different types of Distances used in Machine Learning

kindler-king
4 min read · Dec 8, 2019

Distance may refer to a broad array of things, but at the most basic level we know by definition that "distance is a numerical measurement of how far apart objects or points are." Basically, it's the measure of space between two or more points. How we measure that space is up to us, and through this blog I'll try to provide a basic understanding of the distances most commonly used in Machine Learning.

In the field of Data Science, many of you might have heard of the famous KNN algorithm. The K-Nearest Neighbors algorithm is a very simple algorithm used mainly in regression and classification problems. However, its outcome can change heavily depending on the distance measure used: KNN classifies new data points based on a similarity measure (i.e., a distance function) to the stored data. A study of distances is therefore essential to make the algorithm perform to its fullest.
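As a quick sketch of the idea above (not from the original post), here is a minimal KNN classifier in plain Python. The training data, function names, and choice of Euclidean distance are all illustrative assumptions:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two points of equal dimension.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3, dist=euclidean):
    # Sort training points by distance to the query, then vote among the k nearest.
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy dataset: (point, class label) pairs.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((5.0, 5.0), "B"), ((6.0, 5.5), "B")]
print(knn_predict(train, (1.2, 1.5)))  # prints "A" -- the nearest points are class A
```

Swapping `dist` for any of the distance functions discussed below changes which neighbors count as "nearest," which is exactly why the choice of distance matters.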

KNN is a heavily used algorithm in the industry

EUCLIDEAN DISTANCE

The most common of all the distances we deal with is the Euclidean distance. According to the textbook definition, "the Euclidean metric is the 'ordinary' straight-line distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space." Let's try to get a better grasp of it: it is the shortest distance between two points in space, and no matter the number of dimensions, the Euclidean formula holds true. We use the famous Pythagorean theorem to calculate this shortest distance between the two points.

The formula for Euclidean Distance: d(p, q) = √((q₁ − p₁)² + (q₂ − p₂)² + … + (qₙ − pₙ)²)

Use case of Euclidean Distance: take the example of flight paths when flying from point A to point B in an airplane. Since there is no traffic and there are no lanes, we choose the shortest route between the two points, and the Euclidean distance describes that route best.
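The Pythagorean idea above translates directly into code; this small sketch uses only the standard library, and the function name is my own:

```python
import math

def euclidean_distance(p, q):
    # Pythagorean distance: square root of the summed squared coordinate differences.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0 -- the classic 3-4-5 right triangle
```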

MANHATTAN DISTANCE

According to the definition, the Manhattan distance is "the distance between two points measured along axes at right angles." In easier terms, think of it as a direct use case for GPS navigation: it is the distance between point A and point B when you can only travel along the axes, not diagonally across them.

Manhattan Distance formula: d(p, q) = |q₁ − p₁| + |q₂ − p₂| + … + |qₙ − pₙ|

Why Manhattan distance? It is called the Manhattan distance because it is the distance a car would drive in a city such as Manhattan, where the buildings are laid out in square blocks and the straight streets intersect at right angles.
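As a minimal sketch of the "city block" idea (naming is my own), the Manhattan distance is just the sum of absolute coordinate differences:

```python
def manhattan_distance(p, q):
    # Sum of absolute coordinate differences -- the "city block" distance.
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

print(manhattan_distance((0, 0), (3, 4)))  # 7 -- three blocks east, four blocks north
```

Note that for the same pair of points, the Manhattan distance (7) is larger than the Euclidean distance (5): driving around the blocks is longer than flying straight.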

MINKOWSKI DISTANCE

Minkowski distance is a distance/similarity measure between two points in a normed vector space (N-dimensional real space) and is a generalization of the distances above, which makes it as helpful as it is interesting. It is the Lp norm of the difference between two points, and the value of p determines which distance we get: at p = 1 (the L1 norm) we recover the Manhattan distance, and at p = 2 (the L2 norm) we recover the Euclidean distance. This generalization is useful because, as p changes in the formula, the distance itself changes.

The Lp norm (Minkowski distance) depends on the value of p: d(p, q) = (Σᵢ |qᵢ − pᵢ|^p)^(1/p)
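A short sketch makes the generalization concrete; the function below (my own naming) reproduces both earlier distances just by varying p:

```python
def minkowski_distance(a, b, p=2):
    # (sum of |difference|^p)^(1/p); p=1 gives Manhattan, p=2 gives Euclidean.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

print(minkowski_distance((0, 0), (3, 4), p=1))  # 7.0 -- matches Manhattan
print(minkowski_distance((0, 0), (3, 4), p=2))  # 5.0 -- matches Euclidean
```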

HAMMING DISTANCE

Hamming distance is a metric for comparing two binary data strings. When comparing two binary strings of equal length, the Hamming distance is the number of bit positions in which the two bits differ.

The Hamming distance between two strings, a and b is denoted as d(a,b).

Main use-case: It is used for error detection or error correction when data is transmitted over computer networks.

The distance is simply the count of bit positions at which the two binary data strings differ.
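Counting differing positions is a one-liner in Python; this sketch (naming mine) treats the bit strings as plain strings of '0' and '1':

```python
def hamming_distance(a, b):
    # Count positions where the equal-length strings differ.
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("10110", "10011"))  # 2 -- they differ at positions 2 and 4
```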

Cosine similarity and Cosine distance

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space: the cosine of the angle between them. It is thus a judgment of orientation, not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° to each other have a similarity of 0, and two diametrically opposed vectors have a similarity of −1.

When to use cosine similarity: it is generally used as a metric when the magnitude of the vectors does not matter, for example when working with text data represented by word counts. Recommendation systems are another good use case for cosine similarity and cosine distance.
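A minimal sketch of both measures, using the common convention that cosine distance is 1 minus cosine similarity (function names are my own):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two non-zero vectors: dot product over norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    # A common convention: distance = 1 - similarity.
    return 1 - cosine_similarity(a, b)

print(cosine_similarity((1, 0), (0, 1)))  # 0.0 -- orthogonal vectors
print(cosine_similarity((2, 2), (4, 4)))  # ~1.0 -- same direction, different magnitude
```

The second example shows why cosine similarity suits word counts: (2, 2) and (4, 4) could be a short and a long document with the same word proportions, and their similarity is still maximal.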

Relation between the measures.

CONCLUSION

It is not possible to identify one best distance for the general use case. Some measures suit a given algorithm much better than others; the choice of measure is algorithm-specific, and we cannot say beforehand which is best until we pick an algorithm or a particular use case to work on. Knowledge of all these distances is therefore very important, as it gives us flexibility when training machine learning models.

kindler-king

A machine learning and computer vision enthusiast at heart, and a CS undergrad student and blogger.