Machine learning/Classification algorithms

Classification is a subcategory of supervised learning problems.

k-nearest neighbor

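 * A simple classification algorithm
 * Intuition: classify a point by majority vote among the training points closest to it
 * This is a discriminative model: it estimates $$P(y|x)$$ directly rather than modeling how the data itself is distributed, so there is no way to generate data points from it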

Algorithm

 * Define some distance metric or similarity metric. The simplest case is Euclidean distance.
 * Given some input point $$x$$, find its $$k$$ nearest neighbors in the training set.
 * Take a majority vote among these nearest neighbors and classify the input point as the category with the highest number of votes.
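
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `euclidean` and `knn_classify`, the toy training data, and the tie-breaking behavior (whichever tied label appears first among the neighbors, i.e. the closest one) are all assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(x, train_points, train_labels, k=3, distance=euclidean):
    """Classify x by majority vote among its k nearest training points."""
    # Step 1 and 2: rank training indices by distance to x, keep the k closest.
    order = sorted(range(len(train_points)),
                   key=lambda i: distance(x, train_points[i]))
    neighbor_labels = [train_labels[i] for i in order[:k]]
    # Step 3: majority vote. On a tie, Counter.most_common returns the label
    # that was encountered first, which here is the nearer neighbor's label.
    return Counter(neighbor_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labeled "a" and "b".
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_classify((0.5, 0.5), train_points, train_labels, k=3)
print(prediction)
```

Sorting the whole training set costs $$O(n \log n)$$ per query, which is fine for small data; larger data sets usually call for a spatial index such as a k-d tree.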

Probabilistic interpretation

Consider the classification output as a random variable $$y$$. The probability of $$y$$ given input $$x$$ and training data $$D$$ is defined as

$$P(y|x, D) = \text{fraction of points } x_i \text{ among the } k \text{ nearest neighbors of } x \text{ such that } y_i = y$$

The output of the classification is

$$\hat{y} = \arg \max_y P(y|x, D)$$

Read more about the probabilistic interpretation here:

 * https://www.cc.gatech.edu/~afb/classes/CS7616-Spring2014/slides/CS7616-13a-PKNN.pdf
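
As a rough illustration of the probabilistic interpretation, the sketch below computes $$P(y|x, D)$$ as the fraction of the $$k$$ nearest neighbors carrying each label and then takes the arg max. The function name `knn_class_probabilities` and the toy data are hypothetical, and Euclidean distance (via `math.dist`, available in Python 3.8 and later) is assumed as the metric.

```python
import math
from collections import Counter


def knn_class_probabilities(x, train_points, train_labels, k=3):
    """Return {label: fraction of the k nearest neighbors of x with that label}."""
    # Rank training indices by Euclidean distance to x and keep the k closest.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(x, train_points[i]))
    neighbors = order[:k]
    counts = Counter(train_labels[i] for i in neighbors)
    # Labels absent from the neighborhood get probability 0.
    return {label: counts[label] / len(neighbors) for label in set(train_labels)}


# Hypothetical toy data: two clusters labeled "a" and "b".
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

probs = knn_class_probabilities((2.0, 2.0), train_points, train_labels, k=5)
y_hat = max(probs, key=probs.get)  # arg max over labels
print(probs, y_hat)
```

With $$k = 5$$, the query point's neighborhood holds three "a" points and two "b" points, so the estimated probabilities are 3/5 and 2/5 and the prediction matches the plain majority vote.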