Modified: April 15, 2023
nearest neighbor
This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.

Cool trick: some applications can improve on nearest-neighbor lookup by training "Exemplar SVMs". Instead of matching against a set of embedding vectors, you represent each vector by the best linear separator between it and the rest of the dataset --- that is, fit a one-versus-all SVM for each exemplar. (Or use another objective, e.g., logistic regression?) This improves similarity search because (via karpathy):
In simple terms, because SVM considers the entire cloud of data as it optimizes for the hyperplane that "pulls apart" your positives from negatives. In comparison, the kNN approach doesn't consider the global manifold structure of your entire dataset and "values" every dimension equally. The SVM basically finds the way that your positive example is unique in the dataset, and then only considers its unique qualities when ranking all the other examples.
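A minimal sketch of the trick, assuming scikit-learn and a synthetic dataset of random embedding vectors (the dataset, hyperparameters, and variable names here are illustrative, not a definitive recipe): treat the query as the lone positive, fit a linear SVM against the rest, then rank by signed distance to the hyperplane instead of by cosine or Euclidean distance.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))  # hypothetical dataset of embeddings
query = rng.normal(size=64)               # the "exemplar" to find neighbors for

# One-vs-all setup: the query is the single positive example,
# every dataset vector is a negative.
x = np.concatenate([query[None, :], embeddings])
y = np.zeros(len(x))
y[0] = 1

# class_weight="balanced" keeps the lone positive from being swamped;
# C and max_iter are illustrative choices, tune for your data.
clf = LinearSVC(class_weight="balanced", C=0.1, max_iter=10000)
clf.fit(x, y)

# Rank the dataset by the SVM's decision function: larger scores mean
# more similar to what makes the query unique in this dataset.
scores = clf.decision_function(embeddings)
ranking = np.argsort(-scores)  # indices of most-similar vectors first
```

The hyperplane is fit against the whole dataset, so the ranking reflects the directions along which the query is distinctive, rather than weighting every embedding dimension equally as kNN does.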