Introduction
Cosine distance, sometimes incorrectly called cosine similarity, is used to measure how similar two non-zero vectors are. It is derived from the cosine similarity, which is defined as:
\[\forall \ \vec{a},\vec{b} \in \mathbb{R}^{n} \setminus \{\vec{0}\}: S_{c}(\vec{a},\vec{b}) = \frac{\vec{a} \cdot \vec{b}} {|\vec{a}| \ |\vec{b}|}\] \[= \frac{\sum_{i=1}^{n} a_{i}b_{i}} {\sqrt{\sum_{i=1}^{n}{ a_{i}^{2}}} \sqrt{\sum_{i=1}^{n}{ b_{i}^{2}}}}\]Its values lie in the range \([-1,1]\). A cosine similarity of 1 means that the two vectors point in exactly the same direction, whereas -1 means that they point in opposite directions; 0 is the result when the two vectors are orthogonal.
The cosine distance is defined as \(1 - S_{c}(\vec{a},\vec{b})\):
\[\text{dist}(\vec{a},\vec{b}) = 1 - \frac{\vec{a} \cdot \vec{b}} {|\vec{a}| \ |\vec{b}|}\]In this case a distance of 0 means that the two vectors point in the same direction, 1 that they are orthogonal, and 2 that they point in opposite directions.
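The three boundary cases of the distance can be checked with a few lines of plain Python (a minimal sketch using only the standard library; the function name is illustrative):

import math

def cosine_distance(a, b):
    # 1 - (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (mag_a * mag_b)

print(cosine_distance([1, 0], [1, 0]))   # 0.0 (same direction)
print(cosine_distance([1, 0], [0, 1]))   # 1.0 (orthogonal)
print(cosine_distance([1, 0], [-1, 0]))  # 2.0 (opposite)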
Implementations
Python
SciPy offers cosine distance of 1-D arrays as part of its spatial distance functionality. While SciPy provides convenient access to many algorithms, its implementations often turn out to be noticeably slower than they could be.
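For reference, the SciPy version mentioned above is a one-liner via scipy.spatial.distance.cosine, which returns the cosine distance (not the similarity):

import numpy as np
from scipy.spatial import distance

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

# scipy.spatial.distance.cosine returns 1 - cosine similarity
print(distance.cosine(a, b))  # 1.0 (orthogonal vectors)
print(distance.cosine(a, a))  # 0.0 (same direction)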
A straightforward Python implementation would look like this:
def cosine_dist_python(A, B):
    # assumes NumPy has been imported as np
    A_dot_B = np.dot(A, B)
    A_mag = np.sqrt(np.sum(np.square(A)))
    B_mag = np.sqrt(np.sum(np.square(B)))
    dist = 1.0 - (A_dot_B / (A_mag * B_mag))
    return dist
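The function above can be exercised like this (a self-contained sketch that repeats the definition so it runs on its own; the sample vectors are arbitrary):

import numpy as np

def cosine_dist_python(A, B):
    A_dot_B = np.dot(A, B)
    A_mag = np.sqrt(np.sum(np.square(A)))
    B_mag = np.sqrt(np.sum(np.square(B)))
    return 1.0 - (A_dot_B / (A_mag * B_mag))

A = np.array([1.0, 2.0, 3.0])
B = np.array([3.0, 2.0, 1.0])
print(cosine_dist_python(A, B))  # 1 - 10/14 ≈ 0.2857
print(cosine_dist_python(A, A))  # ≈ 0.0 (same direction)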
Wrapping it in Cython without spending too much time on optimization yields something like this:
import cython
cimport cython
cimport numpy as np
import numpy as np

DTYPE = np.float64

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef double cosine_dist_cython(np.ndarray A,
                                np.ndarray B):
    assert A.shape[0] == B.shape[0]
    assert A.dtype == DTYPE
    assert B.dtype == DTYPE
    cdef double dist
    cdef double A_dot_B
    cdef double A_mag
    cdef double B_mag
    A_dot_B = np.dot(A, B)
    A_mag = np.sqrt(np.sum(np.square(A)))
    B_mag = np.sqrt(np.sum(np.square(B)))
    dist = 1.0 - (A_dot_B / (A_mag * B_mag))
    return dist
Using memory views is a common approach to squeeze performance out of Python code when the input type and dimensions are well known and the vectors in question are stored in contiguous memory.
import cython
cimport cython
cimport numpy as np
import numpy as np
from libc.math cimport sqrt

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef double cosine_dist_mem_view(double[::1] A,
                                  double[::1] B,
                                  Py_ssize_t vec_size):
    cdef double dist
    cdef double A_dot_B = 0.0
    cdef double A_mag = 0.0
    cdef double B_mag = 0.0
    cdef Py_ssize_t i
    for i in range(vec_size):
        A_dot_B += (A[i] * B[i])
        A_mag += (A[i] * A[i])
        B_mag += (B[i] * B[i])
    A_mag = sqrt(A_mag)
    B_mag = sqrt(B_mag)
    dist = 1.0 - (A_dot_B / (A_mag * B_mag))
    return dist
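Both Cython variants need to be compiled before they can be imported from Python. A minimal build script might look like the following sketch; the module file name cosine_dist.pyx is an assumption, not from the original post:

# Hypothetical setup.py for compiling the Cython modules above.
from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("cosine_dist.pyx"),
    include_dirs=[np.get_include()],  # required for "cimport numpy"
)

Running `python setup.py build_ext --inplace` then produces an importable extension module.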
C++
When using C++ there are basically two straightforward approaches, provided no external libraries such as BLAS or LAPACK are used: a loop over a std::vector or a loop over a plain C-style array.
#include <cmath>   // std::sqrt
#include <cstddef> // size_t
#include <vector>

double cosine_dist_loop_stdvector(
    const std::vector<double> &vec_a,
    const std::vector<double> &vec_b,
    size_t vec_size)
{
    double a_dot_b = 0.0;
    double a_mag = 0.0;
    double b_mag = 0.0;
    for (size_t i = 0; i < vec_size; ++i)
    {
        a_dot_b += (vec_a[i] * vec_b[i]);
        a_mag += (vec_a[i] * vec_a[i]);
        b_mag += (vec_b[i] * vec_b[i]);
    }
    double dist = 1.0 - (a_dot_b / (std::sqrt(a_mag) * std::sqrt(b_mag)));
    return dist;
}

double cosine_dist_loop_array(
    const double *arr_a,
    const double *arr_b,
    size_t arr_size)
{
    double a_dot_b = 0.0;
    double a_mag = 0.0;
    double b_mag = 0.0;
    for (size_t i = 0; i < arr_size; ++i)
    {
        a_dot_b += (arr_a[i] * arr_b[i]);
        a_mag += (arr_a[i] * arr_a[i]);
        b_mag += (arr_b[i] * arr_b[i]);
    }
    double dist = 1.0 - (a_dot_b / (std::sqrt(a_mag) * std::sqrt(b_mag)));
    return dist;
}
Rust
The Rust version of this is similar to the C++ implementation.
fn cosine_dist_rust_loop_vec(vec_a: &[f64],
                             vec_b: &[f64],
                             vec_size: &i64) -> f64
{
    let mut a_dot_b: f64 = 0.0;
    let mut a_mag: f64 = 0.0;
    let mut b_mag: f64 = 0.0;
    for i in 0..*vec_size as usize
    {
        a_dot_b += vec_a[i] * vec_b[i];
        a_mag += vec_a[i] * vec_a[i];
        b_mag += vec_b[i] * vec_b[i];
    }
    1.0 - (a_dot_b / (a_mag.sqrt() * b_mag.sqrt()))
}
Benchmark Results

When it comes to Python, it is hard to come up with something slower than the SciPy implementation for calculating the cosine distance of two vectors on a CPU. The Cython memory view approach seems to outperform all other implementations, but shows some scaling issues once the vectors get too large. Using a GPU (RTX 3090) with PyTorch and CuPy shows significantly lower performance for small vectors, but seems to scale very well across the various vector sizes.
Using C++ or Rust scales better until a certain threshold is reached (vector sizes of 512 to 1024). Since none of the C++ or Rust versions is really optimized, we can assume that Cython compiles to C code that is slightly better optimized when comparing larger vectors.
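The Python-side part of such a benchmark can be reproduced with timeit; a minimal sketch comparing SciPy against the plain NumPy implementation from above (absolute timings will of course differ per machine, and the vector size of 1024 is just one sample point):

import timeit
import numpy as np
from scipy.spatial import distance

def cosine_dist_python(A, B):
    A_dot_B = np.dot(A, B)
    A_mag = np.sqrt(np.sum(np.square(A)))
    B_mag = np.sqrt(np.sum(np.square(B)))
    return 1.0 - (A_dot_B / (A_mag * B_mag))

rng = np.random.default_rng(42)
A = rng.random(1024)
B = rng.random(1024)

# both implementations must agree on the result
assert abs(cosine_dist_python(A, B) - distance.cosine(A, B)) < 1e-9

for name, stmt in [("scipy", "distance.cosine(A, B)"),
                   ("numpy", "cosine_dist_python(A, B)")]:
    t = timeit.timeit(stmt, number=1000, globals=globals())
    print(f"{name}: {t:.4f} s for 1000 calls")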