Fun fact: cosine similarity's first use in recommendation systems was to recommend Usenet groups.
(https://dl.acm.org/doi/epdf/10.1145/192844.192905 although they don't call it cosine similarity; they do compute a "correlation coefficient" between two people by adding together the products of scores each gave to a post)
The Pearson correlation coefficient is covariance normalised to the range [-1, 1] by dividing by the product of the standard deviations (https://en.wikipedia.org/wiki/Pearson_correlation_coefficien...). So it's not quite the same as the normalised scalar product, even though the formulas look related.
Pearson correlation = cosine of the angle between centered random variables. Finite-variance centered random variables form a Hilbert space, so it's not a coincidence. Standard deviation is the length of the random variable as a vector in that space.
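To make the "Pearson = cosine of centered vectors" point concrete, here is a minimal sketch in plain Python. The score lists and the helper names (`dot`, `cosine`, `center`, `pearson`) are made up for illustration and are not from the paper. It computes Pearson the covariance/standard-deviation way and compares it with the cosine of the centered vectors and with the cosine of the raw vectors.

```python
import math

def dot(a, b):
    # Plain scalar product of two equal-length lists.
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Scalar product divided by the product of the vector lengths.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def center(a):
    # Subtract the mean from every entry.
    mean = sum(a) / len(a)
    return [x - mean for x in a]

def pearson(a, b):
    # Covariance divided by the product of the standard deviations.
    n = len(a)
    ca, cb = center(a), center(b)
    cov = dot(ca, cb) / n
    std_a = math.sqrt(dot(ca, ca) / n)
    std_b = math.sqrt(dot(cb, cb) / n)
    return cov / (std_a * std_b)

# Hypothetical ratings two people gave to the same five posts.
scores_a = [5.0, 3.0, 4.0, 1.0, 2.0]
scores_b = [4.0, 2.0, 5.0, 1.0, 3.0]

print(pearson(scores_a, scores_b))                 # Pearson correlation
print(cosine(center(scores_a), center(scores_b)))  # same value, up to rounding
print(cosine(scores_a, scores_b))                  # raw cosine, a different value
```

The first two numbers agree because the 1/n factors in the covariance and the standard deviations cancel; the third differs because the raw scores are not centered.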
That makes sense; I don't actually know much about this.
That being said, weirdly, the normalization by standard deviation happens outside the call to `cov` in the paper (page 181, column 1, equations (unnumbered) 1 and 2). And in equation 2 they've expanded `cov` to be the sum of pointwise multiplication of the (scores - average score) people have given to posts.
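For reference, this is the textbook form of that expansion, written in generic notation rather than copied from the paper (the symbols are assumptions: x_i and y_i are the scores two people gave to post i, and the bars are their average scores):

```latex
r_{xy}
  = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}
  = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2} \, \sqrt{\sum_i (y_i - \bar{y})^2}}
```

The 1/n factors from the covariance and from the two standard deviations cancel, which is why only the sums remain.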
Again, not my area of expertise, just looking at the math here.
The dot product is computed between two vectors.
For these use cases, where the vectors are normalised to unit length, that dot product is equal to the cosine of the angle between these vectors.
(Strictly speaking we have that the angle is actually defined in terms of the dot/inner product in more abstract spaces like function spaces or L^p/l^p)
It's grounded in basic trigonometry, i.e. it calculates the angle `theta` between two entities/vectors, `a` and `b`. If `theta` is close to 180 degrees, cos(theta) is close to -1, and cosine similarity treats these as opposite concepts. Unrelated concepts correspond to `theta` close to 90 degrees, where cos(theta) is close to 0.
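A minimal sketch of that trigonometry in plain Python, with made-up 2-D vectors and hypothetical helper names (`cosine_similarity`, `angle_degrees`). It shows the similarity and the angle for a vector pointing the same way as `a`, one at a right angle to it, and one pointing the opposite way.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def angle_degrees(a, b):
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    c = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(c))

a = [1.0, 2.0]
same_direction = [2.0, 4.0]    # a scaled by 2
orthogonal = [-2.0, 1.0]       # at a right angle to a
opposite = [-1.0, -2.0]        # a with its sign flipped

for name, b in [("same", same_direction),
                ("orthogonal", orthogonal),
                ("opposite", opposite)]:
    print(name, cosine_similarity(a, b), angle_degrees(a, b))
```

Up to floating-point rounding, the three similarities come out at 1, 0 and -1, i.e. angles of 0, 90 and 180 degrees. Note that the length of `b` does not matter, only its direction.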