Cosine similarity formula: cos(θ) = (A · B) / (||A|| ||B||)
Image: Tupperware Ltd., CC BY-SA 3.0, via Wikimedia Commons
Cosine similarity formula: cos(θ) = (A · B) / (||A|| ||B||)
Cosine similarity measures the cosine of the angle between two vectors, which is calculated as the dot product of the vectors divided by the product of their lengths. This metric is useful for determining the similarity between vectors without being affected by their magnitudes.
For instance, if vector A = [1, 2] and vector B = [2, 4], the dot product A · B = 1*2 + 2*4 = 10. The lengths ||A|| = √(1² + 2²) = √5 and ||B|| = √(2² + 4²) = √20. Thus, cosine similarity = 10 / (√5 * √20) = 10 / √100 = 1.
Understanding cosine similarity is crucial in fields like information retrieval and text mining, where it helps compare documents represented as vectors of word occurrences.
Cosine similarity is a fundamental concept in vector space models used in various applications, including search engines and recommendation systems.
List of algorithms
Cosine similarity measures the angle between vectors, not their magnitude
Covariance matrix
Covariance formula: Cov(X, Y) = E[(X - E[X])(Y - E[Y])]
Dot product
Dot product = sum of products of corresponding entries
Euclidean distance
Euclidean distance formula: √((x2 - x1)² + (y2 - y1)²)
the dot product measures alignment: it equals |a||b|cos(θ)
Dot product measures alignment: it equals |a||b|cos(θ)
cosine similarity works better than Euclidean distance in high dimensions
Cosine similarity measures orientation, not magnitude, making it more robust to irrelevant dimensions in high-dimensional spaces
One email a day: 5 concepts + the 5 stories that matter →
Swipe through 100 ML concepts daily
Open TickerNews