High dimensionality dilutes data density, making nearest neighbors less distinct and search unreliable
Image: Jkatz (WMF), CC BY-SA 4.0, via Wikimedia Commons
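One quick way to see the effect: draw uniform random points, measure how far the nearest and farthest ones are from a random query, and watch that gap shrink relative to the nearest distance as the dimension grows. This is a minimal stdlib-only sketch; the uniform-hypercube data, the 500-point sample, and the relative_contrast helper are illustrative assumptions, not part of the original card.

```python
import math
import random

def relative_contrast(dim, n_points=500, seed=0):
    """(farthest - nearest) / nearest distance from one random query to
    n_points points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

# Print how the contrast changes as the dimension grows.
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

As the dimension rises the ratio falls toward zero, meaning the "nearest" neighbor is barely closer than the farthest one.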
cosine similarity often works better than Euclidean distance in high dimensions
Cosine similarity measures orientation rather than magnitude, so differences in vector length (for example, document length in term-count vectors) do not affect it
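A small sketch of what "orientation, not magnitude" means in practice, using made-up 3-D vectors: scaling a vector leaves its cosine similarity unchanged while its Euclidean distance grows. It also checks a standard identity: for unit-length vectors, squared Euclidean distance equals 2 minus 2 times cosine similarity, so after normalization the two measures rank neighbors identically.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.hypot(*v)
    return [x / norm for x in v]

a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]   # same direction as a, 10x the magnitude
c = [3.0, 2.0, 1.0]      # different direction

# Cosine ignores magnitude: a and b point the same way.
print(cosine_similarity(a, b))   # close to 1.0
print(math.dist(a, b))           # large Euclidean distance

# For unit vectors: ||u - v||^2 == 2 - 2 * cos(u, v)
ua, uc = normalize(a), normalize(c)
print(math.dist(ua, uc) ** 2, 2 - 2 * cosine_similarity(a, c))
```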
random projection of n points to O(log n/ε²) dimensions preserves pairwise distances within a factor of 1±ε
Random projection reduces dimensionality while preserving all pairwise distances to within a factor of 1±ε, as guaranteed by the Johnson-Lindenstrauss lemma
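To make the O(log n/ε²) target dimension concrete, here is a sketch of one common explicit form of the bound, k ≥ 4 ln n / (ε²/2 − ε³/3) (the Dasgupta-Gupta form, also used by scikit-learn's johnson_lindenstrauss_min_dim). The helper name jl_min_dim is hypothetical, and rounding up with ceil is a choice made here.

```python
import math

def jl_min_dim(n_samples, eps):
    """Target dimension k guaranteed by the Johnson-Lindenstrauss bound
    k >= 4 * ln(n) / (eps**2 / 2 - eps**3 / 3) for n points and distortion eps."""
    if not 0 < eps < 1:
        raise ValueError("eps must be in (0, 1)")
    return math.ceil(4 * math.log(n_samples) / (eps**2 / 2 - eps**3 / 3))

for n in (1_000, 1_000_000):
    for eps in (0.1, 0.5):
        print(f"n={n:>9,}  eps={eps}  k >= {jl_min_dim(n, eps)}")
```

Note that the bound depends only on the number of points and ε, not on the original dimension.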
the Johnson-Lindenstrauss lemma says
Random projection reduces dimensionality while approximately preserving pairwise distances
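A minimal sketch of a Gaussian random projection, assuming entries drawn from N(0, 1/k) so that squared lengths are preserved in expectation. It projects 20 random points from 500 to 200 dimensions and reports the worst relative change in any pairwise distance. The data, sizes, and helper names are illustrative; 200 dimensions is below what the worst-case bound guarantees for these settings, so the result is an empirical observation, not a guarantee.

```python
import math
import random

def random_projection(points, k, seed=0):
    """Map d-dimensional points to k dimensions with a random Gaussian matrix
    whose entries have variance 1/k."""
    rng = random.Random(seed)
    d = len(points[0])
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
    return [[sum(r * x for r, x in zip(row, p)) for row in R] for p in points]

def max_distortion(original, projected):
    """Largest |projected_dist / original_dist - 1| over all point pairs."""
    worst = 0.0
    for i in range(len(original)):
        for j in range(i + 1, len(original)):
            ratio = math.dist(projected[i], projected[j]) / math.dist(original[i], original[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst

rng = random.Random(1)
d, n, k = 500, 20, 200
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
projected = random_projection(points, k)
print(f"{d} -> {k} dims, worst pairwise distortion: {max_distortion(points, projected):.3f}")
```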
Locality-sensitive hashing
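Locality-sensitive hashing (LSH) hashes similar items into the same buckets with high probability, so a query only needs to be compared against the items that share its buckets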
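One well-known LSH family, random-hyperplane hashing for cosine similarity (SimHash), fits in a few lines: each hash bit records which side of a random hyperplane a vector falls on, and two vectors at angle θ agree on a bit with probability 1 − θ/π. This is a sketch of that one family only; the helper names, bit counts, and example vectors are hypothetical.

```python
import random
from collections import defaultdict

def make_hyperplane_hash(dim, n_bits, seed=0):
    """Return h(v): a tuple of bits, one per random hyperplane, telling
    which side of that hyperplane v lies on."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    def h(v):
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0) for plane in planes)
    return h

def build_index(vectors, h):
    """Group vector indices by hash value (one bucket per distinct signature)."""
    buckets = defaultdict(list)
    for idx, v in enumerate(vectors):
        buckets[h(v)].append(idx)
    return buckets

def agreement(u, v, n_bits=2000, seed=0):
    """Fraction of hash bits on which u and v agree; estimates 1 - angle/pi."""
    h = make_hyperplane_hash(len(u), n_bits, seed)
    return sum(a == b for a, b in zip(h(u), h(v))) / n_bits

h = make_hyperplane_hash(dim=3, n_bits=8, seed=42)
docs = [[1.0, 2.0, 3.0], [1.1, 2.0, 2.9], [-3.0, 0.5, -1.0]]
index = build_index(docs, h)
print("candidates for docs[1]:", index[h(docs[1])])
```

At query time only the vectors in the query's bucket are compared exactly; real systems usually use several independent hash tables to reduce the chance of missing a true neighbor.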
Manifold hypothesis
Real-world high-dimensional data tends to lie on or near lower-dimensional manifolds, so its intrinsic dimension is much smaller than its number of features
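A rough way to see intrinsic versus ambient dimension is a correlation-dimension estimate (Grassberger-Procaccia style): count the fraction C(r) of point pairs closer than r and take the slope of log C(r) against log r at small r. The sketch below compares a 1-D curve (a circle mapped into 50 dimensions by a random linear map) with points filling a 10-D cube. The datasets, the 0.5% and 5% distance quantiles used as the two radii, and the helper name are all illustrative assumptions, and the estimate is only approximate.

```python
import math
import random

def correlation_dimension(points, lo=0.005, hi=0.05):
    """Slope of log C(r) vs log r, with r_lo and r_hi chosen as the lo and hi
    quantiles of all pairwise distances (so C(r_lo) ~ lo, C(r_hi) ~ hi)."""
    dists = sorted(
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    r_lo = dists[int(lo * len(dists))]
    r_hi = dists[int(hi * len(dists))]
    return math.log(hi / lo) / math.log(r_hi / r_lo)

rng = random.Random(0)
ambient = 50
# A closed 1-D curve: the unit circle mapped into 50-D by a random 50x2 matrix.
A = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(ambient)]
curve = []
for _ in range(400):
    t = rng.uniform(0.0, 2 * math.pi)
    c, s = math.cos(t), math.sin(t)
    curve.append([a0 * c + a1 * s for a0, a1 in A])

# For contrast: points that fill a 10-D cube, with no lower-dimensional structure.
cube = [[rng.random() for _ in range(10)] for _ in range(400)]

print("curve in 50-D:", round(correlation_dimension(curve), 2))
print("10-D cube:    ", round(correlation_dimension(cube), 2))
```

The curve lives in 50 coordinates but its estimate comes out near 1, its intrinsic dimension.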
batch size affects generalization: larger batches find sharper minima
Larger batch sizes give more accurate (less noisy) gradient estimates but tend to converge to sharper minima, which are associated with poorer generalization
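The "less noisy gradient" half of this card is easy to check numerically; sharpness itself is not measured here. The sketch below, assuming a hypothetical 1-D linear regression with squared loss, samples minibatches with replacement at a fixed weight and shows the variance of the minibatch gradient shrinking roughly as 1/B. That reduced noise is the usual explanation for why large-batch training explores less and settles into sharper minima.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical data: y = 3x + Gaussian noise; loss = mean((w*x - y)^2).
data = []
for _ in range(10_000):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + rng.gauss(0.0, 1.0)))
w = 0.0  # evaluate gradients at a fixed, non-optimal weight

def grad(batch, w):
    """Gradient of the mean squared error with respect to w over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def minibatch_grad_variance(B, trials=500):
    """Variance of the gradient estimate across random minibatches of size B."""
    estimates = [grad(rng.choices(data, k=B), w) for _ in range(trials)]
    return statistics.variance(estimates)

print("full-batch gradient:", round(grad(data, w), 4))
for B in (8, 64, 512):
    print(f"B={B:4d}  gradient variance={minibatch_grad_variance(B):.5f}")
```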