(f * g)(t) = ∫f(τ)g(t-τ)dτ
Image: LunarLullaby, CC BY-SA 4.0, via Wikimedia Commons
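As a rough illustration of the convolution integral above, the sketch below computes its discrete analogue, where the integral over τ becomes a sum over indices. The function name `convolve` and the sample sequences are assumptions made for this example.

```python
def convolve(f, g):
    """Full discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            # Each pair (i, j) contributes to output index i + j,
            # which is the discrete counterpart of f(τ)g(t-τ) with t = i + j.
            out[i + j] += fi * gj
    return out


result = convolve([1, 2, 3], [0, 1, 0.5])
print(result)  # [0.0, 1.0, 2.5, 4.0, 1.5]
```

The output has length len(f) + len(g) - 1, because the two sequences overlap at that many distinct shifts.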
Lagrangian for the constraint g(x) = 0: L(x,λ) = f(x) - λg(x)
For the constraint g(x) = c: L(x,λ) = f(x) - λ(g(x) - c)
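A small worked check may help here. The sketch assumes a toy problem that is not from the source: maximize f(x, y) = x*y subject to g(x, y) = x + y = c. Setting the partial derivatives of L = x*y - λ(x + y - c) to zero gives x = y = λ = c/2, and the code only verifies that the gradient vanishes at that point.

```python
def lagrangian_gradient(x, y, lam, c):
    """Partial derivatives of L = x*y - lam*(x + y - c) (hypothetical toy problem)."""
    dL_dx = y - lam          # stationarity in x
    dL_dy = x - lam          # stationarity in y
    dL_dlam = c - (x + y)    # recovers the constraint x + y = c
    return dL_dx, dL_dy, dL_dlam


c = 10.0
x = y = lam = c / 2          # candidate stationary point derived by hand
grad = lagrangian_gradient(x, y, lam, c)
print(grad)
```

All three components being zero confirms that the candidate satisfies both the stationarity conditions and the constraint.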
ReLU and Leaky ReLU
ReLU: f(x) = max(0, x); Leaky ReLU: f(x) = x if x > 0 else αx (0 < α < 1)
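The two definitions translate almost directly into code. This is a minimal scalar sketch; the default `alpha=0.01` is an assumption (a commonly used value), not something the source specifies.

```python
def relu(x):
    """ReLU: pass positive inputs through, clamp the rest to zero."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: like ReLU, but negative inputs keep a small slope alpha."""
    return x if x > 0 else alpha * x


for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), leaky_relu(value))
```

The only difference shows up for negative inputs, where Leaky ReLU returns a small negative value instead of zero, so the gradient there is α rather than 0.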
Write the multi-head attention formula: MultiHead(Q,K,V) = Concat(head_1,...,head_h)W^O
MultiHead(Q,K,V) = Concat(head_1,...,head_h)W^O, where head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
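The following pure-Python sketch shows one way the multi-head formula can be evaluated on small nested lists. It assumes each head uses scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, with its own projection matrices; the helper names (`matmul`, `softmax`, `attention`, `multi_head`) and the toy weights are hypothetical and chosen only for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention (assumed form for each head)."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head(q, k, v, heads, w_o):
    """heads is a list of (W_q, W_k, W_v) triples, one per head."""
    outputs = [attention(matmul(q, w_q), matmul(k, w_k), matmul(v, w_v))
               for w_q, w_k, w_v in heads]
    # Concat(head_1, ..., head_h): join the head outputs feature-wise per row.
    concat = [sum((head[i] for head in outputs), []) for i in range(len(q))]
    return matmul(concat, w_o)


# Toy setup: model width 2, two heads of width 1, identity output projection.
heads = [([[1.0], [0.0]],) * 3, ([[0.0], [1.0]],) * 3]
w_o = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 2.0]]  # a single token, so every attention weight is 1
out = multi_head(x, x, x, heads, w_o)
print(out)  # [[1.0, 2.0]]
```

With a single token each head simply returns its projected value, so the toy output reproduces the input; with more tokens, each head mixes values according to its own attention weights before the concatenation.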
Normalization (machine learning)
L2 normalization equation: x_i' = x_i / ||x||_2
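A minimal sketch of this equation on a plain list follows. Leaving a zero vector unchanged is an assumption of this example, since the formula itself is undefined when ||x||_2 = 0.

```python
import math


def l2_normalize(x):
    """Scale x so that its Euclidean (L2) norm becomes 1."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        # Assumption: return the zero vector as-is rather than divide by zero.
        return list(x)
    return [v / norm for v in x]


unit = l2_normalize([3.0, 4.0])
print(unit)
```

For the input [3, 4] the norm is 5, so the components become 3/5 and 4/5 and the result has unit length.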
Batch normalization
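Batch normalization formula: Y = (X - μ) / σ * γ + β, where μ and σ are the mean and standard deviation over the batch, and γ and β are a learned scale and shift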
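The sketch below applies the formula to a single feature across one batch. It assumes training-mode statistics computed from the batch itself, and it adds a small `eps` under the square root for numerical stability; that term is not in the formula above and is an assumption of this example, as are the default values of `gamma` and `beta`.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a batch, then scale by gamma and shift by beta."""
    n = len(xs)
    mu = sum(xs) / n                              # batch mean
    var = sum((x - mu) ** 2 for x in xs) / n      # biased batch variance
    # eps (assumed) keeps the division safe when the variance is near zero.
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]


normalized = batch_norm([1.0, 2.0, 3.0])
print(normalized)
```

With the default gamma = 1 and beta = 0, the output has mean 0 and variance close to 1; other values of gamma and beta rescale and shift that normalized result.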
Cosine similarity
Cosine similarity formula: cos(θ) = (A · B) / (||A|| ||B||)
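The formula can be sketched on plain lists as follows. Raising an error for a zero vector is a design choice of this example, since the ratio is undefined when either norm is 0.

```python
import math


def cosine_similarity(a, b):
    """cos(θ) = (A · B) / (||A|| ||B||) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # parallel vectors
```

Orthogonal vectors score 0 and parallel vectors score approximately 1, regardless of their lengths, because only the angle between them matters.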