What mixup does: trains on convex combinations of pairs: x̃=λx_i+(1-λ)x_j

Mixup trains on convex combinations of pairs of examples and their labels: x̃=λx_i+(1-λ)x_j and ỹ=λy_i+(1-λ)y_j, with λ in [0,1], typically drawn from a Beta(α,α) distribution
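
The mixing rule can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `mixup_batch`, the default `alpha=0.2`, and the choice of one λ per batch are assumptions made for the example.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mix each example with a randomly chosen partner from the same batch."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # mixing weight λ in [0, 1]
    perm = rng.permutation(len(x))      # partner index j for every index i
    x_mix = lam * x + (1.0 - lam) * x[perm]   # x̃ = λ x_i + (1-λ) x_j
    y_mix = lam * y + (1.0 - lam) * y[perm]   # same weights applied to labels
    return x_mix, y_mix, lam

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))             # 4 toy inputs with 3 features
y = np.eye(2)[[0, 1, 0, 1]]             # one-hot labels for 2 classes
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.2, rng=rng)
```

Because the labels are mixed with the same λ, each row of `y_mix` still sums to 1 and can be used as a soft target.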

Related concepts

Why non-convex loss landscapes are hard: many local minima and saddle points

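Non-convex loss landscapes contain many local minima and saddle points, so gradient-based optimization can stall at a saddle point or converge to a suboptimal minimum depending on where it starts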
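
A one-dimensional toy function shows the dependence on the starting point. The function `f`, the learning rate, and the step count below are assumptions chosen only for illustration.

```python
def f(x):
    # A non-convex function with two local minima of different depth.
    return x**4 - 2 * x**2 + 0.5 * x

def grad_f(x):
    return 4 * x**3 - 4 * x + 0.5

def gradient_descent(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

x_left = gradient_descent(-2.0)    # settles in the minimum on the negative side
x_right = gradient_descent(2.0)    # settles in the minimum on the positive side
print(x_left, f(x_left))
print(x_right, f(x_right))
```

Both runs stop where the gradient vanishes, but only the run started from the left reaches the lower of the two minima.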

What weight tying does in language models: shares embedding and output projection matrices

Weight tying uses a single matrix for both the input embedding and the output projection (applied transposed), which removes one vocabulary-by-hidden-size matrix from the parameter count
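
A rough NumPy sketch of the idea, with the model body replaced by an identity stand-in. The sizes and the helper names `embed` and `output_logits` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 10, 4

# One shared matrix serves as both embedding table and output projection.
E = rng.normal(scale=0.1, size=(vocab_size, d_model))

def embed(token_ids):
    return E[token_ids]          # (n, d_model): row lookup in E

def output_logits(hidden):
    return hidden @ E.T          # (n, vocab_size): reuses E transposed

tokens = np.array([1, 5, 7])
hidden = embed(tokens)           # stand-in for the model body (identity here)
logits = output_logits(hidden)

tied_params = E.size                          # one V x d matrix
untied_params = 2 * vocab_size * d_model      # separate input and output matrices
```

With these toy sizes the tied version stores 40 parameters where an untied one would store 80.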

What AdaGrad does: divides learning rate by sqrt of sum of squared gradients

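AdaGrad gives each parameter its own learning rate, dividing the base rate by the square root of that parameter's accumulated squared gradients, so frequently updated features take smaller steps and rarely updated features take larger ones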
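
The update can be sketched as follows. The function name `adagrad_step`, the defaults `lr=0.1` and `eps=1e-8`, and the toy quadratic objective are assumptions for this example.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update; accum holds the running sum of squared gradients."""
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)   # per-parameter step size
    return w, accum

# Toy objective: f(w) = 0.5 * sum(scale * w**2), so the gradient is scale * w.
scale = np.array([100.0, 1.0])     # coordinate 0 has 100x larger gradients
w0 = np.array([1.0, 1.0])

# On the first step every coordinate moves by about lr, whatever its gradient scale.
w1, accum1 = adagrad_step(w0, scale * w0, np.zeros_like(w0))

w, accum = w0.copy(), np.zeros_like(w0)
for _ in range(200):
    w, accum = adagrad_step(w, scale * w, accum)
```

In this toy problem the two coordinates follow almost the same trajectory even though their gradients differ by a factor of 100, which is the per-parameter normalization at work.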

What score matching does: learns the gradient of the log-density without normalizing

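Score matching fits the model's score, the gradient of its log-density with respect to the data ∇_x log p(x), to that of the data distribution; the normalizing constant does not depend on x, so it drops out and never has to be computed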
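
As a small worked case, take a 1-D Gaussian with unnormalized log-density -0.5·prec·(x-mu)². Its score is -prec·(x-mu), and the sample objective mean(0.5·score² + d score/dx) is minimized in closed form by the sample mean and inverse sample variance. The function name `score_matching_loss` and the synthetic data below are assumptions for the sketch.

```python
import numpy as np

def score_matching_loss(x, mu, prec):
    """Sample score matching objective for an unnormalized 1-D Gaussian."""
    score = -prec * (x - mu)     # d/dx log q(x); the normalizer never appears
    dscore = -prec               # d/dx of the score
    return np.mean(0.5 * score ** 2 + dscore)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=5000)   # synthetic data

# Setting the derivatives of the objective to zero gives these estimates.
mu_hat = x.mean()
prec_hat = 1.0 / x.var()

print(mu_hat, 1.0 / prec_hat)    # close to the true mean 2 and variance 9
```

Any other choice of `mu` or `prec` gives a larger value of the objective on the same sample.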

Write the formula for the Lagrangian: L(x,λ) = f(x) - λg(x)

L(x,λ) = f(x) - λg(x), where f is the objective, g(x) = 0 is the equality constraint, and λ is the Lagrange multiplier
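
A small worked example, using a toy problem chosen for illustration: minimize f(x) = x1² + x2² subject to g(x) = x1 + x2 - 1 = 0. Setting the gradient of L(x,λ) = f(x) - λg(x) to zero gives a linear system that NumPy can solve directly.

```python
import numpy as np

# Stationarity of L = f - λ g:   2*x1 - λ = 0,   2*x2 - λ = 0
# Constraint g(x) = 0:           x1 + x2 = 1
# Unknowns are ordered as [x1, x2, λ].
A = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
b = np.array([0.0, 0.0, 1.0])

x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)
```

The solution is x1 = x2 = 0.5 with λ = 1, the point on the line x1 + x2 = 1 closest to the origin.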

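How does score matching use the Fisher divergence to learn the parameters of a probabilistic model without computing the normalizing constant?

Score matching estimates parameters by minimizing the Fisher divergence, the expected squared difference between the model score and the data score ∇_x log p(x); integration by parts rewrites this objective in terms of the model score and its derivative alone, so neither the data density nor the model's normalizing constant is needed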
