\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(a|s) \, Q^{\pi_\theta}(s,a)]
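A minimal sketch of the policy-gradient expression, assuming a one-step problem with a two-action softmax policy, so that the action value is just the immediate reward. The names `softmax`, `grad_log_pi`, `expected`, and the reward values are illustrative, not from the source. It computes the expectation exactly by summing over actions, and also shows that the unweighted score averages to zero, which is why the reward (or action-value) weighting is what carries the gradient signal.

```python
import math

def softmax(theta):
    # Numerically stable softmax over a list of logits.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def grad_log_pi(theta, a):
    # For a softmax policy: d/d theta_k log pi(a) = 1[a == k] - pi(k).
    pi = softmax(theta)
    return [(1.0 if k == a else 0.0) - pi[k] for k in range(len(theta))]

def expected(theta, weight):
    # Exact expectation over actions of weight(a) * grad log pi(a).
    pi = softmax(theta)
    total = [0.0] * len(theta)
    for a, p_a in enumerate(pi):
        g = grad_log_pi(theta, a)
        for k in range(len(theta)):
            total[k] += p_a * weight(a) * g[k]
    return total

theta = [0.0, 0.0]    # hypothetical logits: uniform policy
reward = [1.0, 0.0]   # hypothetical rewards: action 0 pays 1, action 1 pays 0

policy_grad = expected(theta, lambda a: reward[a])  # reward-weighted score
score_mean = expected(theta, lambda a: 1.0)         # unweighted score
```

Here `policy_grad` points toward raising the logit of the rewarded action, while `score_mean` is the zero vector.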
Write the equation for cross-entropy loss
H(y, p) = -Σ(y_i * log(p_i)) for all i
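The formula above can be sketched in a few lines, assuming `y` and `p` are equal-length lists of probabilities and the natural logarithm is used. The `eps` clamp is an illustrative guard against log(0), not part of the definition.

```python
import math

def cross_entropy(y, p, eps=1e-12):
    # H(y, p) = -sum_i y_i * log(p_i), natural log.
    # eps only protects against log(0); it is an assumption of this sketch.
    return -sum(y_i * math.log(max(p_i, eps)) for y_i, p_i in zip(y, p))

y = [0.0, 1.0, 0.0]   # one-hot target (hypothetical example)
p = [0.1, 0.7, 0.2]   # predicted probabilities (hypothetical example)
loss = cross_entropy(y, p)
```

With a one-hot target, only the true class contributes, so `loss` reduces to -log(0.7).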
Write the formula for KL divergence D_KL(P||Q)
D_KL(P||Q) = Σ P(x) log(P(x)/Q(x)) for all x in the support of P
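A small sketch of the same sum, assuming `p` and `q` are lists of probabilities over the same outcomes and that Q(x) > 0 wherever P(x) > 0 (otherwise the divergence is infinite and this code would raise a division error).

```python
import math

def kl_divergence(p, q):
    # Terms with P(x) = 0 contribute nothing, so the sum runs over the support of P.
    # Assumes q_x > 0 wherever p_x > 0.
    return sum(p_x * math.log(p_x / q_x) for p_x, q_x in zip(p, q) if p_x > 0)

P = [0.5, 0.5, 0.0]     # hypothetical distributions
Q = [0.25, 0.25, 0.5]

kl_pq = kl_divergence(P, Q)   # 0.5*log(2) + 0.5*log(2) = log(2)
kl_pp = kl_divergence(P, P)   # a distribution has zero divergence from itself
```

Swapping the arguments generally gives a different value, since KL divergence is not symmetric.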
What maximum likelihood estimation does: find θ maximizing P(data|θ)
Chooses the θ that maximizes the probability of the observed data given θ: θ̂ = argmax_θ P(data|θ)
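As an illustration, here is a sketch of maximum likelihood for a coin with unknown heads probability θ, assuming independent flips. The data and the grid search are hypothetical choices for this example. In practice the log-likelihood is maximized instead of the likelihood, which gives the same θ.

```python
import math

def log_likelihood(theta, flips):
    # log P(data | theta) for independent Bernoulli(theta) flips (1 = heads).
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in flips)

flips = [1, 1, 0, 1, 0, 1, 1, 0]            # hypothetical data: 5 heads, 3 tails
grid = [i / 1000 for i in range(1, 1000)]   # candidate values of theta in (0, 1)

# Pick the candidate with the highest log-likelihood.
theta_hat = max(grid, key=lambda t: log_likelihood(t, flips))

# For this model the maximizer is known in closed form: the fraction of heads.
closed_form = sum(flips) / len(flips)
```

The grid search and the closed form agree, which is a quick check that the estimate really is the likelihood maximizer.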
Mutual information I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
Mutual information measures how much knowing one of X and Y reduces uncertainty about the other; it is zero exactly when X and Y are independent
What score matching does: learns the gradient of the log-density without normalizing
Score matching fits a model to the score ∇_x log p(x), the gradient of the log-density with respect to the data x, without computing the normalizing constant
What is the formula for calculating the mutual information between two discrete random variables X and Y?
I(X;Y) = ∑∑ P(x,y) log(P(x,y)/(P(x)P(y)))
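The double sum can be sketched as follows, assuming the joint distribution is given as a nested list `joint[i][j] = P(X=i, Y=j)` and the natural logarithm is used. The two example tables are hypothetical.

```python
import math

def mutual_information(joint):
    # Marginals: P(x) sums each row, P(y) sums each column.
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    # Cells with P(x, y) = 0 contribute nothing to the sum.
    return sum(
        pxy * math.log(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

independent = [[0.25, 0.25], [0.25, 0.25]]   # X and Y independent fair bits
copied = [[0.5, 0.0], [0.0, 0.5]]            # Y is always equal to X

mi_independent = mutual_information(independent)
mi_copied = mutual_information(copied)
```

Independent variables give zero, and a perfectly copied fair bit gives log(2) nats, which is the full entropy of that bit.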