\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(a|s) \, Q^{\pi_\theta}(s,a)]

Write the policy gradient theorem equation

\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(a|s) \, Q^{\pi_\theta}(s,a)]
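
The policy gradient theorem weights each score term \nabla_\theta \log \pi_\theta(a|s) by the action value Q(s,a). A minimal sketch that checks this numerically, assuming a one-state bandit with a softmax policy, so that Q(a) is just the reward of action a. The names theta, rewards, and the helper functions are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(theta, rewards):
    # J(theta) = sum_a pi(a) * r(a) for a one-state bandit.
    pi = softmax(theta)
    return sum(p * r for p, r in zip(pi, rewards))

def policy_gradient(theta, rewards):
    # E_a[ grad log pi(a) * Q(a) ], computed exactly by summing over actions.
    pi = softmax(theta)
    n = len(theta)
    grad = [0.0] * n
    for a in range(n):
        for k in range(n):
            # d/d theta_k of log softmax(theta)[a] is 1[a == k] - pi[k].
            dlog = (1.0 if a == k else 0.0) - pi[k]
            grad[k] += pi[a] * dlog * rewards[a]
    return grad

def finite_difference_gradient(theta, rewards, eps=1e-6):
    # Central differences on J(theta), used only as an independent check.
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[k] += eps
        down[k] -= eps
        diff = expected_reward(up, rewards) - expected_reward(down, rewards)
        grad.append(diff / (2 * eps))
    return grad

theta = [0.2, -0.1, 0.5]      # hypothetical policy logits
rewards = [1.0, 0.0, 2.0]     # hypothetical per-action rewards, playing the role of Q
pg = policy_gradient(theta, rewards)
fd = finite_difference_gradient(theta, rewards)
print(pg)
print(fd)
```

The two printed gradients should agree closely. In practice the expectation is estimated from sampled trajectories rather than summed exactly.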

Related concepts

Write the equation for cross-entropy loss

H(y, p) = -Σ(y_i * log(p_i)) for all i
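
A minimal sketch of the cross-entropy formula for a single example, assuming y is a one-hot target and p is a predicted probability vector. The clamp value eps is an assumption added to avoid log(0), not part of the formula.

```python
import math

def cross_entropy(y, p, eps=1e-12):
    # H(y, p) = -sum_i y_i * log(p_i); p_i is clamped to avoid log(0).
    return -sum(yi * math.log(max(pi, eps)) for yi, pi in zip(y, p))

y = [0.0, 1.0, 0.0]   # hypothetical one-hot target
p = [0.1, 0.7, 0.2]   # hypothetical predicted probabilities
loss = cross_entropy(y, p)
print(loss)
```

With a one-hot target the sum collapses to the negative log-probability of the true class, here -log(0.7).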

Write the formula for KL divergence D_KL(P||Q)

D_KL(P||Q) = Σ P(x) log(P(x)/Q(x)) for all x in the support of P
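
A minimal sketch of the KL formula for discrete distributions given as lists of probabilities over the same outcomes. It uses the usual conventions that terms with P(x) = 0 contribute nothing and that the divergence is infinite when P puts mass where Q has none.

```python
import math

def kl_divergence(p, q):
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue              # outside the support of P: contributes 0
        if qx == 0.0:
            return math.inf       # P has mass where Q has none
        total += px * math.log(px / qx)
    return total

p = [0.5, 0.5, 0.0]      # hypothetical distribution P
q = [0.25, 0.5, 0.25]    # hypothetical distribution Q
kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
kl_pp = kl_divergence(p, p)
print(kl_pq, kl_qp, kl_pp)
```

The example also illustrates that KL divergence is not symmetric: D_KL(P||Q) is finite here while D_KL(Q||P) is infinite.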

What maximum likelihood estimation does: find θ maximizing P(data|θ)

Finds the parameter value θ under which the observed data are most probable, i.e. the θ that maximizes the likelihood P(data|θ)
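
A minimal sketch of maximum likelihood estimation, assuming coin-flip data modeled as Bernoulli(θ). The data and the grid search are hypothetical illustrations. For this model the maximizer is known in closed form (the sample mean), which the grid search should reproduce.

```python
import math

def bernoulli_log_likelihood(theta, data):
    # log P(data | theta) for independent Bernoulli(theta) observations.
    return sum(math.log(theta) if x == 1 else math.log(1.0 - theta) for x in data)

data = [1, 0, 1, 1, 0, 1, 1, 0]          # hypothetical coin flips
closed_form = sum(data) / len(data)      # sample mean, the known MLE

# Brute-force search over a grid of candidate theta values in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
grid_mle = max(grid, key=lambda t: bernoulli_log_likelihood(t, data))
print(closed_form, grid_mle)
```

Maximizing the log-likelihood gives the same θ as maximizing the likelihood itself, because the logarithm is monotonic.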

Mutual information I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)

Mutual information measures how much knowing one of X and Y reduces uncertainty about the other; it is zero if and only if X and Y are independent

What score matching does: learns the gradient of the log-density without normalizing

Score matching fits a model of the score, the gradient of the log-density with respect to the data x, without ever computing the normalizing constant
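
A minimal sketch of one common form of score matching, assuming a 1D Gaussian model whose score is -(x - mu) / var and the objective E[0.5 * score^2 + d(score)/dx] averaged over the data. The data, grids, and function names are hypothetical. The objective never touches the normalizing constant, and for this model its minimizer should land on the sample mean and variance up to grid resolution.

```python
def score_matching_objective(mu, var, data):
    # Average of 0.5 * s(x)^2 + ds/dx, where s(x) = d/dx log p(x).
    total = 0.0
    for x in data:
        score = -(x - mu) / var     # score of a Gaussian with mean mu, variance var
        dscore_dx = -1.0 / var      # derivative of the score with respect to x
        total += 0.5 * score ** 2 + dscore_dx
    return total / len(data)

data = [1.2, 0.7, 2.1, 1.5, 0.9, 1.8]   # hypothetical samples
sample_mean = sum(data) / len(data)
sample_var = sum((x - sample_mean) ** 2 for x in data) / len(data)

# Brute-force grid search with step 0.01, for illustration only.
mu_grid = [i / 100 for i in range(0, 301)]
var_grid = [i / 100 for i in range(5, 201)]
best_mu, best_var = min(
    ((m, v) for m in mu_grid for v in var_grid),
    key=lambda mv: score_matching_objective(mv[0], mv[1], data),
)
print(sample_mean, sample_var)
print(best_mu, best_var)
```

Real uses replace the closed-form Gaussian score with a learned function and the grid search with gradient-based optimization.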

What is the formula for calculating the mutual information between two discrete random variables X and Y?

I(X;Y) = Σ_x Σ_y P(x,y) log(P(x,y) / (P(x) P(y)))
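
A minimal sketch that computes mutual information from a joint probability table and checks it against the entropy identity I(X;Y) = H(X) + H(Y) - H(X,Y), which is equivalent to H(X) - H(X|Y). The joint tables are hypothetical; rows index x and columns index y.

```python
import math

def entropy(probs):
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)

def marginals(joint):
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def mutual_information(joint):
    # I(X;Y) = sum_x sum_y P(x,y) * log(P(x,y) / (P(x) * P(y)))
    px, py = marginals(joint)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi

joint = [[0.4, 0.1],
         [0.1, 0.4]]                      # hypothetical dependent variables
px, py = marginals(joint)
mi = mutual_information(joint)
h_xy = entropy([p for row in joint for p in row])
mi_from_entropies = entropy(px) + entropy(py) - h_xy

independent = [[0.25, 0.25],
               [0.25, 0.25]]              # P(x,y) = P(x) * P(y)
mi_indep = mutual_information(independent)
print(mi, mi_from_entropies, mi_indep)
```

The dependent table gives a positive value that matches the entropy form, and the independent table gives zero.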
