Write the policy gradient theorem equation

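\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(a|s) \, Q^{\pi_\theta}(s, a)\right]

Here Q^{\pi_\theta}(s, a) is the expected return from taking action a in state s and following \pi_\theta afterwards. The expectation is over states and actions visited under \pi_\theta. Without the Q^{\pi_\theta}(s, a) weighting, the expectation of \nabla_\theta \log \pi_\theta(a|s) alone is zero.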
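
As a quick sanity check, the identity can be verified numerically in the simplest possible setting. The sketch below assumes a one-step problem (a bandit) with a softmax policy, so that the action value reduces to the immediate reward r(a). The parameter values, the rewards, and all function names are illustrative assumptions, not part of the original card.

```python
import math

def softmax(theta):
    """Softmax policy over actions, with the usual max-shift for stability."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def expected_reward(theta, rewards):
    """J(theta) = sum_a pi(a) * r(a) for a one-step problem."""
    pi = softmax(theta)
    return sum(p * r for p, r in zip(pi, rewards))

def score_function_gradient(theta, rewards):
    """E_{a ~ pi}[grad log pi(a) * r(a)], summed exactly over all actions."""
    pi = softmax(theta)
    n = len(theta)
    grad = [0.0] * n
    for a in range(n):
        for k in range(n):
            # For a softmax policy: d/d theta_k log pi(a) = 1[a == k] - pi(k)
            dlog = (1.0 if a == k else 0.0) - pi[k]
            grad[k] += pi[a] * dlog * rewards[a]
    return grad

def expected_score(theta):
    """E_{a ~ pi}[grad log pi(a)] with no reward weighting."""
    return score_function_gradient(theta, [1.0] * len(theta))

def finite_difference_gradient(theta, rewards, eps=1e-6):
    """Central-difference estimate of grad J(theta), used as a reference."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        up[k] += eps
        down = list(theta)
        down[k] -= eps
        diff = expected_reward(up, rewards) - expected_reward(down, rewards)
        grad.append(diff / (2 * eps))
    return grad

# Hypothetical policy parameters and rewards, chosen only for illustration.
theta = [0.2, -0.5, 1.0]
rewards = [1.0, 0.0, 2.0]

pg = score_function_gradient(theta, rewards)
fd = finite_difference_gradient(theta, rewards)
zero = expected_score(theta)

print("score-function gradient:", pg)
print("finite-difference gradient:", fd)
print("expected score without reward weighting:", zero)
```

The two gradients should agree up to finite-difference error, while the unweighted expected score comes out at (numerically) zero. That is why the reward or action-value term inside the expectation is essential: the score on its own carries no information about which direction improves J(\theta).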
