e698b32984f3fea15ef6827a9279cb346fc9227b,slm_lab/agent/algorithm/sac.py,SoftActorCritic,train,#SoftActorCritic#,118

Before Change


        if self.to_train == 1:
            batch = self.sample()
            clock.set_batch_size(len(batch))
            pdparams, v_preds = self.calc_pdparam_v(batch)
            advs, v_targets = self.calc_advs_v_targets(batch, v_preds)
            policy_loss = self.calc_policy_loss(batch, pdparams, advs)  # from actor

After Change


            # forward passes for losses
            states = batch["states"]
            actions = batch["actions"]
            v_preds = self.calc_v(states, net=self.critic_net)
            q1_preds = self.calc_q(states, actions, self.q1_net)
            q2_preds = self.calc_q(states, actions, self.q2_net)
            pdparams = self.calc_pdparam(states)
            action_pd = policy_util.init_action_pd(self.body.ActionPD, pdparams)
In pattern: SUPERPATTERN

Frequency: 3

Non-data size: 3

Instances


Project Name: kengz/SLM-Lab
Commit Name: e698b32984f3fea15ef6827a9279cb346fc9227b
Time: 2019-07-31
Author: kengzwl@gmail.com
File Name: slm_lab/agent/algorithm/sac.py
Class Name: SoftActorCritic
Method Name: train


Project Name: kengz/SLM-Lab
Commit Name: bdf66a650ea78167fb50f7b72cc75338d5f2bc87
Time: 2020-03-20
Author: kengzwl@gmail.com
File Name: slm_lab/agent/algorithm/ppo.py
Class Name: PPO
Method Name: train


Project Name: kengz/SLM-Lab
Commit Name: 51975a8639d0b83544ec2f932567656b25bfc965
Time: 2018-09-02
Author: lgraesser@users.noreply.github.com
File Name: slm_lab/agent/algorithm/actor_critic.py
Class Name: ActorCritic
Method Name: calc_nstep_advs_v_targets