
Before Change


            returns[step] = returns[step + 1] * \
                args.gamma * masks[step] + rewards[step]

        value_loss = (values[:-1] - Variable(returns[:-1])).pow(2).mean()

        advantages = returns[:-1] - values[:-1].data
        action_loss = -(Variable(advantages) * action_log_probs).mean()
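The recursion in the snippet above walks backwards through the rollout, zeroing the bootstrapped return wherever an episode ended. A minimal self-contained sketch of that recursion, with made-up `rewards`, `masks`, and bootstrap value:

```python
# Sketch of the n-step return recursion from the snippet above.
# `rewards`, `masks`, and the bootstrap value are illustrative, not from the commit.
gamma = 0.99
rewards = [1.0, 0.0, 1.0]          # r_t for t = 0..2
masks   = [1.0, 1.0, 0.0]          # 0 where the episode ended after step t
returns = [0.0] * (len(rewards) + 1)
returns[-1] = 0.5                  # bootstrap V(s_T), assumed here

# Walk backwards: R_t = r_t + gamma * mask_t * R_{t+1}
for step in reversed(range(len(rewards))):
    returns[step] = returns[step + 1] * gamma * masks[step] + rewards[step]
```

Because `masks[2]` is 0, the bootstrap value never leaks across the episode boundary: `returns[2]` is just `rewards[2]`.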

After Change


        dist_entropy = -(log_probs * probs).sum(-1).mean()

        advantages = Variable(returns[:-1]) - values
        value_loss = advantages.pow(2).mean()

        action_loss = -(Variable(advantages.data) * action_log_probs).mean()

        optimizer.zero_grad()
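The change computes `advantages` once and reuses it for both losses, detaching it for the policy loss so actor gradients do not flow into the value head (the role `Variable(advantages.data)` played). A minimal sketch in modern PyTorch (`Variable` is deprecated; tensors and shapes here are random stand-ins, not from the commit):

```python
import torch

torch.manual_seed(0)
num_steps, num_actions = 5, 3
returns = torch.rand(num_steps + 1, 1)            # bootstrapped returns, length T+1
values = torch.rand(num_steps, 1, requires_grad=True)
probs = torch.softmax(torch.rand(num_steps, num_actions), dim=-1)
log_probs = probs.log()
action_log_probs = log_probs[:, :1]               # log-prob of the taken action (illustrative)

# Entropy bonus over the action distribution
dist_entropy = -(log_probs * probs).sum(-1).mean()

# Advantages drive both losses; .detach() replaces Variable(...).data
advantages = returns[:-1] - values
value_loss = advantages.pow(2).mean()
action_loss = -(advantages.detach() * action_log_probs).mean()
```

Detaching only for `action_loss` keeps the critic trained by `value_loss` while preventing the actor objective from pushing value estimates around.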
In pattern: SUPERPATTERN

Frequency: 4

Non-data size: 2

Instances


Project Name: ikostrikov/pytorch-a2c-ppo-acktr
Commit Name: eb110220d9a39a294479433cefc274e42506737e
Time: 2017-09-16
Author: ikostrikov@gmail.com
File Name: main.py
Class Name:
Method Name: main


Project Name: ray-project/ray
Commit Name: b7dbbfbf4111698145bb9e0bf2e34e36fef0430c
Time: 2020-11-25
Author: sven@anyscale.io
File Name: rllib/agents/sac/sac_torch_policy.py
Class Name:
Method Name: actor_critic_loss


Project Name: lcswillems/torch-rl
Commit Name: cd1da3fc9c5b68a9e583b8c26e5cdae58560be6d
Time: 2018-04-19
Author: lcswillems@gmail.com
File Name: torch_ac/algos/ppo.py
Class Name: PPOAlgo
Method Name: update_parameters


Project Name: jindongwang/transferlearning
Commit Name: e0b38194a6b8b53a0a7380c02466e052eff66ad0
Time: 2019-04-15
Author: jindongwang@outlook.com
File Name: code/distance/coral_loss.py
Class Name:
Method Name: CORAL_loss