5bb63d23a7722d060f7df345a9d2b51dc98c035a,ch09/02_cartpole_reinforce.py,,,#,34
Before Change
step_rewards = step_rewards[-1000:]
baseline = np.mean(step_rewards)
writer.add_scalar("baseline", baseline, step_idx)
batch_states.append(exp.state)
batch_actions.append(int(exp.action))
batch_scales.append(exp.reward - baseline)
After Change
batch_episodes = 0
batch_states, batch_actions, batch_qvals = [], [], []
cur_states, cur_actions, cur_rewards = [], [], []
for step_idx, exp in enumerate(exp_source):
    cur_states.append(exp.state)
    cur_actions.append(int(exp.action))
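Note: the After Change fragment sets up per-episode buffers (cur_states, cur_actions, cur_rewards) and a batch_qvals list, but the record does not show how they are combined. The sketch below is one plausible reading, assuming the rewards of a finished episode are converted to discounted returns and appended to the batch. The helper name calc_qvals, the gamma value, and the sample rewards are illustrative assumptions, not taken from the commit.

```python
from typing import List


def calc_qvals(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return the discounted return for every step of one finished episode.

    Hypothetical helper: the name and the default gamma are assumptions.
    """
    res = []
    sum_r = 0.0
    # Walk the episode backwards so each step accumulates its future rewards.
    for r in reversed(rewards):
        sum_r = r + gamma * sum_r
        res.append(sum_r)
    return list(reversed(res))


# Illustrative episode: three steps with reward 1.0 each (made-up data).
cur_rewards = [1.0, 1.0, 1.0]
batch_qvals: List[float] = []

# At episode end, extend the batch with the per-step returns and reset the buffer.
batch_qvals.extend(calc_qvals(cur_rewards, gamma=0.5))
cur_rewards.clear()
```

Under this reading, batch_qvals would take the place of the batch_scales list seen in the Before Change fragment, so no running baseline over step_rewards is needed.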
In pattern: SUPERPATTERN
Frequency: 3
Non-data size: 3
Instances
Project Name: PacktPublishing/Deep-Reinforcement-Learning-Hands-On
Commit Name: 5bb63d23a7722d060f7df345a9d2b51dc98c035a
Time: 2017-12-03
Author: max.lapan@gmail.com
File Name: ch09/02_cartpole_reinforce.py
Class Name:
Method Name:
Project Name: PacktPublishing/Deep-Reinforcement-Learning-Hands-On
Commit Name: 8acf099847ebf73ad8cdae1341d0f768dbe1c094
Time: 2017-12-04
Author: max.lapan@gmail.com
File Name: ch09/04_pong_pg.py
Class Name:
Method Name:
Project Name: lufficc/SSD
Commit Name: 94a995defe223eed0898f25d2332ba6178a92abe
Time: 2018-12-19
Author: luffy.lcc@gmail.com
File Name: ssd/engine/trainer.py
Class Name:
Method Name: do_train