ad44e9a2219e4508a68ba3e831d36b9ca6a324ee,examples/grid_world.py,,experiment,#,13

Before Change


    np.random.seed()

    // MDP
    mdp = GridWorld(height=3, width=3, goal=(2, 2))

    // Policy
    epsilon = Parameter(value=1)
    discrete_actions = mdp.action_space.values
    pi = EpsGreedy(epsilon=epsilon, discrete_actions=discrete_actions)

    // Approximator
    approximator_params = dict(shape=(3, 3, mdp.action_space.n))

After Change


    core.learn(n_iterations=1, how_many=500000, n_fit_steps=1,
               iterate_over="samples")

    _, _, reward, _, _, _ = parse_dataset(core.get_dataset(),
                                          mdp.observation_space.dim,
                                          mdp.action_space.dim)
    return reward

if __name__ == "__main__":
    n_experiment = 2

In pattern: SUPERPATTERN

Frequency: 3

Non-data size: 3

Instances

Link

Project Name: AIRLab-POLIMI/mushroom

Commit Name: ad44e9a2219e4508a68ba3e831d36b9ca6a324ee

Time: 2017-06-02

Author: carlo.deramo@gmail.com

File Name: examples/grid_world.py

Class Name:

Method Name: experiment

Link

Project Name: AIRLab-POLIMI/mushroom

Commit Name: b2aad723220e31bc8e950c112b557732b608b97a

Time: 2017-06-04

Author: carlo.deramo@gmail.com

File Name: PyPi/algorithms/td.py

Class Name: DoubleQLearning

Method Name: fit

Link

Project Name: AIRLab-POLIMI/mushroom

Commit Name: b2aad723220e31bc8e950c112b557732b608b97a

Time: 2017-06-04

Author: carlo.deramo@gmail.com

File Name: PyPi/algorithms/td.py

Class Name: TD

Method Name: fit