digideep.environment package


digideep.environment.data_helpers module

This module provides helper functions to manage data outputs from the Explorer class.

digideep.environment.data_helpers.complete_dict_of_list(dic, length)[source]

This function will complete the missing elements of a reference dictionary with similarly-structured None values.

>>> dic = {'a':[1,2,3,4],
...        'b':[[None,None,None],[None,None,None],[None,None,None],[1,2,3]],
...        'c':[[-1,-2],[-3,-4]]}
>>> # The length of lists under each key is 4 except 'c' which is 2. We have to complete that.
>>> complete_dict_of_list(dic, 4)

convert_time_to_batch_major converts a rollout so that the batch dimension is the major (first) dimension instead of the second.

Parameters: episode (dict) – A trajectory in the form of {'key1':(num_steps,batch_size,...), 'key2':(num_steps,batch_size,...)}
Returns: A trajectory in the form of {'key1':(batch_size,num_steps,...), 'key2':(batch_size,num_steps,...)}
Return type: dict
>>> episode = {'key1':[[[1],[2]], [[3],[4]], [[5],[6]], [[7],[8]], [[9],[10]]],
...            'key2':[[[1,2],[3,4]], [[5,6],[7,8]], [[9,10],[11,12]], [[13,14],[15,16]], [[17,18],[19,20]]]}
>>> convert_time_to_batch_major(episode)
{'key1': array([[[ 1.],
    [ 3.],
    [ 5.],
    [ 7.],
    [ 9.]],

    [[ 2.],
    [ 4.],
    [ 6.],
    [ 8.],
    [10.]]], dtype=float32), 'key2': array([[[ 1.,  2.],
    [ 5.,  6.],
    [ 9., 10.],
    [13., 14.],
    [17., 18.]],

    [[ 3.,  4.],
    [ 7.,  8.],
    [11., 12.],
    [15., 16.],
    [19., 20.]]], dtype=float32)}
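
As a rough illustration of the axis swap, the sketch below uses plain nested lists rather than NumPy arrays. The name to_batch_major_sketch is hypothetical and this is not the library's implementation, which (judging by the output above) returns float32 NumPy arrays. On an array, the same reordering would be np.swapaxes(arr, 0, 1).

```python
def to_batch_major_sketch(episode):
    """Swap the first two dimensions of every entry in a dict of nested lists."""
    # zip(*steps) regroups step-major data by worker:
    # (num_steps, batch_size, ...) -> (batch_size, num_steps, ...)
    return {key: [list(worker) for worker in zip(*steps)]
            for key, steps in episode.items()}


# Three time steps, two workers, one value per observation.
episode = {'key1': [[[1], [2]], [[3], [4]], [[5], [6]]]}
result = to_batch_major_sketch(episode)
```

Each top-level entry of result['key1'] now holds the full time sequence of one worker.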
digideep.environment.data_helpers.dict_of_lists_to_list_of_dicts(dic, num)[source]

Function to convert a dict of lists to a list of dicts. Mainly used to prepare actions to be fed into the env.step(action). env.step assumes action to be in the form of a list the same length as the number of workers. It will assign the first action to the first worker and so on.

Parameters:
  • dic (dict) – A dictionary keyed by agent, holding the actions of the different agents in the environment.
  • num (int) – The number of workers.
Returns: A list whose length equals num. Each element of the list is a dictionary whose keys are the agents.
Return type: list


>>> dic = {'a1':([1,2],[3,4],[5,6]), 'a2':([9],[8],[7])}
>>> num = 3
>>> dict_of_lists_to_list_of_dicts(dic, num)
[{'a1':[1,2], 'a2':[9]}, {'a1':[3,4], 'a2':[8]}, {'a1':[5,6], 'a2':[7]}]


This only works for 1-level dicts, not for nested dictionaries.
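
The conversion described above can be sketched in a few lines. This is only an illustration under the assumption that every value in dic holds at least num entries; the function name is hypothetical and the library's own code may differ.

```python
def dict_of_lists_to_list_of_dicts_sketch(dic, num):
    """Build one dict per worker by picking the i-th entry under every key."""
    return [{key: values[i] for key, values in dic.items()}
            for i in range(num)]


dic = {'a1': ([1, 2], [3, 4], [5, 6]), 'a2': ([9], [8], [7])}
per_worker = dict_of_lists_to_list_of_dicts_sketch(dic, 3)
```

Because only the first level of keys is indexed, nested dictionaries would be passed through untouched, which matches the 1-level restriction noted above.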

digideep.environment.data_helpers.extract_keywise(dic, key)[source]

Extracts a key from every entry of a dictionary. The key must be a first-level key of each entry.

Parameters:
  • dic (dict) – The input dictionary, which is a dict of dictionaries.
  • key – The name of the key to be extracted.
Returns: The resulting dictionary.
Return type: dict


>>> dic = {'agent1':{'a':[1,2],'b':{'c':2,'d':4}}, 'agent2':{'a':[3,4],'b':{'c':9,'d':7}}}
>>> key = 'a'
>>> extract_keywise(dic, key)
{'agent1':[1,2], 'agent2':[3,4]}
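
A minimal sketch of this behavior, assuming every entry of dic contains the requested key (the function name is hypothetical, not the library's code):

```python
def extract_keywise_sketch(dic, key):
    """Keep, for each top-level entry, only the value stored under `key`."""
    return {name: entry[key] for name, entry in dic.items()}


dic = {'agent1': {'a': [1, 2], 'b': {'c': 2, 'd': 4}},
       'agent2': {'a': [3, 4], 'b': {'c': 9, 'd': 7}}}
extracted = extract_keywise_sketch(dic, 'a')
```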
digideep.environment.data_helpers.flatten_dict(dic, sep='/', prefix='')[source]

Flattens a nested dictionary into a 1-level dictionary. The keys of the new dictionary are the paths of the original keys, joined by sep, following Unix-style file system naming.

>>> Dict = {"a":1, "b":{"c":1, "d":{"e":2, "f":3}}}
>>> flatten_dict(Dict)
{"/a":1, "/b/c":1, "/b/d/e":2, "/b/d/f":3}
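
One way to obtain the output shown above is a short recursion. The sketch below is an assumption about how such a function could work, not the library's implementation; it reuses the sep and prefix parameter names from the signature and reproduces the leading separator seen in the example.

```python
def flatten_dict_sketch(dic, sep='/', prefix=''):
    """Flatten nested dicts into one level with path-like keys."""
    flat = {}
    for key, value in dic.items():
        path = prefix + sep + key
        if isinstance(value, dict):
            # Descend, carrying the path built so far as the new prefix.
            flat.update(flatten_dict_sketch(value, sep=sep, prefix=path))
        else:
            flat[path] = value
    return flat


nested = {"a": 1, "b": {"c": 1, "d": {"e": 2, "f": 3}}}
flat = flatten_dict_sketch(nested)
```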
digideep.environment.data_helpers.join_keys(key1, key2, sep='/')[source]
Parameters:
  • key1 (str) – The first key, as a Unix-style file system path.
  • key2 (str) – The second key, as a Unix-style file system path.
  • sep (str) – The separator to be used.
>>> join_keys('/agent1','artifacts')
digideep.environment.data_helpers.list_of_dicts_to_flattened_dict_of_lists(List, length)[source]

Function to convert a list of (nested) dicts to a flattened dict of lists. See the example below.

Parameters:
  • List (list) – A list of dictionaries. Each element in the list is a single data sample produced by the environment.
  • length (int) – The length of the time sequence. It is used to complete the data entries that are missing from some data samples.
Returns: A dictionary whose keys are flattened following Unix-style file system naming.
Return type: dict


>>> List = [{'a':{'f':[1,2], 'g':[7,8]}, 'b':[-1,-2], 'info':[10,20]},
...         {'a':{'f':[3,4], 'g':[9,8]}, 'b':[-3,-4], 'step':[80,90]}]
>>> Length = 2
>>> list_of_dicts_to_flattened_dict_of_lists(List, Length)
# Intermediate result, before doing ``complete_dict_of_list``:
# Final result, after doing ``complete_dict_of_list`` ('/info' will become complete in length):

nonify creates an output in which every element is None. The structure of the result is exactly the structure of the input element. The element cannot contain dicts; the only accepted types are tuple, list, and np.ndarray. It can, however, contain nested lists and tuples.

>>> Input = [(1,2,3), (1,2,4,5,[-1,-2])]
>>> nonify(Input)
[(None,None,None), (None,None,None,None,[None,None])]
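
The structure-preserving replacement can be sketched recursively. This is a simplified illustration that handles only lists and tuples (np.ndarray inputs are left out to keep it dependency-free), and the function name is hypothetical.

```python
def nonify_sketch(element):
    """Return a copy of `element` with every leaf value replaced by None."""
    if isinstance(element, (list, tuple)):
        # Rebuild the same container type around the nonified children.
        return type(element)(nonify_sketch(child) for child in element)
    return None


nonified = nonify_sketch([(1, 2, 3), (1, 2, 4, 5, [-1, -2])])
```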
digideep.environment.data_helpers.unflatten_dict(dic, sep='/')[source]

Unflattens a flattened dictionary into a nested dictionary.

>>> Dict = {"/a":1, "/b/c":1, "/b/d/e":2, "/b/d/f":3}
>>> unflatten_dict(Dict)
{"a":1, "b":{"c":1, "d":{"e":2, "f":3}}}
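
The reverse direction can be sketched by splitting each key on the separator and walking down the nested dictionaries. This is an assumption-based illustration with a hypothetical name, not the library's code.

```python
def unflatten_dict_sketch(dic, sep='/'):
    """Rebuild nested dicts from path-like keys such as '/b/d/e'."""
    nested = {}
    for path, value in dic.items():
        # Drop the empty piece produced by the leading separator.
        parts = [part for part in path.split(sep) if part]
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


flat = {"/a": 1, "/b/c": 1, "/b/d/e": 2, "/b/d/f": 3}
nested = unflatten_dict_sketch(flat)
```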
digideep.environment.data_helpers.update_dict_of_lists(dic, item, index=0)[source]

This function updates a dictionary with a new item.

>>> dic = {'a':[1,2,3], 'c':[[-1,-2],[-3,-4]]}
>>> item = {'a':4, 'b':[1,2,3]}
>>> index = 3
>>> update_dict_of_lists(dic, item, index)


In the example above, 'c' is not “complete” yet. The function complete_dict_of_list() completes the keys that need to be completed.


This function does not support nested dictionaries.
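
The interplay between updating and completing can be sketched as follows. This is a guess at the behavior, not the library's code: it assumes that a key first seen at position index is back-filled with None placeholders, and that completion pads at the end using the structure of the last entry. All function names here are hypothetical.

```python
def _none_like(element):
    """Mirror the list/tuple structure of `element`, with None leaves."""
    if isinstance(element, (list, tuple)):
        return type(element)(_none_like(child) for child in element)
    return None


def update_sketch(dic, item, index=0):
    for key, value in item.items():
        if key not in dic:
            # Assumption: earlier positions of a new key become placeholders.
            dic[key] = [_none_like(value) for _ in range(index)]
        dic[key].append(value)
    return dic


def complete_sketch(dic, length):
    for key, values in dic.items():
        missing = length - len(values)
        if missing > 0:
            # Assumption: pad at the end, copying the last entry's structure.
            dic[key] = values + [_none_like(values[-1]) for _ in range(missing)]
    return dic


dic = {'a': [1, 2, 3], 'c': [[-1, -2], [-3, -4]]}
update_sketch(dic, {'a': 4, 'b': [1, 2, 3]}, index=3)
complete_sketch(dic, 4)
```

Under these assumptions every list in dic ends up with length 4, which is the shape that the example for complete_dict_of_list() starts from.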

digideep.environment.explorer module

class digideep.environment.explorer.Explorer(session, agents=None, **params)[source]

Bases: object

A class which runs environments in parallel and returns the resulting trajectories in a unified structure. It supports multiple agents in an environment.


The entry point of this class is the update() function, which calls step() n_steps times. Each step() first calls prestep() to get the actions from the agents, then calls env.step to execute those actions in the environments. After the loop in update() finishes, prestep() is called once more to save the observations/actions of the last step. This final action is the one the agent would take next, but it is not executed. This information is useful in some algorithms.
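
The call order described above can be illustrated with a toy class. ToyExplorer is a hypothetical stand-in that only records which calls happen; it does not create environments or agents and is not the Explorer implementation.

```python
class ToyExplorer:
    """Records the order of calls made during one update()."""

    def __init__(self, n_steps):
        self.n_steps = n_steps
        self.calls = []

    def prestep(self, final_step=False):
        # Stand-in for asking the agents for actions (nothing is executed).
        self.calls.append("prestep(final)" if final_step else "prestep")
        return {"observations": None, "masks": None, "agents": {}}

    def step(self):
        pre_transition = self.prestep()
        # Stand-in for env.step(actions).
        self.calls.append("env.step")
        return dict(pre_transition, rewards=None, infos=None)

    def update(self):
        transitions = [self.step() for _ in range(self.n_steps)]
        # One extra prestep records the last observation/action pair.
        transitions.append(self.prestep(final_step=True))
        return transitions


explorer = ToyExplorer(n_steps=2)
trajectory = explorer.update()
```

After the call, explorer.calls lists two prestep/env.step pairs followed by a single final prestep.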

  • session (Session) – The running session object.
  • agents (dict) – A dictionary of the agents and their corresponding agent objects.
  • mode (str) – The mode of the Explorer, one of: train | test | eval
  • env (env) – The parameters of the environment.
  • do_reset (bool) – A flag indicating whether to reset the environment at the start of update().
  • final_action (bool) – A flag indicating whether the final call of prestep() should also generate an action.
  • num_workers (int) – Number of workers to work in parallel.
  • deterministic (bool) – Whether to choose the optimal action or to mix some noise into the action (i.e. for exploration).
  • n_steps (int) – Number of steps to take in update().
  • render (bool) – A flag indicating whether the environment should be rendered at each step.
  • render_delay (float) – The number of seconds to wait after calling env.render. Used when the environment is too fast for visualization, typically in eval mode.
  • seed (int) – The environment seed.
  • steps (int) – Number of times the step() function has been called.
  • n_episode (int) – Number of episodes (full rounds of simulation) generated so far.
  • timesteps (int) – Total number of timesteps of experience generated so far.
  • was_reset (bool) – A flag indicating whether the Explorer has just been reset.
  • observations – A tracker of environment observations, used to produce the actions for the next step.
  • masks – A tracker of the environment done flags, which indicate the start of a new episode.
  • hidden_states – A tracker of the agents' hidden states, used to produce the next-step action in recurrent policies.


Use do_reset with caution, and only when you know what the consequences are. There are generally few occasions when this flag needs to be True.


This class is partially serializable. It only saves the state of environment wrappers and not the environment per se.


It closes all environments.


prestep() produces actions for all of the agents. It does not execute the actions in the environment.

Parameters: final_step (bool) – A flag indicating whether this is the last call of this function.
Returns: The pre-transition dictionary containing observations, masks, and agent information. The format is like: {"observations":..., "masks":..., "agents":...}
Return type: dict

This function extracts episode information from infos and sends it to the Monitor class.


Resets the Explorer and all of its states. Sets was_reset to True to prevent immediate resets.


step() runs prestep() and then the actual env.step function. It also converts the transition data into the appropriate format.

Returns: The full transition information, including the pre-transition (actions, last observations, etc.) and the results of executing the actions in the environments, i.e. rewards and infos. The format is like: {"observations":..., "masks":..., "rewards":..., "infos":..., "agents":...}
Return type: dict

Runs step() n_steps times.

Returns: A dictionary with Unix-style file system keys, including all information generated by the simulation.
Return type: dict

digideep.environment.make_environment module

This module is inspired by pytorch-a2c-ppo-acktr.

class digideep.environment.make_environment.MakeEnvironment(session, mode, seed, **params)[source]

Bases: object

This class will make the environment. It will apply the wrappers to the environments as well.


Except for the Monitor wrapper, no wrapper is applied to the environment unless explicitly specified.

create_envs(num_workers=1, force_no_monitor=False, extra_env_kwargs={})[source]

This function will generate a dict of interesting specifications of the environment.

Note: Observation and action spaces can be nested spaces.Dict.

make_env(rank, force_no_monitor=False, extra_env_kwargs={})[source]
registered = False
run_wrapper_stack(env, stack)[source]

Apply a series of wrappers.


Function to convert space’s characteristics into a config-space dict.

digideep.environment.play module

digideep.environment.wrappers module

Module contents