digideep.environment package
Subpackages
- digideep.environment.common package
- Subpackages
- digideep.environment.common.vec_env package
- Submodules
- digideep.environment.common.vec_env.dummy_vec_env module
- digideep.environment.common.vec_env.shmem_vec_env module
- digideep.environment.common.vec_env.subproc_vec_env module
- digideep.environment.common.vec_env.util module
- digideep.environment.common.vec_env.vec_monitor module
- digideep.environment.common.vec_env.vec_video_recorder module
- Module contents
- Submodules
- digideep.environment.common.atari_wrappers module
- digideep.environment.common.monitor module
- digideep.environment.common.running_mean_std module
- digideep.environment.common.tile_images module
- Module contents
- Subpackages
- digideep.environment.dmc2gym package
Submodules
digideep.environment.data_helpers module
This module provides helper functions to manage data outputs from the Explorer class.
digideep.environment.data_helpers.complete_dict_of_list(dic, length)
This function completes the missing elements of a reference dictionary with similarly structured None values.
>>> dic = {'a':[1,2,3,4],
...        'b':[[None,None,None],[None,None,None],[None,None,None],[1,2,3]],
...        'c':[[-1,-2],[-3,-4]]}
>>> # The length of lists under each key is 4 except 'c' which is 2. We have to complete that.
>>> complete_dict_of_list(dic, 4)
{'a':[1,2,3,4], 'b':[[None,None,None],[None,None,None],[None,None,None],[1,2,3]], 'c':[[-1,-2],[-3,-4],[None,None],[None,None]]}
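The padding behavior in the example above can be sketched as follows. This is a minimal illustration, not the library's implementation: the names nonify_sketch and complete_dict_of_list_sketch are hypothetical, only lists and tuples are handled, and the last stored entry under each key is assumed to be a valid structural template.

```python
def nonify_sketch(element):
    """Return a None-filled copy with the same nested list/tuple structure."""
    if isinstance(element, (list, tuple)):
        return type(element)(nonify_sketch(e) for e in element)
    return None


def complete_dict_of_list_sketch(dic, length):
    """Pad every list in ``dic`` to ``length`` with None-filled placeholders."""
    for values in dic.values():
        if not values:
            continue  # assumption: an empty list offers no structure to copy
        template = values[-1]
        while len(values) < length:
            values.append(nonify_sketch(template))
    return dic


dic = {'a': [1, 2, 3, 4], 'c': [[-1, -2], [-3, -4]]}
completed = complete_dict_of_list_sketch(dic, 4)
```

Lists that already have the requested length, such as 'a' here, are left unchanged.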
digideep.environment.data_helpers.convert_time_to_batch_major(episode)
Converts a rollout so that the batch dimension is the major (first) dimension instead of the second.
Parameters: episode (dict) – A trajectory in the form of {'key1':(num_steps,batch_size,...), 'key2':(num_steps,batch_size,...)}
Returns: A trajectory in the form of {'key1':(batch_size,num_steps,...), 'key2':(batch_size,num_steps,...)}
Return type: dict
>>> episode = {'key1':[[[1],[2]], [[3],[4]], [[5],[6]], [[7],[8]], [[9],[10]]],
...            'key2':[[[1,2],[3,4]], [[5,6],[7,8]], [[9,10],[11,12]], [[13,14],[15,16]], [[17,18],[19,20]]]}
>>> convert_time_to_batch_major(episode)
{'key1': array([[[ 1.], [ 3.], [ 5.], [ 7.], [ 9.]],
                [[ 2.], [ 4.], [ 6.], [ 8.], [10.]]], dtype=float32),
 'key2': array([[[ 1.,  2.], [ 5.,  6.], [ 9., 10.], [13., 14.], [17., 18.]],
                [[ 3.,  4.], [ 7.,  8.], [11., 12.], [15., 16.], [19., 20.]]], dtype=float32)}
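The axis swap itself can be illustrated without NumPy. The sketch below is an assumption-laden stand-in (time_to_batch_major_sketch is a hypothetical name) that works on nested lists and returns nested lists, whereas the documented function returns float32 arrays.

```python
def time_to_batch_major_sketch(episode):
    """Swap the first two axes of every nested-list entry in ``episode``."""
    # zip(*steps) collects, for each worker, its entry from every time step.
    # With NumPy arrays the same swap could be written np.swapaxes(arr, 0, 1).
    return {key: [list(per_worker) for per_worker in zip(*steps)]
            for key, steps in episode.items()}


# Shape (num_steps=3, batch_size=2, 1) becomes (batch_size=2, num_steps=3, 1).
episode = {'key1': [[[1], [2]], [[3], [4]], [[5], [6]]]}
batch_major = time_to_batch_major_sketch(episode)
```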
digideep.environment.data_helpers.dict_of_lists_to_list_of_dicts(dic, num)
Converts a dict of lists to a list of dicts. Mainly used to prepare actions to be fed into env.step(action). env.step assumes the action to be a list with the same length as the number of workers. It assigns the first action to the first worker, and so on.
Parameters:
- dic (dict) – A dictionary keyed by agent, holding each agent's actions for all workers.
- num (int) – The number of workers.
Returns: A list whose length is the same as num. Each element in the list is a dictionary with keys being the agents.
Return type: list
>>> dic = {'a1':([1,2],[3,4],[5,6]), 'a2':([9],[8],[7])}
>>> num = 3
>>> dict_of_lists_to_list_of_dicts(dic, num)
[{'a1':[1,2], 'a2':[9]}, {'a1':[3,4], 'a2':[8]}, {'a1':[5,6], 'a2':[7]}]
Caution
This only works for 1-level dicts, not for nested dictionaries.
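One way to picture the conversion is as a per-worker slice across all agents. The following is a hedged sketch with a hypothetical name (dict_of_lists_to_list_of_dicts_sketch), assuming every agent entry holds at least num actions. It is not taken from the library source.

```python
def dict_of_lists_to_list_of_dicts_sketch(dic, num):
    """Build one {agent: action} dict per worker from {agent: actions}."""
    return [{agent: actions[i] for agent, actions in dic.items()}
            for i in range(num)]


dic = {'a1': ([1, 2], [3, 4], [5, 6]), 'a2': ([9], [8], [7])}
per_worker = dict_of_lists_to_list_of_dicts_sketch(dic, 3)
```

Because only the first level is indexed, a nested dictionary under an agent key would be indexed by integer position rather than split per worker, which matches the caution above.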
digideep.environment.data_helpers.extract_keywise(dic, key)
This function extracts a key from all entries in a dictionary. The key must be a first-level key of each entry.
Parameters:
- dic (dict) – The input dictionary, which is a dict of dictionaries.
- key – The key name to be extracted.
Returns: The result dictionary
Return type: dict
>>> dic = {'agent1':{'a':[1,2],'b':{'c':2,'d':4}}, 'agent2':{'a':[3,4],'b':{'c':9,'d':7}}}
>>> key = 'a'
>>> extract_keywise(dic, key)
{'agent1':[1,2], 'agent2':[3,4]}
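A minimal sketch of the extraction follows, under the assumption that every entry contains the requested key. The name extract_keywise_sketch is hypothetical, and the behavior for a missing key is not specified by the description above.

```python
def extract_keywise_sketch(dic, key):
    """Pick ``key`` out of each first-level entry of a dict of dicts."""
    # Assumption: a missing key raises KeyError rather than being skipped.
    return {name: entry[key] for name, entry in dic.items()}


dic = {'agent1': {'a': [1, 2], 'b': {'c': 2, 'd': 4}},
       'agent2': {'a': [3, 4], 'b': {'c': 9, 'd': 7}}}
extracted = extract_keywise_sketch(dic, 'a')
```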
digideep.environment.data_helpers.flatten_dict(dic, sep='/', prefix='')
We flatten a nested dictionary into a 1-level dictionary. In the new dictionary, keys are combinations of previous keys, separated by sep. We follow Unix-style file system naming.
>>> Dict = {"a":1, "b":{"c":1, "d":{"e":2, "f":3}}}
>>> flatten_dict(Dict)
{"/a":1, "/b/c":1, "/b/d/e":2, "/b/d/f":3}
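The flattening can be sketched as a short recursion. This is an illustrative version under the assumption that every key is a string. The name flatten_dict_sketch is hypothetical and the code is not copied from the library.

```python
def flatten_dict_sketch(dic, sep='/', prefix=''):
    """Flatten a nested dict into path-like keys such as '/b/d/e'."""
    flat = {}
    for key, value in dic.items():
        path = prefix + sep + key
        if isinstance(value, dict):
            # Descend, carrying the path built so far as the new prefix.
            flat.update(flatten_dict_sketch(value, sep=sep, prefix=path))
        else:
            flat[path] = value
    return flat


flattened = flatten_dict_sketch({"a": 1, "b": {"c": 1, "d": {"e": 2, "f": 3}}})
```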
digideep.environment.data_helpers.join_keys(key1, key2, sep='/')
>>> join_keys('/agent1','artifacts')
'/agent1/artifacts'
digideep.environment.data_helpers.list_of_dicts_to_flattened_dict_of_lists(List, length)
Function to convert a list of (nested) dicts to a flattened dict of lists. See the example below.
Parameters:
- List (list) – A list of dictionaries. Each element in the list is a single data sample produced by the environment.
- length (int) – The length of the time sequence. It is used to complete the data entries that are missing from some data samples.
Returns: A dictionary whose keys are flattened similar to Unix-style file system naming.
Return type: dict
>>> List = [{'a':{'f':[1,2], 'g':[7,8]}, 'b':[-1,-2], 'info':[10,20]},
...         {'a':{'f':[3,4], 'g':[9,8]}, 'b':[-3,-4], 'step':[80,90]}]
>>> Length = 2
>>> list_of_dicts_to_flattened_dict_of_lists(List, Length)
{'/a/f':[[1,2],[3,4]], '/a/g':[[7,8],[9,8]], '/b':[[-1,-2],[-3,-4]], '/info':[[10,20],[None,None]], '/step':[[None,None],[80,90]]}
# Intermediate result, before doing ``complete_dict_of_list``:
{'/a/f':[[1,2],[3,4]], '/a/g':[[7,8],[9,8]], '/b':[[-1,-2],[-3,-4]], '/info':[[10,20]], '/step':[[None,None],[80,90]]}
# Final result, after doing ``complete_dict_of_list`` ('/info' will become complete in length):
{'/a/f':[[1,2],[3,4]], '/a/g':[[7,8],[9,8]], '/b':[[-1,-2],[-3,-4]], '/info':[[10,20],[None,None]], '/step':[[None,None],[80,90]]}
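The two stages shown in the example (back-filling keys that appear late, then padding keys that stop early) can be sketched in one self-contained function. All names here are hypothetical, only lists and tuples are supported as values, and the code is an illustration of the documented behavior rather than the library's implementation.

```python
def _none_like(element):
    """None-filled copy with the same nested list/tuple structure."""
    if isinstance(element, (list, tuple)):
        return type(element)(_none_like(e) for e in element)
    return None


def _flatten(dic, sep='/', prefix=''):
    flat = {}
    for key, value in dic.items():
        path = prefix + sep + key
        if isinstance(value, dict):
            flat.update(_flatten(value, sep, path))
        else:
            flat[path] = value
    return flat


def list_to_flat_dict_of_lists_sketch(samples, length):
    result = {}
    for index, sample in enumerate(samples):
        for key, value in _flatten(sample).items():
            column = result.setdefault(key, [])
            # Back-fill the steps at which this key was absent ('/step' above).
            while len(column) < index:
                column.append(_none_like(value))
            column.append(value)
    # Pad keys that stopped appearing before the end ('/info' above).
    for column in result.values():
        while len(column) < length:
            column.append(_none_like(column[-1]))
    return result


samples = [{'a': {'f': [1, 2], 'g': [7, 8]}, 'b': [-1, -2], 'info': [10, 20]},
           {'a': {'f': [3, 4], 'g': [9, 8]}, 'b': [-3, -4], 'step': [80, 90]}]
table = list_to_flat_dict_of_lists_sketch(samples, 2)
```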
digideep.environment.data_helpers.nonify(element)
This function creates an output with all elements being None. The structure of the result is exactly the structure of the input element. The element cannot contain dicts. The only accepted types are tuple, list, and np.ndarray. It can contain nested lists and tuples, however.
>>> Input = [(1,2,3), (1,2,4,5,[-1,-2])]
>>> nonify(Input)
[(None,None,None), (None,None,None,None,[None,None])]
digideep.environment.data_helpers.unflatten_dict(dic, sep='/')
Unflattens a flattened dictionary into a nested dictionary.
>>> Dict = {"/a":1, "/b/c":1, "/b/d/e":2, "/b/d/f":3}
>>> unflatten_dict(Dict)
{"a":1, "b":{"c":1, "d":{"e":2, "f":3}}}
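The inverse operation can be sketched by splitting each path and walking down the nested dictionary. The name unflatten_dict_sketch is hypothetical, and the sketch assumes that every key contains at least one non-empty path component.

```python
def unflatten_dict_sketch(dic, sep='/'):
    """Rebuild a nested dict from path-like keys such as '/b/d/e'."""
    nested = {}
    for path, value in dic.items():
        parts = [p for p in path.split(sep) if p]  # drop the empty leading part
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # create intermediate levels on demand
        node[parts[-1]] = value
    return nested


restored = unflatten_dict_sketch({"/a": 1, "/b/c": 1, "/b/d/e": 2, "/b/d/f": 3})
```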
digideep.environment.data_helpers.update_dict_of_lists(dic, item, index=0)
This function updates a dictionary with a new item.
>>> dic = {'a':[1,2,3], 'c':[[-1,-2],[-3,-4]]}
>>> item = {'a':4, 'b':[1,2,3]}
>>> index = 3
>>> update_dict_of_lists(dic, item, index)
{'a':[1,2,3,4], 'b':[[None,None,None],[None,None,None],[None,None,None],[1,2,3]], 'c':[[-1,-2],[-3,-4]]}
Note
c in the above example is not “complete” yet! The function complete_dict_of_list() will complete the keys which need to be completed!
Caution
This function does not support nested dictionaries.
digideep.environment.explorer module
class digideep.environment.explorer.Explorer(session, agents=None, **params)
Bases: object
A class which runs environments in parallel and returns the resulting trajectories in a unified structure. It supports multiple agents in an environment.
Note
The entry point of this class is the update() function, in which the step() function is called n_steps times. In the step() function, the prestep() function is called first to get the actions from the agents. Then the env.step function is called to execute those actions in the environments. After the loop in update() is done, prestep() is called once more to save the observations/actions of the last step. This records the final action that the agent would take without actually executing it. This information is useful in some algorithms.
Parameters:
- session (Session) – The running session object.
- agents (dict) – A dictionary of the agents and their corresponding agent objects.
- mode (str) – The mode of the Explorer, which is one of the three: train | test | eval
- env (env) – The parameters of the environment.
- do_reset (bool) – A flag indicating whether to reset the environment at the start of the update.
- final_action (bool) – A flag indicating whether the action should also be generated in the final call of prestep().
- num_workers (int) – Number of workers to work in parallel.
- deterministic (bool) – Whether to choose the optimal action or to mix some noise with the action (i.e. for exploration).
- n_steps (int) – Number of steps to take in update().
- render (bool) – A flag used to indicate whether the environment should be rendered at each step.
- render_delay (float) – The number of seconds to wait after calling env.render. Used when the environment is too fast for visualization, typically in eval mode.
- seed (int) – The environment seed.
Variables:
- steps (int) – Number of times the step() function is called.
- n_episode (int) – Number of episodes (a full round of simulation) generated so far.
- timesteps (int) – Number of total timesteps of experience generated so far.
- was_reset (bool) – A flag indicating whether the Explorer has just been reset or not.
- observations – A tracker of environment observations used to produce the actions for the next step.
- masks – A tracker of the environment done flag indicating the start of a new episode.
- hidden_states – A tracker of hidden_states of the agents for producing the next step action in recurrent policies.
Caution
Use do_reset with caution, and only when you know what the consequences are. Generally there are few occasions when this flag needs to be true.
Tip
This class is partially serializable. It only saves the state of environment wrappers and not the environment per se.
prestep(final_step=False)
Function to produce actions for all of the agents. This function does not execute the actions in the environment.
Parameters: final_step (bool) – A flag indicating whether this is the last call of this function.
Returns: The pre-transition dictionary containing observations, masks, and agents' information. The format is like: {"observations":..., "masks":..., "agents":...}
Return type: dict
report_rewards(infos)
This function extracts episode information from infos and sends it to the Monitor class.
reset()
Will reset the Explorer and all of its states. Will set was_reset to True to prevent immediate resets.
step()
Function that runs the prestep and the actual env.step functions. It also converts the transition data into the appropriate format.
Returns: The full transition information, including the pre-transition (actions, last observations, etc.) and the results of executing actions on the environments, i.e. rewards and infos. The format is like: {"observations":..., "masks":..., "rewards":..., "infos":..., "agents":...}
Return type: dict
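The control flow described in the Explorer note (update() calling step() n_steps times, each step() calling prestep() and then env.step, followed by one extra prestep()) can be illustrated with a toy model. Everything in this sketch is hypothetical: ToyEnv, ToyExplorer, and the policy callable are stand-ins invented for illustration, and none of the real Explorer's sessions, masks, hidden states, or multi-agent handling are modeled.

```python
class ToyEnv:
    """Hypothetical stand-in for a vectorized environment."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.t = 0

    def reset(self):
        self.t = 0
        return [0] * self.num_workers

    def step(self, actions):
        self.t += 1
        observations = [self.t] * self.num_workers
        rewards = [float(a) for a in actions]
        return observations, rewards


class ToyExplorer:
    def __init__(self, env, policy, n_steps):
        self.env, self.policy, self.n_steps = env, policy, n_steps
        self.observations = env.reset()
        self.steps = 0

    def prestep(self):
        # Choose actions for the current observations; nothing is executed here.
        actions = [self.policy(o) for o in self.observations]
        return {"observations": self.observations, "actions": actions}

    def step(self):
        pre = self.prestep()
        self.observations, rewards = self.env.step(pre["actions"])
        self.steps += 1
        return dict(pre, rewards=rewards)

    def update(self):
        trajectory = [self.step() for _ in range(self.n_steps)]
        # One extra prestep records the action the agent would take next,
        # without executing it in the environment.
        return trajectory, self.prestep()


env = ToyEnv(num_workers=2)
explorer = ToyExplorer(env, policy=lambda o: o + 1, n_steps=3)
trajectory, final = explorer.update()
```

In this toy model the environment is stepped exactly n_steps times, while prestep() runs n_steps + 1 times.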
digideep.environment.make_environment module
This module is inspired by pytorch-a2c-ppo-acktr.
class digideep.environment.make_environment.MakeEnvironment(session, mode, seed, **params)
Bases: object
This class makes the environment. It also applies the wrappers to the environments.
Tip
Except for the Monitor wrapper, no wrapper is applied to the environment unless explicitly specified.
get_config()
This function generates a dict of interesting specifications of the environment.
Note: Observation and action spaces can be nested spaces.Dict.
registered = False