Tools

This module contains tools for processing and handling data related to this framework.

Created on Sat Feb 22, 2020

@author: Jorge Mario Cruz-Duarte (jcrvz.github.io), e-mail: j.m.cruzduarte@ieee.org

Utility functions used across the CUSTOMHyS package: JSON I/O, statistical summaries, data inspection helpers, and more.

Quick usage:

from customhys import tools as tl

# Read a JSON results file
data = tl.read_json("data_files/raw/results.json")

# Inspect the data structure
tl.printmsk(data)

customhys.tools.printmsk(var, level=1, name=None)[source]

Print the meta-skeleton of a variable with nested variables, all with different types.

Example:

>>> variable = {"par0": [1, 2, 3, 4, 5, 6],
        "par1": [1, 'val1', 1.23],
        "par2" : -4.5,
        "par3": "val2",
        "par4": [7.8, [-9.10, -11.12, 13.14, -15.16]],
        "par5": {"subpar1": 7,
                 "subpar2": (8, 9, [10, 11])}}

>>> printmsk(variable)
|-- {dict: 6}
|  |-- par0 = {list: 6}
|  |  |-- 0 = {int}
:  :  :
|  |-- par1 = {list: 3}
|  |  |-- 0 = {int}
|  |  |-- 1 = {str}
|  |  |-- 2 = {float}
|  |-- par2 = {float}
|  |-- par3 = {str}
|  |-- par4 = {list: 2}
|  |  |-- 0 = {float}
|  |  |-- 1 = {list: 4}
|  |  |  |-- 0 = {float}
:  :  :  :
|  |-- par5 = {dict: 2}
|  |  |-- subpar1 = {int}
|  |  |-- subpar2 = {tuple: 3}
|  |  |  |-- 0 = {int}
|  |  |  |-- 1 = {int}
|  |  |  |-- 2 = {list: 2}
|  |  |  |  |-- 0 = {int}
:  :  :  :  :

Parameters:

var (any) – Variable to inspect.
level (int) – Optional. Level of the variable to inspect. Default: 1.
name – Optional. Name of the variable to inspect. It is used only for display. The default is None.

Returns:

None.

customhys.tools.listfind(values, val)[source]

Return all indices of a list corresponding to a value.

Parameters:

values (list) – List to analyse.
val (any) – Element to find in the list.

Returns:

list
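
The lookup described above can be approximated in plain Python. This is a minimal sketch rather than the package's implementation: `listfind_sketch` is a hypothetical name, and it assumes that elements are compared with `==`.

```python
def listfind_sketch(values, val):
    """Return every index i such that values[i] == val (hypothetical stand-in)."""
    return [index for index, item in enumerate(values) if item == val]


operators = ["swarm", "random", "swarm", "local"]
print(listfind_sketch(operators, "swarm"))    # [0, 2]
print(listfind_sketch(operators, "missing"))  # []
```

Unlike `list.index`, which stops at the first match and raises `ValueError` when nothing is found, this form returns all matches and an empty list when there are none.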

customhys.tools.revise_results(main_folder='data_files/raw/')[source]

Check a folder for subfolders that are repeated in name and merge them. The data from the repeated subfolders are first merged into a single folder; the repeated subfolders are then renamed by adding the prefix ‘.to_delete-’.

Parameters:

main_folder (str) – Optional. Path to analyse. The default is ‘data_files/raw/’.

Returns:

None.

customhys.tools.read_folder_files(folder_name)[source]

Return a list of all subfolders contained in a folder, ignoring all those starting with ‘.’ (hidden ones).

Parameters:

folder_name (str) – Name of the main folder.

Returns:

list.
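
Listing a folder while skipping hidden entries can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's code: `list_visible_subfolders` is a hypothetical name, it keeps directories only, and it sorts the result for reproducibility. The folder names are made up for the demonstration.

```python
import os
import tempfile


def list_visible_subfolders(folder_name):
    """Hypothetical stand-in: sorted names of the non-hidden subfolders."""
    return sorted(
        entry.name
        for entry in os.scandir(folder_name)
        if entry.is_dir() and not entry.name.startswith(".")
    )


# Build a throwaway folder tree to try the helper on.
with tempfile.TemporaryDirectory() as root:
    for name in ("Sphere-2D", "Rastrigin-5D", ".to_delete-Sphere-2D"):
        os.mkdir(os.path.join(root, name))
    # A regular file, which the directory filter should skip.
    open(os.path.join(root, "notes.txt"), "w").close()
    found = list_visible_subfolders(root)

print(found)  # ['Rastrigin-5D', 'Sphere-2D']
```

The subfolder carrying the ‘.to_delete-’ prefix is skipped because its name starts with a dot, which is presumably why that prefix hides merged folders from later processing.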

customhys.tools.preprocess_files(main_folder='data_files/raw/', kind='brute_force', only_laststep=True, output_name='processed_data', experiment='')[source]

Return data from results saved in the main folder. This method saves the summary file in JSON format. Take into account that kind = 'brute_force' has a special behaviour because each JSON file stored in the subfolders corresponds to a specific operator. Otherwise, these files usually correspond to a candidate solution (i.e., a metaheuristic) from the hyper-heuristic process.

Parameters:

main_folder (str) – Optional. Location of the main folder. The default is ‘data_files/raw/’.
kind (str) – Type of procedure run to obtain the data files. It can be ‘brute_force’, ‘basic_metaheuristic’, or any other value, which means metaheuristics without fixed search operators. The default is ‘brute_force’.
only_laststep (bool) – Optional. Flag to save only the last step of all fitness values from the historical data. It is useful for a large number of experiments. It only works when kind is neither ‘brute_force’ nor ‘basic_metaheuristic’. The default is True.
output_name (str) – Name of the resulting file. The default is ‘processed_data’.
experiment (str) – Label of the experiment. This parameter helps to filter the results if multiple experiments are performed at the same time. The default is an empty string, which processes all the results from the given folder.

Returns:

dict.

customhys.tools.df2dict(df)[source]

Return a dictionary from a pandas DataFrame.

Parameters:

df (pandas.DataFrame) – The pandas DataFrame.

Returns:

dict.

customhys.tools.check_fields(default_dict, new_dict)[source]

Return the dictionary with default keys and values updated by using the information of new_dict.

Parameters:

default_dict (dict) – Dictionary with default values.
new_dict (dict) – Dictionary with new values.

Returns:

dict.
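
One way to read the description above is as a defaults-with-overrides merge. The sketch below is an assumption-laden illustration, not the package's implementation: `check_fields_sketch` is a hypothetical name, it assumes that keys of `new_dict` absent from `default_dict` are ignored, and it works on a copy so the defaults are not mutated. The parameter names in the sample dictionaries are made up.

```python
def check_fields_sketch(default_dict, new_dict):
    """Hypothetical stand-in: defaults overridden by matching keys of new_dict."""
    merged = dict(default_dict)  # copy so the caller's defaults stay untouched
    for key in merged:
        if key in new_dict:
            merged[key] = new_dict[key]
    return merged


defaults = {"num_agents": 30, "num_iterations": 100, "verbose": False}
overrides = {"num_iterations": 500, "unknown_option": 1}
print(check_fields_sketch(defaults, overrides))
# {'num_agents': 30, 'num_iterations': 500, 'verbose': False}
```

Restricting the result to the default keys keeps misspelled or unsupported options from silently entering a configuration; whether the real function behaves this way should be checked against its source.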

customhys.tools.save_json(variable_to_save, file_name=None, suffix=None)[source]

Save a variable composed of diverse types of variables, such as NumPy ones.

Parameters:

variable_to_save (any) – Variable to save.
file_name (str) – Optional. Filename to save the variable. If this is None, a random name is used. The default is None.
suffix (str) – Optional. Suffix to put in the file_name. The default is None.

Returns:

None.

customhys.tools.read_json(data_file)[source]

Return data from a JSON file.

Parameters:

data_file (str) – Filename of the JSON file.

Returns:

dict or list.

customhys.tools.merge_json(data_folder, list_of_fields=None, save_file=True)[source]

Parameters:

data_folder (str)
list_of_fields (list | None)
save_file (bool)

Return type:

None

class customhys.tools.NumpyEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: JSONEncoder

Numpy encoder

default(obj)[source]

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)
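
The text above is the generic description of the `JSONEncoder.default` hook. As a hedged illustration of how such a hook can make array-like values serialisable, the sketch below converts any object exposing a `tolist()` method, which NumPy arrays and scalars do. `ArrayFriendlyEncoder` is a hypothetical name and is not the package's `NumpyEncoder`; the standard-library `array.array` stands in for a NumPy array so the snippet runs without extra dependencies.

```python
import json
from array import array


class ArrayFriendlyEncoder(json.JSONEncoder):
    """Hypothetical encoder: serialise objects that offer tolist()."""

    def default(self, o):
        to_list = getattr(o, "tolist", None)
        if callable(to_list):
            return to_list()
        # Let the base class raise TypeError for unsupported types.
        return super().default(o)


record = {"fitness": array("d", [0.5, 0.25]), "steps": 2}
text = json.dumps(record, cls=ArrayFriendlyEncoder)
print(text)              # {"fitness": [0.5, 0.25], "steps": 2}
print(json.loads(text))  # {'fitness': [0.5, 0.25], 'steps': 2}
```

Passing the class through the `cls` argument of `json.dumps` (or `json.dump`) is the standard way to plug in a custom encoder. Values read back from the JSON text are plain Python lists and numbers, not arrays.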