mgkit.utils.dictionary module
Dictionary utils
-
class mgkit.utils.dictionary.HDFDict(file_name, table, cast=<class 'int'>, cache=True)
Bases: object
Changed in version 0.3.3: added cache in __init__
New in version 0.3.1.
Uses a table in an HDFStore (from pandas) as a dictionary. The table must be indexed to perform well. Read only.
Note
The dictionary cannot be modified, and a ValueError is raised if the table is not in the file.
-
mgkit.utils.dictionary.apply_func_to_values(dictionary, func)
New in version 0.1.12.
Assuming a dictionary whose values are iterables, func is applied to each element of each iterable, returning a set of all transformed elements.
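The behaviour described above can be sketched in plain Python. This is an illustration only, not mgkit's implementation: the name apply_func_to_values_sketch is hypothetical, and the sketch assumes the result keeps the original keys, each mapped to the set of transformed elements.

```python
def apply_func_to_values_sketch(dictionary, func):
    # Hypothetical stand-in: apply func to every element of every value
    # and collect the results for each key in a set.
    return {
        key: {func(element) for element in values}
        for key, values in dictionary.items()
    }


mapping = {"gene1": ["k00001", "k00002"], "gene2": ["k00002"]}
result = apply_func_to_values_sketch(mapping, str.upper)
print(result)
```

Because the transformed elements are collected in sets, duplicates produced by func collapse into a single entry.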
-
class mgkit.utils.dictionary.cache_dict_file(iterator, skip_lines=0)
Bases: object
New in version 0.3.0.
Used to cache the results of an iterator that yields (key, value) tuples. The class behaves like a dictionary: if the key is already in the internal dictionary, the corresponding value is returned; otherwise the iterator is advanced until the key is found.
Example
>>> from mgkit.io.blast import parse_accession_taxa_table
>>> i = parse_accession_taxa_table('nucl_gb.accession2taxid.gz', key=0)
>>> d = cache_dict_file(i)
>>> d['AH001684']
4400
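The lazy lookup can be sketched without mgkit. The class LazyCacheSketch below is hypothetical and only illustrates the idea: it ignores skip_lines, and raising KeyError when the iterator is exhausted is an assumption, not documented behaviour.

```python
class LazyCacheSketch:
    """Hypothetical stand-in for a dictionary filled lazily from an iterator."""

    def __init__(self, iterator):
        self._iterator = iter(iterator)
        self._cache = {}

    def __getitem__(self, key):
        # Return immediately if the key was already read from the iterator.
        if key in self._cache:
            return self._cache[key]
        # Otherwise advance the iterator, caching every pair seen on the way.
        for new_key, value in self._iterator:
            self._cache[new_key] = value
            if new_key == key:
                return value
        # Assumption: an exhausted iterator means the key does not exist.
        raise KeyError(key)


pairs = iter([("a", 1), ("b", 2), ("c", 3)])
cache = LazyCacheSketch(pairs)
first = cache["b"]       # consumes ("a", 1) and ("b", 2)
cached = cache["a"]      # served from the internal dictionary
remaining = list(pairs)  # only the unread pair is left
```

The trade-off is that a lookup for a key near the end of a large file still has to read everything before it, but each line is read at most once.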
-
mgkit.utils.dictionary.combine_dict(keydict, valuedict)
Combines two dictionaries when the values of keydict are iterables. The combined dictionary has the same keys as keydict, and its values are sets containing all the values that valuedict associates with the keydict values.
Resulting dictionary will be
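A small plain-Python sketch of this combination, with made-up data. The function combine_dict_sketch is hypothetical, and it assumes that valuedict values are iterables and that keydict values missing from valuedict are silently skipped.

```python
def combine_dict_sketch(keydict, valuedict):
    # Hypothetical stand-in: follow each keydict value into valuedict and
    # gather everything found there into one set per keydict key.
    combined = {}
    for key, inner_keys in keydict.items():
        values = set()
        for inner_key in inner_keys:
            # Assumption: inner keys absent from valuedict contribute nothing.
            values.update(valuedict.get(inner_key, ()))
        combined[key] = values
    return combined


keydict = {"A": ["1", "2"], "B": ["2"]}
valuedict = {"1": ["a", "b"], "2": ["c"]}
combined = combine_dict_sketch(keydict, valuedict)
print(combined)
```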
-
mgkit.utils.dictionary.combine_dict_one_value(keydict, valuedict)
Combines two dictionaries: each value of keydict is used as a key in valuedict, and the resulting dictionary is composed of keydict keys and valuedict values.
Same as combine_dict(), but each value in keydict is a single element that is a key in valuedict.
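As an illustration, the single-value case reduces to one lookup per key. The function name below is hypothetical, and letting a missing key raise KeyError is an assumption of this sketch.

```python
def combine_dict_one_value_sketch(keydict, valuedict):
    # Hypothetical stand-in: each keydict value is looked up directly in
    # valuedict; a missing key raises KeyError in this sketch.
    return {key: valuedict[inner_key] for key, inner_key in keydict.items()}


gene_to_ko = {"gene1": "K00001", "gene2": "K00002"}
ko_to_name = {"K00001": "alcohol dehydrogenase", "K00002": "alcohol dehydrogenase (NADP+)"}
gene_to_name = combine_dict_one_value_sketch(gene_to_ko, ko_to_name)
print(gene_to_name)
```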
-
mgkit.utils.dictionary.dict_to_text(stream, dictionary, header=None, comment=None, sep='\t')
New in version 0.4.4.
Writes the content of a dictionary to a stream that supports write, such as io.StringIO or an opened file. Intended only for dictionaries whose keys and values are integers or strings; other data types are better served by more structured formats, such as JSON.
Warning
The file is expected to be opened in text mode for writing (e.g. ‘w’)
- Parameters
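A minimal sketch of the write step, using io.StringIO as the stream. The function dict_to_text_sketch is hypothetical and deliberately leaves out header and comment, since their exact output format is not described here.

```python
import io


def dict_to_text_sketch(stream, dictionary, sep="\t"):
    # Hypothetical stand-in: one "key<sep>value" line per dictionary item.
    for key, value in dictionary.items():
        stream.write(f"{key}{sep}{value}\n")


buffer = io.StringIO()
dict_to_text_sketch(buffer, {"AH001684": 4400, "X00001": 9606})
text = buffer.getvalue()
print(text)
```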
-
mgkit.utils.dictionary.filter_nan(ratios)
Returns a dictionary with the NaN values removed.
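In plain Python the same filtering can be sketched with math.isnan. The function name is hypothetical, and the sketch assumes every value is a single float.

```python
import math


def filter_nan_sketch(ratios):
    # Hypothetical stand-in: keep only the items whose value is not NaN.
    return {key: value for key, value in ratios.items() if not math.isnan(value)}


ratios = {"a": 1.5, "b": float("nan"), "c": 0.0}
clean = filter_nan_sketch(ratios)
print(clean)
```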
-
mgkit.utils.dictionary.filter_ratios_by_numbers(ratios, min_num)
Returns only the items of a dictionary whose value, an iterable, has a length greater than or equal to min_num.
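A short sketch of the length filter, with a hypothetical function name and made-up data; it assumes the values support len().

```python
def filter_ratios_by_numbers_sketch(ratios, min_num):
    # Hypothetical stand-in: keep items whose value holds at least min_num elements.
    return {key: values for key, values in ratios.items() if len(values) >= min_num}


ratios = {"gene1": [0.1, 0.2, 0.3], "gene2": [0.5], "gene3": [0.7, 0.9]}
kept = filter_ratios_by_numbers_sketch(ratios, 2)
print(kept)
```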
-
mgkit.utils.dictionary.find_id_in_dict(s_id, s_dict)
Finds a value ‘s_id’ in a dictionary in which the values are iterables. Returns a list of keys that contain the value.
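The lookup amounts to a membership test over every value. A sketch with a hypothetical function name:

```python
def find_id_in_dict_sketch(s_id, s_dict):
    # Hypothetical stand-in: list every key whose iterable contains s_id.
    return [key for key, values in s_dict.items() if s_id in values]


s_dict = {"path1": ["K1", "K2"], "path2": ["K2", "K3"], "path3": ["K4"]}
found = find_id_in_dict_sketch("K2", s_dict)
print(found)
```

Each call scans the whole dictionary, so for repeated lookups a reversed mapping built once is usually cheaper.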
-
mgkit.utils.dictionary.link_ids(id_map, black_list=None)
Given a dictionary whose values (iterables) can be linked back to other keys, it returns a dictionary in which the keys are the original keys and the values are sets of keys to which they can be linked.
Becomes:
- Parameters
id_map (dict) – dictionary of keys to link
black_list (iterable) – iterable of values to skip in making the links
- Return dict
linked dictionary
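The sketch below follows one reading of this description, in which two keys are linked when they share at least one value that is not in black_list. That reading, the function name and the data are all assumptions; the actual linking rule in mgkit may differ.

```python
def link_ids_sketch(id_map, black_list=None):
    # Hypothetical stand-in. Assumption: keys are linked when their
    # iterables share at least one value that is not black-listed.
    skipped = set(black_list or ())
    linked = {}
    for key, values in id_map.items():
        usable = set(values) - skipped
        linked[key] = {
            other
            for other, other_values in id_map.items()
            if other != key and usable & set(other_values)
        }
    return linked


id_map = {"k1": ["v1", "v2"], "k2": ["v2", "v3"], "k3": ["v4"]}
links = link_ids_sketch(id_map)
links_without_v2 = link_ids_sketch(id_map, black_list=["v2"])
print(links)
```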
-
mgkit.utils.dictionary.merge_dictionaries(dicts)
New in version 0.3.1.
Merges keys and values from a list/iterable of dictionaries. The resulting dictionary’s values are converted into sets, with the assumption that the values are one of the following: float, str, int, bool
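A sketch of the merge, assuming each input value is a single scalar (float, str, int or bool) that is added to the set for its key. The function name is hypothetical.

```python
def merge_dictionaries_sketch(dicts):
    # Hypothetical stand-in: gather the scalar values seen for each key in a set.
    merged = {}
    for dictionary in dicts:
        for key, value in dictionary.items():
            merged.setdefault(key, set()).add(value)
    return merged


merged = merge_dictionaries_sketch([{"a": 1, "b": 2}, {"a": 3, "b": 2}, {"c": "x"}])
print(merged)
```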
-
mgkit.utils.dictionary.reverse_mapping(map_dict)
Given a dictionary in the form: key->[v1, v2, .., vN], returns a dictionary in the form: v1->[key1, key2, .., keyN]
- Parameters
map_dict (dict) – dictionary to reverse
- Return dict
reversed dictionary
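Reversing the mapping can be sketched as follows. The function name is hypothetical, and collecting the keys in sets rather than lists is an assumption of this sketch.

```python
def reverse_mapping_sketch(map_dict):
    # Hypothetical stand-in: every value becomes a key that points to the
    # set of original keys in whose iterable it appeared.
    reversed_map = {}
    for key, values in map_dict.items():
        for value in values:
            reversed_map.setdefault(value, set()).add(key)
    return reversed_map


map_dict = {"gene1": ["K1", "K2"], "gene2": ["K2"]}
reversed_map = reverse_mapping_sketch(map_dict)
print(reversed_map)
```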
-
mgkit.utils.dictionary.split_dictionary_by_value(value_dict, threshold, aggr_func=<function median>, key_filter=None)
Splits a dictionary, whose values are iterables, into two dictionaries based on a threshold:
one in which the result of aggr_func is lower than the threshold (first)
one in which the result of aggr_func is greater than or equal to the threshold (second)
- Parameters
value_dict (dict) – dictionary to be split
threshold (number) – must be comparable to the result of aggr_func
aggr_func (func) – function used to aggregate the dictionary values
key_filter (iterable) – if specified, only these keys will be in the resulting dictionaries
- Returns
two dictionaries
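The split can be sketched with the standard library. This sketch uses statistics.median as the default aggregation, which is a substitution for whatever median function mgkit uses; the function name is hypothetical, and skipping key_filter entries that are missing from the dictionary is an assumption.

```python
from statistics import median


def split_dictionary_by_value_sketch(value_dict, threshold, aggr_func=median, key_filter=None):
    # Hypothetical stand-in: route each item by comparing the aggregated
    # value with the threshold.
    keys = value_dict.keys() if key_filter is None else key_filter
    lower, higher = {}, {}
    for key in keys:
        if key not in value_dict:
            continue  # assumption: unknown keys in key_filter are ignored
        values = value_dict[key]
        target = lower if aggr_func(values) < threshold else higher
        target[key] = values
    return lower, higher


value_dict = {"a": [1, 2, 3], "b": [5, 6, 7], "c": [3, 3, 3]}
lower, higher = split_dictionary_by_value_sketch(value_dict, 3)
only_a, only_b = split_dictionary_by_value_sketch(value_dict, 3, key_filter=["a", "b"])
```

An item whose aggregated value equals the threshold, such as "c" here, ends up in the second dictionary.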
-
mgkit.utils.dictionary.text_to_dict(stream, skip_lines=0, sep='\t', key_index=0, value_index=1, key_func=<class 'str'>, value_func=<class 'str'>, encoding=None, skip_empty=False, skip_comment=None, verbose=False)
New in version 0.4.4.
Changed in version 0.5.5: added skip_comment and skip_empty
Reads a dictionary from a table file. The passed file is assumed to be opened as text; if it is opened as binary, the encoding (e.g. ascii) must be passed. The file may have multiple columns, so the key and value columns can be chosen with key_index and value_index, respectively.
- Parameters
stream (file) – stream that can be read as a file
skip_lines (int) – number of lines to skip at the start of the file
sep (str) – column separator to use
key_index (int) – zero-based column number of keys
value_index (int) – zero-based column number of values
key_func (func) – function to apply to the keys (defaults to str)
value_func (func) – function to apply to the values (defaults to str)
encoding (None, str) – if None is passed, the file is assumed to be opened in text mode, otherwise the encoding of the file must be passed
skip_empty (bool) – if True, an empty value will not be yielded
skip_comment (None, str) – if a value other than None is passed, lines starting with this parameter value will be skipped
verbose (bool) – if True, logs information about the file read
- Yields
tuple – the keys and values that can be passed to dict
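A plain-Python sketch of how these parameters interact, reading from io.StringIO. The generator text_to_dict_sketch is hypothetical and covers only text-mode streams, so encoding and verbose are left out. Treating an empty value column as the "empty value" for skip_empty is an assumption.

```python
import io


def text_to_dict_sketch(stream, skip_lines=0, sep="\t", key_index=0, value_index=1,
                        key_func=str, value_func=str, skip_empty=False,
                        skip_comment=None):
    # Hypothetical stand-in for the parsing loop described above.
    for _ in range(skip_lines):
        stream.readline()
    for line in stream:
        if skip_comment is not None and line.startswith(skip_comment):
            continue
        fields = line.rstrip("\n").split(sep)
        value = fields[value_index]
        # Assumption: an empty value column counts as an empty value.
        if skip_empty and not value:
            continue
        yield key_func(fields[key_index]), value_func(value)


table = io.StringIO("id\ttaxon\tcount\n# comment\nAH001684\t4400\t3\nX00001\t\t1\n")
mapping = dict(
    text_to_dict_sketch(table, skip_lines=1, value_func=int,
                        skip_empty=True, skip_comment="#")
)
print(mapping)
```

Since the pairs are yielded one at a time, the result can be passed to dict as shown or consumed lazily for large files.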