mgkit.mappings.utils module

Utilities to map genes

mgkit.mappings.utils.count_genes_in_mapping(gene_lists, labels, mapping, normalise=False)

Maps lists of ids to a mapping dictionary and returns a pandas.DataFrame. The rows of the DataFrame are the labels provided, and the columns are the categories to which the ids map. Each element of the label-category matrix is the number of ids in the corresponding gene list that map to that category.

Parameters
  • gene_lists (iterable) – an iterable in which each element is an iterable of ids that can be looked up in mapping

  • labels (iterable) – an iterable of strings used as the row labels of the resulting pandas.DataFrame; must have the same length as gene_lists

  • mapping (dict) – a dictionary in the form: gene_id->[cat1, cat2, ..., catN]

  • normalise (bool) – if True, the counts in each row are normalised by that row's total.

Returns

a pandas.DataFrame instance
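To illustrate the shape of the result, here is a minimal sketch that re-implements the behaviour described above with plain pandas. It is not mgkit's code: the function name count_genes_sketch is hypothetical, and it assumes that ids missing from mapping are skipped and that missing label-category pairs count as 0.

```python
import pandas as pd


def count_genes_sketch(gene_lists, labels, mapping, normalise=False):
    """Hypothetical re-implementation of the documented behaviour (not mgkit's code)."""
    labels = list(labels)
    gene_lists = list(gene_lists)
    if len(labels) != len(gene_lists):
        raise ValueError("labels and gene_lists must have the same length")
    rows = []
    for gene_list in gene_lists:
        counts = {}
        for gene_id in gene_list:
            # Assumption: ids absent from mapping are silently skipped
            for category in mapping.get(gene_id, []):
                counts[category] = counts.get(category, 0) + 1
        rows.append(counts)
    # One row per label, one column per category; absent categories count as 0
    df = pd.DataFrame(rows, index=labels).fillna(0).astype(int)
    if normalise:
        # Divide each row by its own total (a row whose total is 0 becomes NaN)
        df = df.div(df.sum(axis=1), axis=0)
    return df


# Toy data: "g4" has no entry in the mapping
mapping = {"g1": ["A", "B"], "g2": ["B"], "g3": ["C"]}
gene_lists = [["g1", "g2"], ["g2", "g3", "g4"]]
counts = count_genes_sketch(gene_lists, ["s1", "s2"], mapping)
shares = count_genes_sketch(gene_lists, ["s1", "s2"], mapping, normalise=True)
```

Here row s1 counts A once and B twice, because g1 maps to both A and B. With normalise=True each row sums to 1. The documented call would take the same arguments: count_genes_in_mapping(gene_lists, labels, mapping, normalise=True).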

mgkit.mappings.utils.group_annotation_by_mapping(annotations, mapping, attr='ko')

Groups annotations by the categories to which their attribute maps in a mapping dictionary.

Parameters
  • annotations (iterable) – iterable of gff.GFFKeg instances

  • mapping (dict) – dictionary that maps values of the requested attribute to categories

  • attr (str) – attribute of the annotation to be used as key in mapping

Returns

a dictionary in the form category->annotations
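The grouping can be sketched as follows. This is an illustration, not mgkit's implementation. FakeAnnotation is a hypothetical stand-in for gff.GFFKeg, and group_by_mapping_sketch is a hypothetical name. The sketch makes three further assumptions: the key is read with getattr(annotation, attr), annotations whose key is not in mapping are skipped, and each category collects its annotations in a list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeAnnotation:
    """Hypothetical stand-in for gff.GFFKeg, carrying only the fields used here."""
    uid: str
    ko: str


def group_by_mapping_sketch(annotations, mapping, attr="ko"):
    """Hypothetical re-implementation: returns a dict category -> list of annotations."""
    groups = defaultdict(list)
    for annotation in annotations:
        # Assumption: the attribute is readable with getattr; the real class may differ
        key = getattr(annotation, attr)
        # Assumption: annotations whose key is missing from mapping are skipped
        for category in mapping.get(key, []):
            groups[category].append(annotation)
    return dict(groups)


# Placeholder ids and categories, not real KEGG data
mapping = {"ko1": ["cat1"], "ko2": ["cat1", "cat2"]}
annotations = [
    FakeAnnotation("a1", "ko1"),
    FakeAnnotation("a2", "ko2"),
    FakeAnnotation("a3", "ko3"),  # not in mapping, so it is not grouped
]
groups = group_by_mapping_sketch(annotations, mapping)
```

An annotation whose key maps to several categories, such as a2, appears under each of them.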