mgkit.mappings.pandas_map module

Module that contains mapping operations on pandas data structures

mgkit.mappings.pandas_map.calc_coefficient_of_variation(dataframe)[source]

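Calculate the coefficient of variation for a DataFrame, using the formula from Wikipedia.

The formula used is \(\left(1 + \frac{1}{4n}\right) \times c_{v}\), where \(c_{v} = \frac{s}{\bar{x}}\), \(s\) is the sample standard deviation, \(\bar{x}\) is the sample mean and \(n\) is the sample size.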
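The formula above can be sketched with plain pandas as follows. This is a minimal illustration, not the library's implementation: the helper name `corrected_cv` is hypothetical, and computing the statistic per row (so that n is the number of columns) is an assumption about how the data is laid out.

```python
import pandas as pd


def corrected_cv(dataframe: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: bias-corrected coefficient of variation per row."""
    # Assumption: each row is one observation set, each column one sample
    n = dataframe.shape[1]
    mean = dataframe.mean(axis=1)
    # pandas uses the sample standard deviation (ddof=1) by default
    stdev = dataframe.std(axis=1)
    # (1 + 1 / (4n)) * c_v, with c_v = s / mean
    return (1 + 1 / (4 * n)) * (stdev / mean)


counts = pd.DataFrame(
    {"s1": [1.0, 2.0], "s2": [3.0, 4.0], "s3": [5.0, 6.0]},
    index=["gene1", "gene2"],
)
cv = corrected_cv(counts)
```

Rows whose mean is zero yield infinite or undefined values, so they may need filtering before the result is used.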

mgkit.mappings.pandas_map.concatenate_and_rename_tables(dataframes, roots)[source]

Concatenates a list of pandas.DataFrame instances, renaming the columns of each table by prepending the corresponding string from a list of prefixes.

Parameters
  • dataframes (iterable) – iterable of DataFrame instances

  • roots (iterable) – list of prefixes to prepend to the column names of each DataFrame

Return DataFrame

returns a DataFrame instance

Todo

  • move to pandas_utils?
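The concatenate-and-rename operation described for concatenate_and_rename_tables can be approximated with plain pandas. The sketch below is not the library's code: the name `concat_with_prefixes`, the `-` separator and the column-wise concatenation (`axis=1`) are all assumptions made for illustration.

```python
import pandas as pd


def concat_with_prefixes(dataframes, roots):
    """Hypothetical helper: prefix each table's columns, then join the tables."""
    renamed = [
        # add_prefix returns a renamed copy, so the inputs are left untouched
        df.add_prefix(f"{root}-")  # assumption: "-" separates prefix and name
        for df, root in zip(dataframes, roots)
    ]
    # Assumption: tables share a row index and are joined side by side
    return pd.concat(renamed, axis=1)


archaea = pd.DataFrame({"s1": [1, 2]}, index=["gene1", "gene2"])
bacteria = pd.DataFrame({"s1": [3, 4]}, index=["gene1", "gene2"])
combined = concat_with_prefixes([archaea, bacteria], ["archaea", "bacteria"])
```

Prefixing before concatenation keeps columns with the same sample name distinguishable once the tables are merged.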

mgkit.mappings.pandas_map.group_dataframe_by_mapping(dataframe, mapping, root_taxon, name_dict=None)[source]

Return a pandas.DataFrame filtered by mapping and root taxon. The values in each column are averaged over all genes mapping to a category.

Parameters
  • dataframe (DataFrame) – DataFrame with a gene-root MultiIndex

  • mapping (dict) – dictionary of category->genes

  • root_taxon (str) – root taxon to group genes

  • name_dict (dict) – dictionary of category->name

Return DataFrame

the filtered DataFrame
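The filtering and averaging described for group_dataframe_by_mapping can be sketched in plain pandas. This is an illustration under assumptions, not the library's implementation: the name `group_by_mapping` is hypothetical, the root taxon is assumed to be the second index level, and categories with no matching genes are assumed to be skipped.

```python
import pandas as pd


def group_by_mapping(dataframe, mapping, root_taxon, name_dict=None):
    """Hypothetical helper: average genes per category for one root taxon."""
    # Keep only rows of the requested root; the root level is dropped
    by_gene = dataframe.xs(root_taxon, level=1)
    rows = {}
    for category, genes in mapping.items():
        present = by_gene.index.intersection(list(genes))
        if present.empty:
            continue  # assumption: categories without genes are left out
        label = name_dict.get(category, category) if name_dict else category
        # Column-wise mean over all genes in the category
        rows[label] = by_gene.loc[present].mean()
    # One row per category, one column per original column
    return pd.DataFrame(rows).T


index = pd.MultiIndex.from_tuples(
    [("K1", "archaea"), ("K2", "archaea"), ("K1", "bacteria")],
    names=["gene", "root"],
)
table = pd.DataFrame(
    {"s1": [1.0, 3.0, 10.0], "s2": [2.0, 4.0, 20.0]}, index=index
)
grouped = group_by_mapping(
    table, {"C1": ["K1", "K2"], "C2": ["K9"]}, "archaea", {"C1": "metabolism"}
)
```

Here name_dict only relabels the rows of the result; categories missing from it keep their identifier.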

mgkit.mappings.pandas_map.make_stat_table(dataframes, roots)[source]

Produces a pandas.DataFrame that summarises the supplied DataFrames. The stats include the mean, standard deviation and coefficient of variation for each root taxon.

Parameters
  • dataframes (iterable) – iterable of DataFrame instances

  • roots (iterable) – list of root taxa to which each table belongs

Return DataFrame

returns a DataFrame instance
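A summary table of this kind could be assembled roughly as shown below. The sketch is an assumption-laden illustration rather than the library's implementation: the name `stat_table`, the per-row statistics, the (root, statistic) column layout and the use of the corrected coefficient of variation are all choices made here.

```python
import pandas as pd


def stat_table(dataframes, roots):
    """Hypothetical helper: per-row mean, stdev and CV for each root taxon."""
    columns = {}
    for df, root in zip(dataframes, roots):
        n = df.shape[1]  # assumption: columns are the samples
        mean = df.mean(axis=1)
        stdev = df.std(axis=1)  # sample standard deviation (ddof=1)
        columns[(root, "mean")] = mean
        columns[(root, "stdev")] = stdev
        # Corrected coefficient of variation: (1 + 1 / (4n)) * s / mean
        columns[(root, "cv")] = (1 + 1 / (4 * n)) * (stdev / mean)
    # Tuple keys become a two-level column index: (root, statistic)
    return pd.DataFrame(columns)


archaea = pd.DataFrame({"s1": [1.0], "s2": [3.0], "s3": [5.0]}, index=["gene1"])
stats = stat_table([archaea], ["archaea"])
```

With this layout, `stats["archaea"]` selects all three statistics for one root taxon.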