mgkit.workflow.count_utils module

Count Table Utilities

Map Count Table to Genes

The map command uses the information in map files to create count tables from featureCounts output in which uid was used as the attribute for the counts.

A taxonomy map can be passed if the taxonomy needs to be included in the index of the output table. The output table is written in Parquet format, which retains the Index/MultiIndex when read back with pandas.
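The mapping idea can be illustrated with a minimal, standard-library sketch. It is not the module's implementation: the sample data, the `UID_TO_GENES` dictionary, and the `map_counts` helper are all hypothetical, and the counts are assumed to be integers. It relies only on the standard featureCounts layout, where a comment line and a header line are followed by one row per feature, with the per-sample counts after the `Length` column.

```python
import csv
import io
from collections import defaultdict

# Hypothetical featureCounts output: the first column holds the uid used as
# attribute, the columns after "Length" hold the per-sample counts.
FEATURE_COUNTS = (
    "# Program:featureCounts (illustrative comment line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1.bam\tsample2.bam\n"
    "uid1\tcontig1\t1\t300\t+\t300\t10\t4\n"
    "uid2\tcontig1\t400\t900\t-\t501\t5\t0\n"
    "uid3\tcontig2\t1\t150\t+\t150\t2\t7\n"
)

# Hypothetical map (an assumption of this sketch): uid -> list of gene IDs.
UID_TO_GENES = {
    "uid1": ["K00001"],
    "uid2": ["K00001", "K00002"],
    "uid3": ["K00003"],
}


def map_counts(handle, uid_to_genes):
    """Sum the per-sample counts of every uid into the genes it maps to."""
    # Skip the comment line(s) that featureCounts writes before the header.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    samples = header[6:]  # columns after Geneid, Chr, Start, End, Strand, Length
    totals = defaultdict(lambda: [0] * len(samples))
    for row in reader:
        uid = row[0]
        counts = [int(value) for value in row[6:]]
        # A uid with no entry in the map contributes nothing.
        for gene_id in uid_to_genes.get(uid, []):
            for pos, count in enumerate(counts):
                totals[gene_id][pos] += count
    return samples, dict(totals)


samples, gene_counts = map_counts(io.StringIO(FEATURE_COUNTS), UID_TO_GENES)
```

Here `gene_counts` maps each gene ID to a list of counts in the same order as `samples`. A uid that maps to more than one gene adds its counts to each of them, which is a choice made for this sketch only.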

Concatenate Parquet Files

Concatenates several pandas DataFrames that have the same indices. It is used when the mapping files produce tables that are too big to fit in memory.

One solution is to split the map files, producing multiple Parquet files, and then concatenate them with this script.
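A rough sketch of the concatenation step with pandas follows. It assumes a row-wise `pandas.concat` of tables that share the same index levels, and it builds the partial tables in memory instead of reading them from Parquet files. The `make_table` helper, the index level names, and the final sum of duplicated index entries are assumptions of the example, not necessarily what the command does.

```python
import pandas as pd


def make_table(rows):
    """Build a small count table indexed by (gene_id, taxon_id)."""
    index = pd.MultiIndex.from_tuples(
        [(gene_id, taxon_id) for gene_id, taxon_id, _, _ in rows],
        names=["gene_id", "taxon_id"],
    )
    data = [[count1, count2] for _, _, count1, count2 in rows]
    return pd.DataFrame(data, index=index, columns=["sample1", "sample2"])


# Two partial tables, standing in for the Parquet files made from split maps.
part1 = make_table([("K00001", 2, 15, 4), ("K00002", 2, 5, 0)])
part2 = make_table([("K00001", 2, 1, 1), ("K00003", 1224, 2, 7)])

# Stack the partial tables; the index level names are kept.
combined = pd.concat([part1, part2])

# The same (gene_id, taxon_id) pair can appear in more than one part.
# Summing those rows is one possible way to merge them (an assumption here).
combined = combined.groupby(level=["gene_id", "taxon_id"]).sum()
```

With real files, each partial table would come from `pandas.read_parquet` and the result would be written back with `DataFrame.to_parquet`, both of which need a Parquet engine such as pyarrow to be installed.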

Convert Parquet into CSV

The command to_csv outputs a CSV file from a Parquet table.

Changes

New in version 0.5.7.

mgkit.workflow.count_utils.stream_feature_counts(count_file, sample_func=None, gene_ids=None)