mgkit.io.uniprot module

New in version 0.1.13.

Uniprot file formats

mgkit.io.uniprot.MAPPINGS = {'biocyc': 'BioCyc', 'eggnog': 'eggNOG', 'embl': 'EMBL', 'embl_cds': 'EMBL-CDS', 'gi': 'GI', 'kegg': 'KEGG', 'ko': 'KO', 'string': 'STRING', 'taxonomy': 'NCBI_TaxID', 'unipathway': 'UniPathway'}

Some of the mappings contained in the idmapping.dat.gz file
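As a rough illustration of how the constant might be used, the sketch below copies the dictionary locally (so it runs without mgkit installed) and matches the ID type of one mapping line against it. It assumes that idmapping.dat lines are tab-separated as accession, ID type, mapped ID; the accession and KO values are made up for the example.

```python
# Local copy of the documented constant, so the snippet runs without mgkit.
MAPPINGS = {
    'biocyc': 'BioCyc', 'eggnog': 'eggNOG', 'embl': 'EMBL',
    'embl_cds': 'EMBL-CDS', 'gi': 'GI', 'kegg': 'KEGG', 'ko': 'KO',
    'string': 'STRING', 'taxonomy': 'NCBI_TaxID', 'unipathway': 'UniPathway',
}

# Assumption: one idmapping.dat line has three tab-separated columns.
# The values below are invented sample data, not real records.
line = "Q00001\tKO\tK00001\n"
accession, id_type, mapped_id = line.rstrip("\n").split("\t")

# Reverse lookup: from the label found in the file to the short name.
short_names = {label: name for name, label in MAPPINGS.items()}
short_name = short_names[id_type]
print(accession, short_name, mapped_id)
```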

mgkit.io.uniprot.parse_uniprot_mappings(file_handle, gene_ids=None, mappings=None, num_lines=10000000)

Parses a Uniprot mapping file, returning a generator with the mappings.

Parameters
  • file_handle (str, file) – file name or open file handle

  • gene_ids (None, set) – if not None, the returned mappings are for the gene IDs specified

  • mappings (None, set) – mappings to be returned

  • num_lines (None, int) – number of lines after which a message is logged. If None, no message is logged

Yields

tuple – the first element is the gene ID, the second is the mapping type and third element is the mapped ID
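The filtering described above can be sketched in plain Python. This is not mgkit's implementation: iter_mappings is a hypothetical stand-in that works on already-decoded text lines, assumes the three-column tab-separated layout of idmapping.dat, and leaves out the logging controlled by num_lines. The sample records are invented.

```python
import io


def iter_mappings(lines, gene_ids=None, mappings=None):
    """Hypothetical stand-in that mimics the documented filtering."""
    for line in lines:
        # Assumed layout: gene ID, mapping type, mapped ID (tab-separated).
        gene_id, mapping_type, mapped_id = line.rstrip("\n").split("\t")
        if gene_ids is not None and gene_id not in gene_ids:
            continue
        if mappings is not None and mapping_type not in mappings:
            continue
        yield gene_id, mapping_type, mapped_id


# Invented sample data standing in for an open mapping file.
sample = io.StringIO(
    "Q00001\tKO\tK00001\n"
    "Q00001\tNCBI_TaxID\t9606\n"
    "Q00002\tKO\tK00002\n"
)

result = list(iter_mappings(sample, gene_ids={"Q00001"}, mappings={"KO", "NCBI_TaxID"}))
print(result)
```

Because the function is a generator, records are produced one at a time, which matters for a file as large as the full mapping dump.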

mgkit.io.uniprot.uniprot_mappings_to_dict(file_handle, gene_ids, mappings, num_lines=None)

Changed in version 0.3.4: added num_lines

Parses a Uniprot mapping file, returning a generator that pairs each gene ID with a dictionary of the mappings requested.

Parameters
  • file_handle (str, file) – file name or open file handle

  • gene_ids (None, set) – if not None, the returned mappings are for the gene IDs specified

  • mappings (None, set) – mappings to be returned

  • num_lines (int, None) – passed to parse_uniprot_mappings()

Yields

tuple – the first element is the gene ID, the second is a dictionary with all the mappings found. In the dictionary, each key is a mapping type and each value is a list of all mapped IDs of that type
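The shape of the yielded values can be illustrated with a small stand-alone sketch. group_mappings is a hypothetical helper, not the library's code, and it assumes that all records for one gene ID are contiguous in the input, which may not match how mgkit handles the file. The records are invented.

```python
from itertools import groupby


def group_mappings(records):
    """Hypothetical helper: group (gene ID, type, mapped ID) records by gene ID."""
    # Assumption: records sharing a gene ID are adjacent in the input.
    for gene_id, group in groupby(records, key=lambda rec: rec[0]):
        found = {}
        for _, mapping_type, mapped_id in group:
            # Each mapping type collects a list of every mapped ID seen.
            found.setdefault(mapping_type, []).append(mapped_id)
        yield gene_id, found


# Invented sample records in the documented three-element form.
records = [
    ("Q00001", "KO", "K00001"),
    ("Q00001", "KO", "K00002"),
    ("Q00001", "NCBI_TaxID", "9606"),
    ("Q00002", "KO", "K00003"),
]

grouped = list(group_mappings(records))
for gene_id, found in grouped:
    print(gene_id, found)
```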