mgkit.net.uniprot module
Contains functions and constants for Uniprot access.
-
mgkit.net.uniprot.UNIPROT_GET = 'http://www.uniprot.org/uniprot/'
URL to Uniprot REST API
-
mgkit.net.uniprot.UNIPROT_MAP = 'http://www.uniprot.org/mapping/'
URL to Uniprot mapping REST API
-
mgkit.net.uniprot.UNIPROT_TAXONOMY = 'http://www.uniprot.org/taxonomy/'
URL to Uniprot REST API - Taxonomy
-
mgkit.net.uniprot.get_gene_info(gene_ids, columns, max_req=50, contact=None)
New in version 0.1.12.
Gets information about a list of genes. It uses query_uniprot() to send the request and formats the response as a dictionary.
- Parameters
- Returns
dictionary where the keys are the gene_ids requested and the values are dictionaries with the names of the columns requested as keys and the corresponding values, which can be lists if the values are semicolon-separated strings.
- Return type
Example
To get the taxonomy ids for some genes:
>>> uniprot.get_gene_info(['Q09575', 'Q8DQI6'], ['organism-id'])
{'Q09575': {'organism-id': '6239'}, 'Q8DQI6': {'organism-id': '171101'}}
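The return description says a value can become a list when the column holds a semicolon-separated string. How mgkit performs that split is not shown here, so the following is only a minimal sketch of the idea; the helper name split_column_value and its whitespace handling are assumptions, not part of the module.

```python
def split_column_value(value, sep=";"):
    """Hypothetical helper: return a list if the value holds several items.

    A value with a single item (or none) is returned as a plain string.
    """
    # Drop empty pieces, e.g. the one produced by a trailing separator.
    parts = [part.strip() for part in value.split(sep) if part.strip()]
    if len(parts) > 1:
        return parts
    return parts[0] if parts else ""


single = split_column_value("6239")
multiple = split_column_value("K00001; K00002;")
```

With this sketch a single value such as an organism id stays a string, while a multi-valued column becomes a list of its items.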
-
mgkit.net.uniprot.get_gene_info_iter(gene_ids, columns, contact=None, max_req=50)
New in version 0.3.3.
Alternative function to get_gene_info(), returning an iterator to avoid connection timeouts when updating a dictionary.
This function’s parameters are the same as those of get_gene_info().
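The iterator approach can be illustrated with a generator that requests the ids in batches of max_req and yields each partial result as soon as it is available. This is a sketch under stated assumptions: iter_gene_info and fake_fetch are hypothetical names, and the fetch function is a stand-in for the real network call, which is not reproduced here.

```python
def iter_gene_info(gene_ids, fetch_batch, max_req=50):
    """Hypothetical sketch: yield one dictionary per batch of gene ids."""
    gene_ids = list(gene_ids)
    for start in range(0, len(gene_ids), max_req):
        batch = gene_ids[start:start + max_req]
        # Each batch is fetched only when the caller asks for the next item.
        yield fetch_batch(batch)


def fake_fetch(batch):
    """Stand-in for a network request; returns placeholder data."""
    return {gene_id: {"organism-id": "placeholder"} for gene_id in batch}


gene_info = {}
batch_sizes = []
for partial in iter_gene_info(["Q09575", "Q8DQI6", "P12345"], fake_fetch, max_req=2):
    batch_sizes.append(len(partial))
    gene_info.update(partial)
```

Because the caller updates its own dictionary between requests, no single call has to wait for the whole id list to be processed.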
-
mgkit.net.uniprot.get_ko_to_eggnog_mappings(ko_ids, contact=None)
New in version 0.1.14.
It’s not possible to map KO IDs to eggNOG IDs in one go via the Uniprot API. This function uses query_uniprot() to get all the Uniprot IDs requested and then returns a dictionary with all the eggNOG IDs they map to.
-
mgkit.net.uniprot.get_mappings(entry_ids, db_from='ID', db_to='EMBL', out_format='tab', contact=None)
Gets mappings of genes using the Uniprot REST API. The db_from and db_to values are the ones accepted by the Uniprot API. The same applies to out_format; the only processed formats are ‘list’, which returns a list of the mappings (should be used with one gene only), and ‘tab’, which returns a dictionary with the mapping. All other values return a string with the newline stripped.
- Parameters
entry_ids (iterable) – iterable of ids to be mapped (there’s a limit to the maximum length of an HTTP request, so it should contain fewer than 50 ids)
db_from (str) – string that identifies the DB for elements in entry_ids
db_to (str) – string that identifies the DB to which entry_ids are mapped
out_format (str) – format of the mapping; ‘list’ and ‘tab’ are processed
contact (str) – email address to be passed in the query (requested by the Uniprot API)
- Returns
tuple, dict or str depending on out_format value
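The way the return value depends on out_format can be sketched as a small dispatch function. The response layouts assumed here (one id per line for ‘list’, a header row followed by tab-separated source and target ids for ‘tab’) are assumptions about the service output, and process_mapping_response is a hypothetical name rather than the module’s own code.

```python
def process_mapping_response(text, out_format="tab"):
    """Hypothetical sketch of post-processing a mapping response."""
    if out_format == "list":
        # Assumed layout: one mapped id per line.
        return [line for line in text.splitlines() if line]
    if out_format == "tab":
        # Assumed layout: header row, then "source<TAB>target" rows.
        mappings = {}
        for line in text.splitlines()[1:]:
            if "\t" not in line:
                continue
            source_id, target_id = line.split("\t", 1)
            mappings.setdefault(source_id, []).append(target_id)
        return mappings
    # Any other format: return the raw text without surrounding newlines.
    return text.strip("\n")


# Made-up ids, used only to exercise the sketch.
tab_result = process_mapping_response("From\tTo\nID_A\tX1\nID_A\tX2\nID_B\tY1\n", "tab")
list_result = process_mapping_response("X1\nX2\n", "list")
raw_result = process_mapping_response("<xml/>\n", "xml")
```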
-
mgkit.net.uniprot.get_sequences_by_ko(ko_id, taxonomy, contact=None, reviewed=True)
Gets sequences from Uniprot, restricting to the taxon id passed.
-
mgkit.net.uniprot.get_uniprot_ec_mappings(gene_ids, contact=None)
New in version 0.1.14.
Shortcut to download the EC mapping of Uniprot IDs. Uses get_gene_info(), passing the correct column (ec).
-
mgkit.net.uniprot.ko_to_mapping(ko_id, query, columns, contact=None)
Returns the mappings to the supplied KO. Can be used for any id; the query format is free, as are the columns returned. The only restriction is that the tab format is used, since that is the one parsed.
- Parameters
Note
Each mapping in the column is separated by a semicolon (;).
-
mgkit.net.uniprot.parse_uniprot_response(data, simple=True)
New in version 0.1.12.
Parses a raw response from a Uniprot query (tab format only), from functions like query_uniprot(), into a dictionary. It requires that the first column is the entry id (or any other unique id).
- Parameters
- Returns
The format of the resulting dictionary is entry_id -> {column1 -> value, column2 -> value, ..} unless there’s only one column and simple is True, in which case the value is equal to the value of the only column.
- Return type
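The dictionary layout described above can be illustrated with a short parser for a tab-format response. This is a sketch, not the module’s implementation: parse_tab_response is a hypothetical name, and it keys the inner dictionaries by the header names found in the response, which is an assumption about the behaviour. The sample input is the raw response shown in the query_uniprot() example.

```python
def parse_tab_response(data, simple=True):
    """Hypothetical sketch: turn a tab-format response into a dictionary."""
    lines = [line for line in data.split("\n") if line]
    if not lines:
        return {}
    # The first column is the entry id; the remaining header cells name the columns.
    columns = lines[0].split("\t")[1:]
    parsed = {}
    for line in lines[1:]:
        fields = line.split("\t")
        entry_id, values = fields[0], fields[1:]
        if simple and len(columns) == 1:
            # Only one column requested: map the id straight to its value.
            parsed[entry_id] = values[0] if values else ""
        else:
            parsed[entry_id] = dict(zip(columns, values))
    return parsed


raw = 'Entry\tOrganism ID\nQ8DQI6\t171101\nQ09575\t6239\n'
simple_result = parse_tab_response(raw)
full_result = parse_tab_response(raw, simple=False)
```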
-
mgkit.net.uniprot.query_uniprot(query, columns=None, format='tab', limit=None, contact=None, baseurl='http://www.uniprot.org/uniprot/')
New in version 0.1.12.
Changed in version 0.1.13: added baseurl and made columns a default argument
Queries Uniprot, returning the raw response in the format specified. More information at the page
- Parameters
query (str) – query to submit, as put in the input box
columns (None, iterable) – list of columns to return
format (str) – response format
limit (int, None) – number of entries to return or None to request all entries
contact (str) – email address to be passed in the query (requested by the Uniprot API)
baseurl (str) – base url for the REST API, can be either UNIPROT_GET or UNIPROT_TAXONOMY
- Returns
raw response from the query
- Return type
Example
To get the taxonomy ids for some genes:
>>> uniprot.query_uniprot('Q09575 OR Q8DQI6', ['id', 'organism-id'])
'Entry\tOrganism ID\nQ8DQI6\t171101\nQ09575\t6239\n'
Warning
Because of limits on the length of URLs, it’s advised to limit the length of the query string.
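One way to follow this advice is to split a long list of ids into several "A OR B" query strings, each below a chosen length. The sketch below assumes a hypothetical helper, build_or_queries, and an arbitrary max_chars value of 1000; neither comes from the module, and the right limit depends on the server and client in use.

```python
def build_or_queries(entry_ids, max_chars=1000):
    """Hypothetical sketch: group ids into OR queries no longer than max_chars."""
    queries = []
    current = ""
    for entry_id in entry_ids:
        candidate = entry_id if not current else current + " OR " + entry_id
        if current and len(candidate) > max_chars:
            # Adding this id would exceed the limit: close the current query.
            queries.append(current)
            current = entry_id
        else:
            current = candidate
    if current:
        queries.append(current)
    return queries


one_query = build_or_queries(["Q09575", "Q8DQI6"])
split_queries = build_or_queries(["Q09575", "Q8DQI6"], max_chars=10)
```

Each resulting string could then be passed as the query argument of a separate request.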