.. _download-taxonomy:

Download Taxonomy
=================

A bash script called **download-taxonomy.sh** is installed along with MGKit.
This script downloads the relevant files from NCBI using *wget* and saves the
taxonomy, in a form that MGKit can use, to a file called **taxonomy.pickle**.

Because the script uses *wget* to download the file `taxdump.tar.gz `_, it
fails if *wget* cannot be found. To avoid this, the file can be downloaded by
other means beforehand: the script detects that the file already exists and
skips the call to *wget*.

The script can also save the taxonomy under a different file name, if one is
passed when the script is invoked. If the file extension contains *.msgpack*,
the **msgpack** module is used to write the taxonomy; otherwise *pickle* is
used. *msgpack* offers faster reads and writes and a better compression ratio,
but it needs an additional module (`msgpack `_) that is not installed by
default.

Download Accession/TaxonID
==========================

There are two separate scripts to download these tables:

* `download-uniprot-taxa.sh` downloads a table for the Uniprot databases
* `download-ncbi-taxa.sh` downloads a table for the BLAST databases from NCBI,
  by default for *nt*, while the table for *nr* can be downloaded with
  `download-ncbi-taxa.sh prot`

Here, **nr** refers to the NCBI protein database, while **nt** refers to the
nucleotide one. The first script covers both Uniprot Swissprot and TrEMBL.

.. note::

   Since version 0.4.4, if a PROGBAR environment variable is set, the progress
   bar (the default in `wget`) is used instead of the *dot* progress; the
   progress bar is more suitable for interactive use of the script.
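
The PROGBAR check described in the note can be pictured with a short POSIX
shell sketch. It is hypothetical and is not the code of the MGKit scripts: the
function name `wget_progress_option` is invented for illustration, and only the
`--progress=bar` and `--progress=dot` options come from *wget* itself. Following
the note, the variable only needs to be *set*, so an empty value also selects
the progress bar.

.. code-block:: sh

   #!/bin/sh
   # Hypothetical sketch, not MGKit's actual code: choose the wget progress
   # indicator depending on whether the PROGBAR environment variable is set.
   wget_progress_option() {
       # ${PROGBAR+x} expands to "x" whenever PROGBAR is set, even if empty
       if [ -n "${PROGBAR+x}" ]; then
           # interactive use: wget's progress bar
           echo "--progress=bar"
       else
           # otherwise: wget's dot-style progress output
           echo "--progress=dot"
       fi
   }

   # Print the option that would be passed to wget
   wget_progress_option

For example, `PROGBAR=1 download-ncbi-taxa.sh prot` sets the variable only for
that single invocation of the script.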