fasta-utils - Fasta Utilities

Overview

New in version 0.3.0.

Script that includes some functionality to help use FASTA files with the framework.

split command

Used to split a FASTA file into smaller fragments.

translate command

Used to translate nucleotide sequences into amino acids.

uid command

Used to change the headers of a FASTA file to unique IDs. A tab-separated table recording the changes made can be kept by using the --table option.

filter command

Used to filter a FASTA file by sequence length, or by whether the header or the sequence contains a given pattern. A list of headers to keep can be passed using the -f option.

info command

Gets information about a FASTA file. For each sequence it prints the seq_id (the header trimmed at the first space), the length and a hash (sha1 by default). Optionally it also includes the sequence and the GC content, and it can write the output in GFF format.

rename command

Renames the headers of a FASTA file, appending a random suffix and an optional prefix.

Changes

New in version 0.3.0.

Changed in version 0.3.1: added translate and uid commands

Changed in version 0.3.4: ported to click

Changed in version 0.5.5: added to the translate command the options -1, to output only the forward strand/frame 0, and -w, to avoid wrapping at 60 characters

Changed in version 0.5.7: added filter and info commands for simple fasta file filtering and info

Options

fasta-utils

Main function

fasta-utils [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

--cite

filter

Filters a FASTA file [file-file]

fasta-utils filter [OPTIONS] [FASTA_FILE] [OUTPUT_FILE]

Options

-v, --verbose

--len-gt <len_gt>

Keeps sequences whose length is greater than the given value

--len-lt <len_lt>

Keeps sequences whose length is less than the given value

--header-contains <header_contains>

Keeps sequences whose header contains the string

--seq-pattern <seq_pattern>

Keeps sequences that contain the string

-f, --header-file <header_file>

Keeps only sequences whose headers are listed in the file

-w, --wrap

Wraps the output sequences to 60 characters

-s, --trim-tail

Removes header information after first space

Arguments

FASTA_FILE

Optional argument

OUTPUT_FILE

Optional argument
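
The way the filter options could combine can be illustrated with a minimal Python sketch. This is not the script's actual code: the function name keep_record and its parameters are hypothetical, and it assumes that every option that is set must be satisfied (logical AND) and that patterns are plain substrings.

```python
def keep_record(header, seq, len_gt=None, len_lt=None,
                header_contains=None, seq_pattern=None, headers_to_keep=None):
    """Return True if a record passes every filter that was set.

    Hypothetical helper: assumes filters are combined with AND and
    that patterns are matched as plain substrings.
    """
    if len_gt is not None and not len(seq) > len_gt:
        return False
    if len_lt is not None and not len(seq) < len_lt:
        return False
    if header_contains is not None and header_contains not in header:
        return False
    if seq_pattern is not None and seq_pattern not in seq:
        return False
    if headers_to_keep is not None and header not in headers_to_keep:
        return False
    return True


# Two toy records as (header, sequence) pairs
records = [("seq1 sample A", "ACGTACGT"), ("seq2 sample B", "ACG")]
kept = [header for header, seq in records if keep_record(header, seq, len_gt=5)]
```

Here only the first record is kept, since it is the only one longer than 5 characters.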

info

Gets information about a FASTA file [file-file]

fasta-utils info [OPTIONS] [FASTA_FILE] [OUTPUT_FILE]

Options

-v, --verbose

-h, --header

Prints header

-s, --include-seq

Includes the sequence

-r, --no-rename

Do not split sequence name at first space

-a, --hash-type <hash_type>

Default

sha1

Options

sha1|md5|sha256

-g, --out-gff

Outputs a GFF file

Default

False

-gc, --gc-content

Includes the GC Content

Default

False

Arguments

FASTA_FILE

Optional argument

OUTPUT_FILE

Optional argument
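
The values reported by info (seq_id, length, hash and GC content) can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's implementation: the function sequence_info is hypothetical, the hash is assumed to be computed on the sequence string as it appears in the file, and GC content is assumed to be the fraction of G and C characters.

```python
import hashlib


def sequence_info(header, seq, hash_type="sha1", no_rename=False):
    """Collect basic information about one FASTA record (hypothetical helper)."""
    # Trim the header at the first space unless asked not to
    seq_id = header if no_rename else header.split(" ")[0]
    # Assumption: the hash is taken over the sequence exactly as given
    digest = hashlib.new(hash_type, seq.encode("ascii")).hexdigest()
    # Assumption: GC content is the fraction of G/C over the full length
    upper = seq.upper()
    gc = (upper.count("G") + upper.count("C")) / len(seq) if seq else 0.0
    return {"seq_id": seq_id, "length": len(seq), "hash": digest, "gc": gc}


info = sequence_info("contig1 len=8", "ACGTGGCC")
```

The hash_type argument accepts the same names listed for the --hash-type option (sha1, md5, sha256), all of which are available in Python's hashlib.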

rename

Renames the sequence headers of a FASTA file [file-file]. Adds up to 2 elements to each sequence header, separated by a character: 1) a suffix (a random string of characters) and 2) an optional prefix.

The character used as separator should be ‘|’ (default), ‘#’ or another character at which other software does not truncate the header (a space is not suitable, because headers are commonly cut at the first space).

Note that this script itself truncates the header at the first space.

fasta-utils rename [OPTIONS] [FASTA_FILE] [OUTPUT_FILE]

Options

-v, --verbose

-p, --prefix <prefix>

Adds a prefix to the header

-f, --file-name

Adds filename as prefix (Useful for adding the file name

-s, --separator <separator>

Separator for the elements of the new header

-l, --suffix-len <suffix_len>

Number of random characters to use

Arguments

FASTA_FILE

Optional argument

OUTPUT_FILE

Optional argument
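
The construction of a renamed header can be sketched in Python as below. The function rename_header is hypothetical, and several details are assumptions rather than documented behaviour: the order of the elements (prefix, name, suffix), the alphabet used for the random suffix, and the suffix length of 8 used as a default here.

```python
import random
import string


def rename_header(header, prefix=None, separator="|", suffix_len=8, rng=random):
    """Build a new header from an optional prefix, the name and a random suffix.

    Hypothetical helper: element order, suffix alphabet and default
    suffix length are assumptions, not taken from the script.
    """
    # The header is truncated at the first space, as described above
    name = header.split(" ")[0]
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(rng.choice(alphabet) for _ in range(suffix_len))
    parts = [name, suffix] if prefix is None else [prefix, name, suffix]
    return separator.join(parts)


new_header = rename_header("seq1 some description", prefix="sampleA", suffix_len=5)
```

Passing a seeded random.Random instance as rng makes the suffixes reproducible, which can help when testing.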

split

Splits a FASTA file [fasta-file] in a number of fragments

fasta-utils split [OPTIONS] [FASTA_FILE]

Options

-v, --verbose

-p, --prefix <prefix>

Prefix for the file name in output

Default

split

-n, --number <number>

Number of chunks into which to split the FASTA file

Default

10

-z, --gzip

gzip output files

Arguments

FASTA_FILE

Optional argument
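
One simple way to distribute sequences over a fixed number of chunks is round-robin assignment, sketched below in Python. Whether the script uses this strategy is an assumption; the function split_records is hypothetical and only illustrates the idea behind the --number option.

```python
def split_records(records, number=10):
    """Distribute records over `number` chunks in round-robin order.

    Hypothetical helper: the actual splitting strategy of the script
    may differ.
    """
    chunks = [[] for _ in range(number)]
    for index, record in enumerate(records):
        chunks[index % number].append(record)
    return chunks


# Seven toy records split into three chunks
records = [("seq{}".format(i), "ACGT") for i in range(7)]
chunks = split_records(records, number=3)
```

With this strategy chunk sizes differ by at most one sequence, regardless of the order of the input.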

translate

Translate FASTA file [fasta-file] in all 6 frames to [output-file]

fasta-utils translate [OPTIONS] [FASTA_FILE] [OUTPUT_FILE]

Options

-v, --verbose

-t, --trans-table <trans_table>

Translation table

Default

universal

Options

bac_plt|drs_mit|inv_mit|prt_mit|universal|vt_mit|yst_alt|yst_mit

-1, --one-seq

Only translates the sequence as given (forward strand, frame 0), instead of all 6 frames

Default

False

-w, --no-wrap

Writes each sequence on a single line (2 lines including the header)

Default

False

--progress

Shows Progress Bar

Arguments

FASTA_FILE

Optional argument

OUTPUT_FILE

Optional argument
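
Six-frame translation, as performed by this command, can be illustrated with a short Python sketch. It uses the standard (universal) genetic code only and is not the script's implementation: the names CODON_TABLE, reverse_complement, translate_frame and six_frames are hypothetical, and marking unknown codons with "X" is an assumption.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG codon order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_frame(seq, start=0):
    """Translate one frame; incomplete trailing codons are ignored."""
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    # Assumption: codons with ambiguous bases are written as "X"
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq):
    """Return the 3 forward frames followed by the 3 reverse frames."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    return [translate_frame(strand, frame)
            for strand in (seq, rev) for frame in range(3)]


frames = six_frames("ATGGCCTAA")
```

The first element of frames corresponds to what the -1 option restricts the output to, the forward strand in frame 0.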

uid

Changes each header of a FASTA file [file-file] to a uid (unique ID)

fasta-utils uid [OPTIONS] [FASTA_FILE] [OUTPUT_FILE]

Options

-v, --verbose

-t, --table <table>

Filename of a table to record the changes (by default discards it)

Arguments

FASTA_FILE

Optional argument

OUTPUT_FILE

Optional argument
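
The replacement of headers with unique IDs, together with the table that records the changes, can be sketched in Python as follows. This is only an illustration: the function assign_uids is hypothetical, and both the use of random UUIDs (uuid.uuid4) and the column order of the table (new ID first, original header second) are assumptions.

```python
import uuid


def assign_uids(headers):
    """Pair each original header with a newly generated unique ID.

    Hypothetical helper: the ID type and the column order of the
    table are assumptions.
    """
    return [(str(uuid.uuid4()), header) for header in headers]


table = assign_uids(["seq1 sample A", "seq2 sample B"])

# Tab-separated lines, in the spirit of the file written with --table
lines = ["{}\t{}".format(uid, header) for uid, header in table]
```

Keeping this table makes it possible to map the unique IDs back to the original headers after downstream analysis.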