mgkit.workflow.extract_gff_info module
Extract information from GFF files
sequence command
Extracts the nucleotide sequences of GFF annotations. It requires the FASTA file containing the sequences referenced in the GFF seq_id attribute (first column of the raw GFF).
The extracted sequences use the uid stored in the GFF file as their identifier. By default, a sequence is not reverse complemented when its annotation is on the - strand; this can be changed with the -r option.
The sequences are wrapped at 60 characters, following the usual FASTA convention, but this behavior can be disabled by specifying the -w option.
Warning
The reference file is loaded into memory.
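The behavior described above can be sketched in plain Python. This is a minimal illustration, not mgkit's implementation: the function names `reverse_complement`, `extract_sequence` and `format_fasta`, and the use of a dictionary as the in-memory reference, are assumptions made for the example.

```python
# Hypothetical helpers illustrating the sequence command; not mgkit code.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_sequence(reference, seq_id, start, end, strand, reverse=False):
    """Slice an annotation out of its reference sequence.

    GFF coordinates are 1-based and inclusive, hence `start - 1`.
    The sequence is reverse complemented only when `reverse` is set
    and the annotation lies on the - strand.
    """
    seq = reference[seq_id][start - 1:end]
    if reverse and strand == "-":
        seq = reverse_complement(seq)
    return seq


def format_fasta(uid, seq, wrap=60):
    """Format one FASTA record; `wrap=None` keeps the sequence on one line."""
    if wrap is None:
        body = seq
    else:
        body = "\n".join(seq[i:i + wrap] for i in range(0, len(seq), wrap))
    return ">{}\n{}\n".format(uid, body)


# The whole reference is held in memory, as the warning above notes.
reference = {"contig1": "ATGCCCGGGTTTAAA"}
seq = extract_sequence(reference, "contig1", 1, 6, "-", reverse=True)
print(format_fasta("uid-0001", seq), end="")
```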
dbm command
Creates a dbm DB using the semidbm package. The database can then be loaded
using mgkit.db.dbm.GFFDB.
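The idea of a key-value store indexed by annotation uid can be sketched as follows. To stay self-contained, the example swaps in the standard library `dbm.dumb` module for semidbm, and it assumes that each value is the raw GFF line; the layout actually used by mgkit.db.dbm.GFFDB may differ. The function name `build_and_query` is hypothetical.

```python
import dbm.dumb
import os
import tempfile


def build_and_query(annotations, uid):
    """Store (uid, raw GFF line) pairs in a dbm file and read one back.

    Stand-in for the semidbm-based database: keys are annotation uids,
    values are assumed to be the raw GFF lines.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        db_path = os.path.join(tmp_dir, "annotations")
        # "c" creates the database if it does not exist yet
        with dbm.dumb.open(db_path, "c") as db:
            for key, line in annotations:
                db[key] = line
        # Reopen read-only; values come back as bytes
        with dbm.dumb.open(db_path, "r") as db:
            return db[uid].decode("utf-8")


# Hypothetical annotations; real input would be parsed from a GFF file.
annotations = [
    ("uid-0001", 'contig1\tmgkit\tCDS\t1\t6\t.\t+\t0\tuid="uid-0001"'),
    ("uid-0002", 'contig1\tmgkit\tCDS\t10\t15\t.\t-\t0\tuid="uid-0002"'),
]
print(build_and_query(annotations, "uid-0002"))
```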
mongodb command
Outputs annotations in a format supported by MongoDB. More information about it
can be found in mgkit.db.mongo.
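As a rough illustration of what a MongoDB-friendly output might look like, the sketch below turns one GFF line into a JSON document. The field names and the function `gff_line_to_document` are assumptions made for the example; the documents actually written by the command are described in mgkit.db.mongo and may be structured differently.

```python
import json


def gff_line_to_document(gff_line):
    """Convert one raw GFF line into a JSON-serializable dictionary.

    Field names are illustrative only. Attributes in the ninth column
    are assumed to be `key="value"` pairs separated by semicolons.
    """
    fields = gff_line.rstrip("\n").split("\t")
    document = {
        "seq_id": fields[0],
        "source": fields[1],
        "feat_type": fields[2],
        "start": int(fields[3]),
        "end": int(fields[4]),
        "strand": fields[6],
    }
    for pair in fields[8].split(";"):
        if not pair.strip():
            continue
        key, _, value = pair.partition("=")
        document[key.strip()] = value.strip().strip('"')
    return document


line = 'contig1\tmgkit\tCDS\t1\t6\t.\t+\t0\tuid="uid-0001";gene_id="K00001"'
# One document per line; pass indent=4 to json.dumps for readable output
print(json.dumps(gff_line_to_document(line), sort_keys=True))
```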
gtf command
Outputs annotations in the GTF format.
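A GTF line keeps the first eight GFF columns and writes the attributes as `key "value";` pairs, with gene_id and transcript_id required. The sketch below shows one possible conversion. The function name `gff_to_gtf_line`, the `gene_id_attr` parameter and the use of the uid as transcript_id are assumptions for illustration, not a description of what the command does.

```python
def gff_to_gtf_line(gff_line, gene_id_attr="gene_id"):
    """Rewrite a raw GFF line as a GTF line.

    `gene_id_attr` names the GFF attribute copied into the GTF gene_id;
    the annotation uid is used as transcript_id (an assumption).
    """
    fields = gff_line.rstrip("\n").split("\t")
    attributes = {}
    for pair in fields[8].split(";"):
        if not pair.strip():
            continue
        key, _, value = pair.partition("=")
        attributes[key.strip()] = value.strip().strip('"')
    gtf_attributes = 'gene_id "{}"; transcript_id "{}";'.format(
        attributes[gene_id_attr], attributes["uid"]
    )
    # The first eight columns are shared between the two formats
    return "\t".join(fields[:8] + [gtf_attributes])


line = 'contig1\tmgkit\tCDS\t1\t6\t.\t+\t0\tuid="uid-0001";gene_id="K00001"'
print(gff_to_gtf_line(line))
```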
split command
Splits a GFF file into smaller chunks, ensuring that all annotations of a sequence end up in the same file.
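One simple way to keep the annotations of a sequence together is to assign each sequence to a chunk the first time it is seen, then route every later annotation of that sequence to the same chunk. The sketch below uses round-robin assignment; the function name `split_by_sequence` is hypothetical and the strategy used by the command itself may differ.

```python
def split_by_sequence(gff_lines, n_chunks):
    """Distribute GFF lines over `n_chunks` lists, one sequence per chunk.

    A sequence is assigned to a chunk in round-robin order when first
    seen, so all of its annotations land in the same chunk.
    """
    chunks = [[] for _ in range(n_chunks)]
    assigned = {}  # seq_id -> index of the chunk holding that sequence
    for line in gff_lines:
        seq_id = line.split("\t", 1)[0]
        if seq_id not in assigned:
            assigned[seq_id] = len(assigned) % n_chunks
        chunks[assigned[seq_id]].append(line)
    return chunks


lines = [
    "contig1\tmgkit\tCDS\t1\t6\t.\t+\t0\tuid=\"a\"",
    "contig2\tmgkit\tCDS\t1\t9\t.\t+\t0\tuid=\"b\"",
    "contig1\tmgkit\tCDS\t10\t15\t.\t-\t0\tuid=\"c\"",
    "contig3\tmgkit\tCDS\t4\t12\t.\t+\t0\tuid=\"d\"",
]
for index, chunk in enumerate(split_by_sequence(lines, 2)):
    print(index, [line.split("\t", 1)[0] for line in chunk])
```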
cov command
Calculates annotation coverage for each contig in a GFF file. The command can be run in strand-specific mode (off by default) and requires the reference file to which the annotations refer. The output is a tab-separated file with three columns: the sequence name, the strand (+, -, or NA if not strand specific), and the percentage of the sequence covered by annotations.
Warning
The GFF file is assumed to be sorted by sequence, or by sequence and strand for strand-specific coverage. The GFF file can be sorted using sort -s -k 1,1 -k 7,7 for strand specific, or sort -s -k 1,1 if not strand specific.
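The sketch below shows why sorted input matters: consecutive annotations are grouped with `itertools.groupby`, which only merges adjacent items, and the covered length is the union of the annotation intervals in each group. The names `covered_length` and `coverage_by_sequence`, and the tuple layout of the input, are assumptions for illustration and not the command's actual code.

```python
from itertools import groupby


def covered_length(intervals):
    """Length of the union of 1-based, inclusive (start, end) intervals."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end + 1:
            # Gap before this interval: close the previous merged block
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def coverage_by_sequence(annotations, seq_lengths, strand_specific=False):
    """Yield (seq_id, strand, percentage) rows.

    `annotations` holds (seq_id, strand, start, end) tuples, already
    sorted by seq_id (and by strand when `strand_specific` is set).
    """
    def key(annotation):
        strand = annotation[1] if strand_specific else "NA"
        return annotation[0], strand

    for (seq_id, strand), group in groupby(annotations, key=key):
        covered = covered_length((a[2], a[3]) for a in group)
        yield seq_id, strand, 100.0 * covered / seq_lengths[seq_id]


annotations = [
    ("contig1", "+", 1, 10),
    ("contig1", "+", 31, 40),
    ("contig1", "-", 5, 20),
]
for row in coverage_by_sequence(annotations, {"contig1": 100}):
    print("\t".join(str(value) for value in row))
```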
Changes
Changed in version 0.3.4: using click instead of argparse, renamed split command --json to --json-out
Changed in version 0.3.1: added cov command
Changed in version 0.3.0: added --split option to sequence command
Changed in version 0.2.6: added split command, --indent option to mongodb
Changed in version 0.2.3: added --gene-id option to gtf command
New in version 0.2.2: added gtf command
New in version 0.2.1: dbm and mongodb commands
New in version 0.1.15.