mgkit.workflow.edit_gff module

Script to edit GFF files

View GFF

Used to print the content of a GFF file as a table (more output formats will be added later).

The attributes printed are passed with -a, one attribute at a time. For example:

edit-gff view -a uid test.gff

will print uid for all annotations. Multiple attributes can be passed, like:

edit-gff view -a uid -a taxon_id test.gff

which prints a table with the uid and taxon_id of each annotation.

The default behaviour is to print only annotations that have all the attributes requested. This can be changed with the -k option, in which case fields that were not found are printed as empty strings.

A header can be printed with the -h option.

Note

The order of the fields in the table is the same as the order of the attributes passed with -a.
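The filtering rule described for the view command can be sketched in plain Python. This is only an illustration of the documented behaviour, not the script's actual code: the function `view_rows`, its `keep_empty` parameter (standing in for -k) and the representation of annotations as dictionaries are all assumptions made for the example.

```python
def view_rows(annotations, attributes, keep_empty=False):
    """Yield one row (a list of strings) per annotation.

    annotations: iterable of dicts mapping attribute name -> value
                 (a stand-in for parsed GFF annotations)
    attributes:  attribute names, in the order they should be printed
    keep_empty:  hypothetical flag mirroring the behaviour described for -k
    """
    for annotation in annotations:
        missing = any(name not in annotation for name in attributes)
        if missing and not keep_empty:
            # Default: skip annotations lacking any requested attribute
            continue
        # Fields follow the order of `attributes`; missing ones become ""
        yield [str(annotation.get(name, "")) for name in attributes]


annotations = [
    {"uid": "a1", "taxon_id": 2},
    {"uid": "a2"},  # no taxon_id
]
default_rows = list(view_rows(annotations, ["uid", "taxon_id"]))
kept_rows = list(view_rows(annotations, ["uid", "taxon_id"], keep_empty=True))
```

With the default, only the first annotation produces a row; with `keep_empty=True` the second one is printed too, with an empty string in place of taxon_id.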

Change or Add Attributes

Adds or changes the specified attributes in the annotations of a GFF file.

The attributes and the values are passed with the -a option. For example, to set the taxon_id of all annotations to 2, you can pass -a taxon_id 2. Multiple attributes can be set by passing the option multiple times. For example:

edit-gff add -a taxon_id 2 -a taxon_db CUSTOM test.gff

will set the taxon_id to 2 and the taxon_db to CUSTOM for all annotations.

The default behaviour is to leave unchanged an attribute that is already set in an annotation, but this can be changed by passing the -w option. In addition, the output can be restricted to the edited annotations with -o.

To change attributes on a subset of the annotations, a file that contains one uid per line can be passed with the -f option. Only annotations that match a uid in that list are edited.
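The interaction between the default "do not overwrite" rule, -w and the uid list passed with -f can be illustrated with a small sketch. The function `add_attributes` and its parameters `overwrite` and `uids` are hypothetical names chosen for this example, and annotations are again modelled as dictionaries; the real script may be organised differently.

```python
def add_attributes(annotations, new_values, overwrite=False, uids=None):
    """Yield copies of the annotations with `new_values` applied.

    overwrite: stand-in for the behaviour described for -w
    uids:      optional set of uids (as would be read from the -f file);
               when given, only matching annotations are edited
    """
    for annotation in annotations:
        updated = dict(annotation)  # do not modify the caller's data
        if uids is None or updated.get("uid") in uids:
            for name, value in new_values.items():
                # Existing attributes are kept unless overwrite is set
                if overwrite or name not in updated:
                    updated[name] = value
        yield updated


annotations = [{"uid": "a1", "taxon_id": "9"}, {"uid": "a2"}]
new_values = {"taxon_id": "2", "taxon_db": "CUSTOM"}

default_result = list(add_attributes(annotations, new_values))
overwritten = list(add_attributes(annotations, new_values, overwrite=True))
subset = list(add_attributes(annotations, new_values, uids={"a2"}))
```

In `default_result` the first annotation keeps its taxon_id of 9 and only gains taxon_db, while in `overwritten` its taxon_id becomes 2. In `subset` only the annotation with uid a2 is touched.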

Remove Attributes

Removes a list of attributes from a GFF file. Only attributes in the last column of a GFF file (fields separated by a ‘;’) can be removed. Each attribute is passed with its own -a option, and -a can be repeated to remove several attributes.

To remove attributes from a subset of the annotations, a file that contains one uid per line can be passed with the -f option. Only annotations that match a uid in that list are edited.
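To make concrete what "attributes in the last column" means, the sketch below removes named attributes from the ninth column of a single tab-separated GFF line. It assumes simple `key=value` pairs separated by ‘;’ with no ‘;’ inside values; the function name `remove_from_attribute_column` and the sample line are invented for the example and do not come from the script.

```python
def remove_from_attribute_column(line, names):
    """Return `line` with the attributes in `names` dropped from column 9.

    Assumes a tab-separated GFF line whose last column holds
    key=value pairs separated by ';' (no ';' inside the values).
    """
    columns = line.rstrip("\n").split("\t")
    pairs = [pair for pair in columns[8].split(";") if pair]
    kept = [
        pair for pair in pairs
        if pair.split("=", 1)[0].strip() not in names
    ]
    columns[8] = ";".join(kept)
    # The first eight columns are returned untouched
    return "\t".join(columns)


line = "contig1\tsource\tCDS\t1\t90\t.\t+\t0\tuid=a1;taxon_id=2;EC=1.1.3.3"
cleaned = remove_from_attribute_column(line, {"EC"})
```

The first eight columns (sequence, source, feature type, coordinates and so on) are never candidates for removal in this sketch, which matches the restriction stated above.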

Table

Similar to the add command, and with a function similar to add-gff-info addtaxa, this command adds or changes attributes using values from a table file.

The user chooses 2 attributes of a GFF annotation, the key and the attribute. The key is used to decide whether an annotation is to be modified, and the attribute is set for that annotation to the value in the table. For example, given the table:

GENE001,1.1.3.3
GENE002,1.2.3.3

If the key chosen is gene_id and the attribute is EC, the GFF will be scanned for annotations that have gene_id equal to GENE001, and their EC attribute will be set to 1.1.3.3; the second row is handled in the same way.

The table can have more than 2 fields, but only 2 are loaded: the key and the attribute selected in the options. The 2 fields are loaded into a Python dictionary, with the key and the attribute being respectively the key and the value in it. So 2 things must be noted:

  1. duplicate keys will be overwritten (only the last one remains)

  2. both fields are loaded in full before the GFF is processed, which can take up a lot of RAM
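The two points above follow directly from using a dictionary, as the following sketch shows. The helper names `load_pairs` and `apply_table` are hypothetical, and the third table row (a repeated GENE001 with the value 9.9.9.9) is added here only to demonstrate the overwriting; it is not part of the example table.

```python
def load_pairs(rows, key_index=0, attr_index=1):
    """Build a dict from two fields of each row; later rows win."""
    table = {}
    for row in rows:
        table[row[key_index]] = row[attr_index]
    return table


def apply_table(annotations, table, key="gene_id", attribute="EC"):
    """Yield copies of the annotations, setting `attribute` where `key` matches."""
    for annotation in annotations:
        updated = dict(annotation)
        value = table.get(updated.get(key))
        if value is not None:
            updated[attribute] = value
        yield updated


rows = [
    ["GENE001", "1.1.3.3"],
    ["GENE002", "1.2.3.3"],
    ["GENE001", "9.9.9.9"],  # hypothetical duplicate key
]
table = load_pairs(rows)
annotations = [{"gene_id": "GENE001"}, {"gene_id": "GENE003"}]
result = list(apply_table(annotations, table))
```

Because the whole dictionary exists before any annotation is examined, its size grows with the number of distinct keys in the table, which is where the memory cost comes from.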

By default the key is the first field (0) and the attribute is the second (1). The table may contain header lines, so the first N rows can be skipped with -r. The field separator can also be chosen, and the output can be restricted to the edited annotations (-o option).

If there are comments in the file, for example lines starting with ‘#’, it is possible to specify the option -c ‘#’ to skip those lines and avoid errors.
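How skipped rows, the separator and comment lines might combine when reading the table can be sketched with the standard `csv` module. The function `read_table` and its parameter names are assumptions for this example, as is the choice to count skipped rows before comment lines are filtered; the script's own ordering of these steps is not stated here.

```python
import csv
import io


def read_table(handle, key_index=0, attr_index=1, skip_rows=0,
               separator=",", comment=None):
    """Yield (key, value) pairs from a delimited text file.

    skip_rows: number of leading rows to ignore (as described for -r)
    comment:   lines starting with this string are ignored (as for -c)
    """
    reader = csv.reader(handle, delimiter=separator)
    for number, row in enumerate(reader):
        if number < skip_rows or not row:
            continue
        if comment and row[0].startswith(comment):
            continue
        yield row[key_index], row[attr_index]


text = (
    "gene,ec,note\n"
    "# curated by hand\n"
    "GENE001,1.1.3.3,x\n"
    "GENE002,1.2.3.3,y\n"
)
pairs = list(read_table(io.StringIO(text), skip_rows=1, comment="#"))
```

Without the `comment` argument the line starting with ‘#’ would be treated as data and, having a single field, would raise an `IndexError` in this sketch, which illustrates the kind of error the -c option is meant to avoid.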

Rename

The rename command changes attribute names. The old and the new name are passed with the -a option:

$ edit-gff rename -a taxo_ID taxon_id input.gff output.gff

will rename all instances of the attribute taxo_ID to taxon_id. The old and new attribute names must be separated by a space.

By default, the command won’t stop execution if an attribute is not found; it will silently continue. Using -s forces the script to stop if one of the attributes passed is not found.
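The difference between the default and the -s behaviour can be pictured as follows. This is a sketch under assumptions: `rename_attributes` and its `strict` flag are hypothetical names, and "not found" is interpreted here as "missing from an annotation", which may not be exactly how the script applies the check.

```python
def rename_attributes(annotations, renames, strict=False):
    """Yield copies of the annotations with attributes renamed.

    renames: list of (old_name, new_name) pairs
    strict:  stand-in for -s; raise instead of silently continuing
    """
    for annotation in annotations:
        updated = dict(annotation)
        for old, new in renames:
            if old in updated:
                updated[new] = updated.pop(old)
            elif strict:
                raise KeyError(f"attribute {old!r} not found")
        yield updated


annotations = [{"uid": "a1", "taxo_ID": "2"}, {"uid": "a2"}]
renames = [("taxo_ID", "taxon_id")]

renamed = list(rename_attributes(annotations, renames))

try:
    list(rename_attributes(annotations, renames, strict=True))
    stopped = False
except KeyError:
    stopped = True
```

In the default mode the second annotation, which has no taxo_ID, passes through unchanged; in strict mode the same input stops processing with an error.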

Changes

New in version 0.4.4.

Changed in version 0.5.5: added -c option to table command

Changed in version 0.5.7: added rename command and added options to table