edit-gff - GFF Viewer and Editor

Overview

Script to view and edit GFF files.

View GFF

Used to print the content of a GFF file as a table (more output formats will be added later).

The attributes to print are passed with -a, one attribute per option. For example:

edit-gff view -a uid test.gff

will print the uid of every annotation. Multiple attributes can be passed, for example:

edit-gff view -a uid -a taxon_id test.gff

which prints a table with the uid and taxon_id of each annotation.

By default, only annotations that have all the requested attributes are printed. This can be changed with the -k option, in which case fields that were not found are printed as empty strings.

A header can be printed with the -h option.

Note

The order of the fields in the table is the same as the order of the attributes passed with -a.
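The view behaviour described above can be sketched in a few lines of Python. This is a hypothetical illustration, not MGKit's actual code. It assumes GFF3-style attributes (key=value pairs separated by ';' in the 9th column), and the names parse_attributes and view_rows are invented for this example. The keep_empty, header and sep parameters stand in for -k, -h and -s.

```python
# Hypothetical sketch of the "view" logic; not MGKit's actual code.
# Assumes GFF3-style attributes: key=value pairs separated by ';'.

def parse_attributes(column):
    """Parse the 9th GFF column into a dict."""
    attrs = {}
    for field in column.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key] = value
    return attrs


def view_rows(gff_lines, attributes, keep_empty=False, header=False, sep="\t"):
    """Yield one table row per annotation, fields in the order of `attributes`."""
    if header:
        yield sep.join(attributes)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        attrs = parse_attributes(line.rstrip("\n").split("\t")[8])
        if not keep_empty and not all(a in attrs for a in attributes):
            continue  # default: drop annotations missing any attribute
        # Missing attributes become empty strings (only reached with keep_empty)
        yield sep.join(attrs.get(a, "") for a in attributes)


gff = [
    "seq1\tsrc\tCDS\t1\t90\t.\t+\t0\tuid=a1;taxon_id=2\n",
    "seq1\tsrc\tCDS\t100\t200\t.\t-\t0\tuid=a2\n",
]
print(list(view_rows(gff, ["uid", "taxon_id"])))
print(list(view_rows(gff, ["uid", "taxon_id"], keep_empty=True, header=True)))
```

With the default settings, only the first annotation is printed because the second one lacks taxon_id. With keep_empty=True, the second annotation appears with an empty taxon_id field.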

Change or Add Attributes

Adds or changes the specified attributes of the annotations in a GFF file.

Attributes and their values are passed with the -a option. For example, to set the taxon_id of all annotations to 2, pass -a taxon_id 2. Multiple attributes can be set by passing the option multiple times. For example:

edit-gff add -a taxon_id 2 -a taxon_db CUSTOM test.gff

will set the taxon_id to 2 and the taxon_db to CUSTOM for all annotations.

By default, an attribute that is already set in an annotation is not changed. Passing the -w option overrides this and overwrites existing values. In addition, -o outputs only the edited annotations.
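The following is a minimal sketch of the add semantics, with each annotation's attributes modelled as a plain dict. It is a hypothetical illustration, not MGKit's implementation. The function name add_attributes is invented, and counting an annotation as "edited" whenever at least one attribute is set is an assumption about what -o selects.

```python
# Hypothetical sketch of the "add" semantics described above; not MGKit code.
# An annotation's attributes are modelled as a plain dict.

def add_attributes(attrs, new_values, overwrite=False):
    """Set attributes in `attrs`; return True if any attribute was set.

    Without overwrite (like running edit-gff add without -w), attributes
    already present in the annotation are left untouched.
    """
    edited = False
    for key, value in new_values.items():
        if key in attrs and not overwrite:
            continue  # keep the existing value
        attrs[key] = value
        edited = True
    return edited


annotations = [{"uid": "a1"}, {"uid": "a2", "taxon_id": "10"}]
new_values = {"taxon_id": "2", "taxon_db": "CUSTOM"}

# Rough equivalent of -o: keep only the annotations that were edited.
edited = [ann for ann in annotations if add_attributes(ann, new_values)]
print(annotations)
```

The second annotation keeps taxon_id 10 because overwrite is off, but it still receives taxon_db.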

To change attributes on a subset of the annotations, pass a file with the -f option. The file must contain one uid per line, and only annotations whose uid is in that list are edited.
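A uid file like the one described above can be read into a set so that each annotation can be checked quickly. This is a hypothetical sketch, not MGKit's code. The names load_uids and should_edit are illustrative, and the sketch assumes blank lines in the file are ignored.

```python
# Hypothetical sketch of how a uid file passed with -f can restrict edits;
# not MGKit's implementation. The file holds one uid per line.
import io


def load_uids(handle):
    """Return the set of uids in a file-like object, one per line."""
    return {line.strip() for line in handle if line.strip()}


def should_edit(attrs, uids):
    """Edit only annotations whose uid is in the set (all if no set is given)."""
    return uids is None or attrs.get("uid") in uids


uids = load_uids(io.StringIO("a1\na3\n\n"))
annotations = [{"uid": "a1"}, {"uid": "a2"}, {"uid": "a3"}]
print([ann["uid"] for ann in annotations if should_edit(ann, uids)])
```

A set keeps each membership check constant-time on average, which matters when the GFF file holds many annotations.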

Remove Attributes

Removes a list of attributes from a GFF file. Only attributes in the last column of a GFF file (fields separated by ‘;’) can be removed. Each attribute is passed with its own -a option, and multiple -a options can be passed.

To remove attributes from a subset of the annotations, pass a file with the -f option. The file must contain one uid per line, and only annotations whose uid is in that list are edited.
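Because only the last column is affected, removal can be pictured as filtering the ';'-separated fields of that column. This is a hypothetical sketch, not MGKit's implementation. It assumes GFF3-style key=value fields, and remove_attributes is an invented name.

```python
# Hypothetical sketch of "remove" acting on the last GFF column; not MGKit code.
# Assumes GFF3-style key=value pairs separated by ';'.

def remove_attributes(column, to_remove):
    """Return the attribute column without the attributes in `to_remove`."""
    kept = []
    for field in column.strip().split(";"):
        if not field:
            continue  # ignore empty fields, e.g. from a trailing ';'
        key = field.partition("=")[0]
        if key not in to_remove:
            kept.append(field)
    return ";".join(kept)


print(remove_attributes("uid=a1;taxon_id=2;gene_id=G1", {"taxon_id", "gene_id"}))
```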

Table

This command is similar to the add command and to add-gff-info addtaxa. It adds or changes attributes using values taken from a table file.

The user defines two attributes of a GFF annotation: the key and the attribute. The key determines whether an annotation is modified, and the attribute is then set to the corresponding value in the table. For example, given the table:

GENE001,1.1.3.3
GENE002,1.2.3.3

If the chosen key is gene_id and the attribute is EC, the GFF file is scanned for annotations whose gene_id is GENE001, and their EC attribute is set to 1.1.3.3. The second row is handled in the same way. Because this example table is comma-separated, the separator must be set with -s ','.
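The lookup in the example above can be sketched as follows, with annotations modelled as dicts of attributes. This is a hypothetical illustration, not MGKit's code. The function apply_table is invented, and the default parameter is only an assumption about how -d behaves.

```python
# Hypothetical sketch of the table lookup in the example above; not MGKit code.
# Annotations are modelled as dicts of attributes.

table = {"GENE001": "1.1.3.3", "GENE002": "1.2.3.3"}  # key -> attribute value


def apply_table(attrs, table, key="gene_id", attribute="EC", default=None):
    """Set `attribute` from the table when the annotation's `key` matches.

    `default` loosely mimics -d: the value used when the key is not in the
    table. Returns True if the annotation was edited.
    """
    value = table.get(attrs.get(key), default)
    if value is None:
        return False  # key not in the table and no default given
    attrs[attribute] = value
    return True


annotations = [{"gene_id": "GENE001"}, {"gene_id": "GENE003"}]
for ann in annotations:
    apply_table(ann, table)
print(annotations)
```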

The table can have multiple fields, but only two are loaded: the key and the attribute selected in the options. These two fields are loaded into a Python dictionary, where the key field provides the dictionary keys and the attribute field provides the values. This has two consequences:

  1. duplicate keys are overwritten (only the last one remains)

  2. both fields are loaded in full before any annotation is edited, which can take up a lot of RAM

By default, the key is the first field (index 0) and the attribute is the second (index 1). These can be changed with -ki and -ai. The table may contain header lines, so the first N rows can be skipped with -r. The field separator can be set with -s, and -o prints only the edited annotations.

If there are comments in the file, for example lines starting with ‘#’, it is possible to specify the option -c ‘#’ to skip those lines and avoid errors.
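Loading the table into a dictionary, as described above, can be sketched as follows. This is a hypothetical illustration, not MGKit's implementation. The parameter names echo the -ki, -ai, -r, -s and -c options, but load_table and its signature are invented for this example.

```python
# Hypothetical sketch of loading the table into a dict, as described above;
# parameter names echo -ki, -ai, -r, -s and -c but this is not MGKit code.
import io


def load_table(handle, key_index=0, attr_index=1, skip_rows=0,
               separator="\t", comment=None):
    """Map the key field to the attribute field for each row of the table."""
    table = {}
    for row_number, line in enumerate(handle):
        if row_number < skip_rows:
            continue  # header rows (-r)
        if comment is not None and line.startswith(comment):
            continue  # comment lines (-c)
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split(separator)
        # Duplicate keys overwrite earlier ones: only the last value remains.
        table[fields[key_index]] = fields[attr_index]
    return table


data = io.StringIO(
    "gene,ec\n"
    "# a comment\n"
    "GENE001,1.1.3.3\n"
    "GENE002,1.2.3.3\n"
    "GENE001,1.1.1.1\n"
)
print(load_table(data, skip_rows=1, separator=",", comment="#"))
```

The repeated GENE001 row shows the first consequence listed above: the later value 1.1.1.1 replaces 1.1.3.3.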

Rename

The rename command changes attribute names. Each pair of old and new names is passed with -a:

$ edit-gff rename -a taxo_ID taxon_id input.gff output.gff

This renames all instances of the attribute taxo_ID to taxon_id. The old and new attribute names must be separated by a space.

By default, the command does not stop if an attribute is not found; it silently continues. Using -s forces the script to stop if one of the attributes passed is not found.
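The renaming and the effect of -s can be sketched on a single annotation's attribute dict. This is a hypothetical illustration, not MGKit's code. The name rename_attributes is invented, and raising KeyError is only one way to model "stop running". The real tool may check for missing attributes across the whole file rather than per annotation.

```python
# Hypothetical sketch of "rename" on an attribute dict; not MGKit code.

def rename_attributes(attrs, renames, strict=False):
    """Rename attribute keys using `renames` (old name -> new name).

    Missing attributes are skipped silently unless `strict` is True,
    which loosely mirrors the -s option by raising an error instead.
    """
    for old, new in renames.items():
        if old not in attrs:
            if strict:
                raise KeyError(f"attribute {old!r} not found")
            continue
        attrs[new] = attrs.pop(old)
    return attrs


print(rename_attributes({"taxo_ID": "2", "uid": "a1"}, {"taxo_ID": "taxon_id"}))
```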

Changes

New in version 0.4.4.

Changed in version 0.5.5: added -c option to table command

Changed in version 0.5.7: added rename command and added options to table

Options

edit-gff

Main function

edit-gff [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

--cite

add

Add fields to a GFF File

edit-gff add [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]

Options

-v, --verbose
-a, --attributes <attributes>

Required. Adds attributes to the GFF file. For example, -a taxon_id 2 adds the taxon_id attribute with a value of 2 to all annotations. Multiple attributes can be set, for example: -a taxon_id 2 -a gene_id TEST

-w, --overwrite

Overwrite the attributes if present

-o, --only-edited

Only output edited annotations

-f, --uids <uids>

Only edit annotations with uid in a file (one per line)

Arguments

INPUT_FILE

Optional argument

OUTPUT_FILE

Optional argument

fields

Prints the fields in a GFF File

edit-gff fields [OPTIONS] [GFF_FILE] [TXT_FILE]

Options

-v, --verbose
-n, --num-ann <num_ann>

Number of annotations to parse, 0 will parse the whole file

Default

10

Arguments

GFF_FILE

Optional argument

TXT_FILE

Optional argument
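The fields command and its -n option can be pictured as collecting attribute names from the first N annotations. This sketch rests on an assumption: that "fields" means the keys of the last GFF column, which the text above does not state explicitly. It assumes GFF3-style key=value fields. The function attribute_names is hypothetical and not part of MGKit.

```python
# Hypothetical sketch of collecting attribute names, assuming "fields" refers
# to the keys of the last GFF column; not MGKit's implementation.
from itertools import islice


def attribute_names(gff_lines, num_ann=10):
    """Return the sorted attribute names found in the first `num_ann` annotations.

    num_ann=0 parses every annotation, like -n 0.
    """
    annotations = (
        line for line in gff_lines if line.strip() and not line.startswith("#")
    )
    if num_ann:
        annotations = islice(annotations, num_ann)
    names = set()
    for line in annotations:
        for field in line.rstrip("\n").split("\t")[8].split(";"):
            if field:
                names.add(field.partition("=")[0])
    return sorted(names)


gff = [
    "seq1\tsrc\tCDS\t1\t90\t.\t+\t0\tuid=a1;taxon_id=2\n",
    "seq1\tsrc\tCDS\t100\t200\t.\t-\t0\tuid=a2;gene_id=G1\n",
]
print(attribute_names(gff, num_ann=1))
print(attribute_names(gff, num_ann=0))
```

Parsing only the first few annotations, as the default of 10 suggests, is faster but can miss attributes that appear only later in the file.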

remove

Remove fields from a GFF File

edit-gff remove [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]

Options

-v, --verbose
-a, --attributes <attributes>

Required. Removes attributes from the GFF file. For example, -a taxon_id removes the taxon_id attribute from all annotations. Multiple attributes can be removed, for example: -a taxon_id -a gene_id

-f, --uids <uids>

Only edit annotations with uid in a file (one per line)

Arguments

INPUT_FILE

Optional argument

OUTPUT_FILE

Optional argument

rename

Rename Attributes in GFF files

edit-gff rename [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]

Options

-v, --verbose
-s, --strict

If the attribute is not found, stop running

-a, --attributes <attributes>

Required. Attributes to rename. For example, -a taxon_id taxonID changes taxon_id attributes to taxonID. Multiple attributes can be renamed, for example: -a taxon_id taxonID -a gene_id GeneID

Arguments

INPUT_FILE

Optional argument

OUTPUT_FILE

Optional argument

table

Adds fields from a Table file

edit-gff table [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]

Options

-v, --verbose
-k, --key <key>

Attribute used to search the table; defaults to uid

-a, --attribute <attribute>

Required. Attribute to add or change

-o, --only-edited

Only output edited annotations

-r, --skip-rows <skip_rows>

Number of rows to skip at the start of the file

-s, --separator <separator>

Field separator; defaults to TAB

-c, --comment <comment>

Characters marking comment lines in the file (e.g. ‘#’); defaults to None

-t, --table-file <table_file>

Required

-p, --prodigal-gene

The table refers to a file produced by Prodigal, and the key is assumed to be of the form `seq_id`_N. This is the case, for example, when eggNOG-mapper is run on the amino acid files generated by Prodigal and the results are integrated back into the original GFF file.

--strip-kegg

Strips prefixes from KEGG IDs

-ki, --key-index <key_index>

Which field in the table is the key value

Default

0

-ai, --attr-index <attr_index>

Which field in the table is the attribute value

Default

1

-d, --default-value <default_value>

If the key is not found, use this value

Arguments

INPUT_FILE

Optional argument

OUTPUT_FILE

Optional argument

view

View GFF file as table/json, etc.

edit-gff view [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]

Options

-v, --verbose
-h, --header

Print Header

-k, --keep-empty

Keep annotations where not all attributes were found

-a, --attributes <attributes>

Required. Attributes of the GFF file to print. For example, -a taxon_id prints taxon_id for all annotations. Multiple attributes can be printed, for example: -a taxon_id -a gene_id

-s, --separator <separator>

Field separator; defaults to TAB

Arguments

INPUT_FILE

Optional argument

OUTPUT_FILE

Optional argument