edit-gff - GFF Viewer and Editor
Overview
Script to edit GFF files
Print Attributes in a GFF file
By default, the fields command reads the first 10 annotations of a GFF file and prints a sorted list of all the attributes found. Annotations may not all share the same set of attributes, so it may be necessary to read more of them with the -n option (0 parses the whole file).
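The idea of scanning only the first few annotations can be sketched in Python as follows. This is a minimal illustration, not the mgkit implementation: the function name `attribute_names` is hypothetical, and the attribute column is assumed to hold `key=value` pairs separated by ';'.

```python
from itertools import islice


def attribute_names(lines, num_ann=10):
    """Return the sorted attribute names found in the first num_ann annotations.

    num_ann=0 scans every annotation, mirroring the behaviour described for -n.
    """
    # Ignore blank lines and comment/directive lines starting with '#'
    records = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    if num_ann > 0:
        records = islice(records, num_ann)
    names = set()
    for line in records:
        # Assumption: the ninth tab-separated column holds 'key=value;key=value'
        column = line.rstrip("\n").split("\t")[8]
        for pair in column.split(";"):
            if "=" in pair:
                names.add(pair.split("=", 1)[0].strip())
    return sorted(names)


gff = [
    "seq1\tsrc\tCDS\t1\t90\t.\t+\t0\tuid=a1;taxon_id=2\n",
    "seq1\tsrc\tCDS\t100\t190\t.\t+\t0\tuid=a2;EC=1.1.3.3\n",
]
print(attribute_names(gff))      # ['EC', 'taxon_id', 'uid']
print(attribute_names(gff, 1))   # ['taxon_id', 'uid']
```

The second call shows why a small number of annotations can miss attributes: EC only appears in the second annotation.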
View GFF
Used to print the content of a GFF file as a table (more output formats will be added later).
The attributes printed are passed with -a, one attribute at a time. For example:
edit-gff view -a uid test.gff
will print uid for all annotations. Multiple attributes can be passed, like:
edit-gff view -a uid -a taxon_id test.gff
will print a table with the uid and taxon_id of each annotation.
The default behaviour is to print only annotations that have all the requested attributes. With the -k option, annotations missing some of the attributes are printed as well, with the missing fields left as empty strings.
A header can be printed with the -h option.
Note
The order of the fields in the table is the same as the order of the attributes passed with -a.
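The filtering and ordering rules of the view command can be sketched in Python. This is only an illustration under stated assumptions: annotations are modelled as plain dictionaries, and `view_rows` is a hypothetical helper, not part of mgkit.

```python
def view_rows(annotations, attributes, keep_empty=False, header=False, separator="\t"):
    """Build table rows with one column per requested attribute, in the order given."""
    rows = []
    if header:
        rows.append(separator.join(attributes))
    for ann in annotations:
        # Default: skip annotations that lack any of the requested attributes
        if not keep_empty and any(name not in ann for name in attributes):
            continue
        # Missing attributes become empty strings (only reachable with keep_empty)
        rows.append(separator.join(ann.get(name, "") for name in attributes))
    return rows


annotations = [{"uid": "a1", "taxon_id": "2"}, {"uid": "a2"}]
print(view_rows(annotations, ["uid", "taxon_id"]))
# ['a1\t2']
print(view_rows(annotations, ["uid", "taxon_id"], keep_empty=True, header=True))
# ['uid\ttaxon_id', 'a1\t2', 'a2\t']
```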
Change or Add Attributes
Adds or changes attributes of the annotations in a GFF file.
The attributes and their values are passed with the -a option. For example, to set the taxon_id of all annotations to 2, pass -a taxon_id 2. Multiple attributes can be set by passing the option multiple times. For example:
edit-gff add -a taxon_id 2 -a taxon_db CUSTOM test.gff
will set the taxon_id to 2 and the taxon_db to CUSTOM for all annotations.
The default behaviour is to leave unchanged an attribute that is already set in an annotation; this can be changed by passing the -w option. In addition, the output can be restricted to the edited annotations with -o.
To change attributes on a subset of the annotations, a file containing one uid per line can be passed with the -f option. Only annotations whose uid is in that list are edited.
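How -w, -o and -f interact can be sketched in Python. The helper `add_attributes` is hypothetical, annotations are modelled as dictionaries, and an annotation is assumed to count as "edited" only when at least one attribute was actually written; the real command may define this differently.

```python
def add_attributes(annotations, new_values, overwrite=False, only_edited=False, uids=None):
    """Return copies of the annotations with new_values applied."""
    output = []
    for ann in annotations:
        ann = dict(ann)  # work on a copy, leaving the input untouched
        edited = False
        # With a uid list, annotations outside the list are never edited
        if uids is None or ann.get("uid") in uids:
            for name, value in new_values.items():
                # Existing attributes are preserved unless overwrite is set
                if overwrite or name not in ann:
                    ann[name] = value
                    edited = True
        if edited or not only_edited:
            output.append(ann)
    return output


annotations = [{"uid": "a1", "taxon_id": "9"}, {"uid": "a2"}]
print(add_attributes(annotations, {"taxon_id": "2"}))
# [{'uid': 'a1', 'taxon_id': '9'}, {'uid': 'a2', 'taxon_id': '2'}]
print(add_attributes(annotations, {"taxon_id": "2"}, only_edited=True))
# [{'uid': 'a2', 'taxon_id': '2'}]
```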
Remove Attributes
Removes a list of attributes from a GFF file. Only attributes in the last column of a GFF file (fields separated by a ';') can be removed. Attributes are passed with the -a option followed by one attribute. Multiple -a attribute options can be passed.
To remove attributes on a subset of the annotations, a file containing one uid per line can be passed with the -f option. Only annotations whose uid is in that list are edited.
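The restriction to the last column can be illustrated with a short Python sketch that operates on a raw GFF line. The function `remove_attributes` is hypothetical, and the last column is assumed to hold `key=value` pairs separated by ';'.

```python
def remove_attributes(line, names, uids=None):
    """Drop the named attributes from the last column of one GFF line."""
    fields = line.rstrip("\n").split("\t")
    pairs = [pair for pair in fields[8].split(";") if pair]
    attrs = dict(pair.split("=", 1) for pair in pairs if "=" in pair)
    # With a uid list, lines whose uid is not listed are returned unchanged
    if uids is not None and attrs.get("uid") not in uids:
        return line
    # Only the ';'-separated pairs are filtered; the first eight columns are untouched
    fields[8] = ";".join(p for p in pairs if p.split("=", 1)[0] not in names)
    return "\t".join(fields) + "\n"


line = "seq1\tsrc\tCDS\t1\t90\t.\t+\t0\tuid=a1;taxon_id=2;EC=1.1.3.3\n"
print(remove_attributes(line, {"taxon_id"}), end="")
```

Passing a name such as `seq_id`, which lives in one of the first eight columns, leaves the line as it was.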
Table
Similar to the add command, and with functions similar to add-gff-info addtaxa, this command adds or changes attributes using values from a table file.
The user chooses 2 attributes of a GFF annotation, the key and the attribute. The key is used to decide whether an annotation is to be modified, and the attribute is set for that annotation to the value found in the table. For example, given the table:
GENE001,1.1.3.3
GENE002,1.2.3.3
If the key chosen is gene_id and the attribute is EC, the GFF will be scanned for annotations that have gene_id equal to GENE001, and their EC attribute will be set to 1.1.3.3; similarly for the second row.
The table can have multiple fields, but only 2 are loaded: the key and the attribute chosen in the options. The 2 fields are loaded into a Python dictionary, with the key and the attribute being respectively its key and value. So 2 things must be noted:
duplicate keys will be overwritten (only the last one remains)
the fields are loaded in their entirety first, which can take up a lot of RAM
The default is for the key to be the first field (0) and the attribute the second (1); these can be changed with -ki and -ai. The table may contain header rows, so the first N rows can be skipped with -r. The field separator can also be chosen (-s), and the output can be restricted to the edited annotations (-o).
If there are comments in the file, for example lines starting with '#', the option -c '#' can be used to skip those lines and avoid errors.
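The dictionary loading described above, including the effect of duplicate keys, can be sketched in Python. The functions `load_table` and `apply_table` are hypothetical and only mirror the options named in this section; they are not the mgkit code.

```python
def load_table(lines, key_index=0, attr_index=1, separator="\t", skip_rows=0, comment=None):
    """Load two fields of a table into a dictionary, key field -> attribute field."""
    table = {}
    for number, line in enumerate(lines):
        if number < skip_rows or not line.strip():
            continue  # header rows and blank lines
        if comment is not None and line.startswith(comment):
            continue  # comment lines
        fields = line.rstrip("\n").split(separator)
        # A repeated key silently replaces the earlier value
        table[fields[key_index]] = fields[attr_index]
    return table


def apply_table(annotations, table, key, attribute):
    """Set attribute on every annotation whose key value is found in the table."""
    for ann in annotations:
        value = table.get(ann.get(key))
        if value is not None:
            ann[attribute] = value
    return annotations


rows = ["gene,ec\n", "#comment\n", "GENE001,1.1.3.3\n", "GENE002,1.2.3.3\n", "GENE001,9.9.9.9\n"]
table = load_table(rows, separator=",", skip_rows=1, comment="#")
print(table)
# {'GENE001': '9.9.9.9', 'GENE002': '1.2.3.3'}
print(apply_table([{"gene_id": "GENE002"}, {"gene_id": "GENE003"}], table, "gene_id", "EC"))
# [{'gene_id': 'GENE002', 'EC': '1.2.3.3'}, {'gene_id': 'GENE003'}]
```

Because the whole dictionary is built before any annotation is touched, memory use grows with the number of distinct keys in the table.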
Rename
The rename command changes attribute names, which are passed as pairs with the -a option:
$ edit-gff rename -a taxo_ID taxon_id input.gff output.gff
will rename all instances of the attribute taxo_ID to taxon_id. The old and new attribute names must be separated by a space.
By default, the command does not stop if an attribute is not found; it silently continues. Using -s forces the script to stop if one of the attributes passed is not found.
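The difference between the default and the strict behaviour can be sketched in Python. The helper `rename_attributes` is hypothetical, annotations are modelled as dictionaries, and raising `KeyError` is only one way "stop running" could be realised.

```python
def rename_attributes(annotation, renames, strict=False):
    """Return a copy of the annotation with (old, new) attribute names applied."""
    renamed = dict(annotation)
    for old, new in renames:
        if old in renamed:
            renamed[new] = renamed.pop(old)
        elif strict:
            # Strict mode: a missing attribute is an error instead of a no-op
            raise KeyError(f"attribute {old!r} not found")
    return renamed


print(rename_attributes({"uid": "a1", "taxo_ID": "2"}, [("taxo_ID", "taxon_id")]))
# {'uid': 'a1', 'taxon_id': '2'}
print(rename_attributes({"uid": "a1"}, [("taxo_ID", "taxon_id")]))
# {'uid': 'a1'}
```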
Changes
New in version 0.4.4.
Changed in version 0.5.5: added -c option to table command
Changed in version 0.5.7: added rename command and added options to table
Options
edit-gff
Main function
edit-gff [OPTIONS] COMMAND [ARGS]...
Options
--version
Show the version and exit.
--cite
add
Add fields to a GFF File
edit-gff add [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]
Options
-v, --verbose
-a, --attributes <attributes>
Required. Add attributes to the GFF file. For example -a taxon_id 2 will add taxon_id attribute with a value of 2 to all annotations. Multiple attributes can be set, for example: -a taxon_id 2 -a gene_id TEST
-w, --overwrite
Overwrite the attributes if present
-o, --only-edited
Only output edited annotations
-f, --uids <uids>
Only edit annotations with uid in a file (one per line)
Arguments
INPUT_FILE
Optional argument
OUTPUT_FILE
Optional argument
fields
Prints the fields in a GFF File
edit-gff fields [OPTIONS] [GFF_FILE] [TXT_FILE]
Options
-v, --verbose
-n, --num-ann <num_ann>
Number of annotations to parse, 0 will parse the whole file
Default: 10
Arguments
GFF_FILE
Optional argument
TXT_FILE
Optional argument
remove
Remove fields from a GFF File
edit-gff remove [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]
Options
-v, --verbose
-a, --attributes <attributes>
Required. Remove attributes from the GFF file. For example -a taxon_id will remove the taxon_id attribute from all annotations. Multiple attributes can be removed, for example: -a taxon_id -a gene_id
-f, --uids <uids>
Only edit annotations with uid in a file (one per line)
Arguments
INPUT_FILE
Optional argument
OUTPUT_FILE
Optional argument
rename
Rename Attributes in GFF files
edit-gff rename [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]
Options
-v, --verbose
-s, --strict
If the attribute is not found, stop running
-a, --attributes <attributes>
Required. Attributes to rename. For example -a taxon_id taxonID will change taxon_id attributes to taxonID. Multiple attributes can be set, for example: -a taxon_id taxonID -a gene_id GeneID
Arguments
INPUT_FILE
Optional argument
OUTPUT_FILE
Optional argument
table
Adds fields from a Table file
edit-gff table [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]
Options
-v, --verbose
-k, --key <key>
Attribute used to search the table; defaults to uid
-a, --attribute <attribute>
Required. Attribute to add/change
-o, --only-edited
Only output edited annotations
-r, --skip-rows <skip_rows>
Number of rows to skip at the start of the file
-s, --separator <separator>
Fields separator; defaults to TAB
-c, --comment <comment>
Characters for comments in the file (e.g. '#'); defaults to None
-t, --table-file <table_file>
Required
-p, --prodigal-gene
The table is for a file that has been produced by prodigal, and the key is assumed to be of the form `seq_id`_N. This is the case, for example, when running eggNOG mapper on AA files generated by prodigal and integrating the results back into the original GFF file
--strip-kegg
Strips prefixes from Kegg IDs
-ki, --key-index <key_index>
Which field in the table is the key value
Default: 0
-ai, --attr-index <attr_index>
Which field in the table is the attribute value
Default: 1
-d, --default-value <default_value>
If the key is not found, use this value
Arguments
INPUT_FILE
Optional argument
OUTPUT_FILE
Optional argument
view
View GFF file as table/json, etc.
edit-gff view [OPTIONS] [INPUT_FILE] [OUTPUT_FILE]
Options
-v, --verbose
-h, --header
Print Header
-k, --keep-empty
Keep annotations where not all attributes were found
-a, --attributes <attributes>
Required. Attributes of GFF file to print. For example -a taxon_id will print taxon_id for all annotations. Multiple attributes can be printed, for example: -a taxon_id -a gene_id
-s, --separator <separator>
Fields separator; defaults to TAB
Arguments
INPUT_FILE
Optional argument
OUTPUT_FILE
Optional argument