snp_parser - SNPs analysis
Overview
The workflow starts with a number of alignments passed to the SNP calling software, which produces one VCF file per alignment/sample. These VCF files are used by SNPDat, along with a GTF file and the reference genome, to integrate the information in the VCF files with synonymous/non-synonymous information.
All VCF files are merged into a single VCF that includes information about all the SNPs called among all samples. This merged VCF is passed, along with the results from SNPDat and the GFF file, to snp_parser.py, which integrates information from all data sources and outputs files in a format that can later be used by the rest of the pipeline. [1]
Note
The GFF file passed to the parser must have per sample coverage information.
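As a rough illustration of this requirement, the sketch below checks that a GFF line carries one coverage attribute per sample. It assumes that attributes are written as `key="value"` pairs separated by semicolons and that each coverage key is the sample ID followed by the coverage suffix (`_cov` by default, see the `-c` option). The helper names, the sample IDs and the annotation line are hypothetical and not part of the script.

```python
def parse_gff_attributes(column):
    """Parse the 9th GFF column into a dict (assumes key="value";... pairs)."""
    attributes = {}
    for pair in column.strip().split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[key] = value.strip('"')
    return attributes


def missing_coverage(gff_line, sample_ids, cov_suffix="_cov"):
    """Return the sample IDs that have no '<sample><suffix>' attribute."""
    fields = gff_line.rstrip("\n").split("\t")
    attributes = parse_gff_attributes(fields[8])
    return [s for s in sample_ids if s + cov_suffix not in attributes]


# Hypothetical annotation line that has coverage for sample1 only
line = (
    "contig1\tsrc\tCDS\t1\t300\t.\t+\t0\t"
    'gene_id="K00001";sample1_cov="12"'
)
print(missing_coverage(line, ["sample1", "sample2"]))
```

An empty list means every requested sample has a coverage attribute on that annotation.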
[1] This step is done separately because it is time consuming, and doing so can help to parallelise later steps.
Script Reference
Deprecated since version 0.5.7: This script is now deprecated; use pnps-gen vcf instead.
Note
If you need to use the script, install HTSeq.
This script parses the results of SNP analysis from any SNP calling tool [2] and integrates them into a format that can later be used by other scripts in the pipeline.
It integrates coverage, the expected number of syn/nonsyn changes and taxonomy from a GFF file with SNP data from a VCF file.
Note
The script accepts gzipped VCF files.
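For readers who want to inspect the same input outside the script, here is a minimal sketch of reading a VCF file whether or not it is gzipped. It is not taken from the script itself; the function names are hypothetical, and compression is detected from the gzip magic number rather than from the file extension.

```python
import gzip


def open_vcf(path):
    """Open a VCF file as text, transparently handling gzip compression."""
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == b"\x1f\x8b"  # gzip magic number
    if is_gzipped:
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_records(path):
    """Yield the tab-separated fields of each data line, skipping headers."""
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # meta-information and column header lines
            yield line.rstrip("\n").split("\t")
```

Checking the first two bytes means a compressed file is recognised even when it lacks a `.gz` extension.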
[2] The GATK pipeline was tested, but it is also possible to use samtools and bcftools.
Changes
Changed in version 0.2.1: added -s option for VCF files generated using bcftools
Changed in version 0.1.16: reworked internals and removed SNPDat; syn/nonsyn evaluation is internal
Changed in version 0.1.13: reworked the internals and the classes used, including options -m and -s
Options
DEPRECATED, use pnps-gen vcf. SNPs analysis, requires a VCF file.
usage: snp_parser [-h] [-o OUTPUT_FILE] [-q MIN_QUAL] [-f MIN_FREQ] [-r MIN_READS] -g GFF_FILE -p VCF_FILE -a REFERENCE -m SAMPLES_ID [-c COV_SUFF] [-s]
[-v | --quiet] [--cite] [--manual] [--version]
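The usage line above can be turned into a call from Python. The sketch below only assembles the argument list from the flags shown in the usage line. The file names and sample IDs are hypothetical, and repeating `-m` once per sample is an assumption that should be checked against `snp_parser -h`.

```python
import shlex


def build_snp_parser_command(gff_file, vcf_file, reference, sample_ids,
                             output_file="snp_data.pickle", min_qual=30,
                             min_freq=0.01, min_reads=4, bcftools=False):
    """Assemble the argument list for snp_parser from the usage line."""
    command = [
        "snp_parser",
        "-o", output_file,
        "-q", str(min_qual),
        "-f", str(min_freq),
        "-r", str(min_reads),
        "-g", gff_file,
        "-p", vcf_file,
        "-a", reference,
    ]
    # Assumption: -m is repeated once per sample ID
    for sample_id in sample_ids:
        command.extend(["-m", sample_id])
    if bcftools:
        command.append("-s")  # the VCF was produced by bcftools call
    return command


# Hypothetical input files and sample IDs
command = build_snp_parser_command(
    "assembly.gff", "merged.vcf.gz", "assembly.fa", ["sample1", "sample2"]
)
print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` once the script and its HTSeq dependency are installed.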
Named Arguments
- -o, --output-file
Output file
Default: snp_data.pickle
- -q, --min-qual
Minimum SNP quality (Phred score)
Default: 30
- -f, --min-freq
Minimum allele frequency
Default: 0.01
- -r, --min-reads
Minimum number of reads to accept the SNP
Default: 4
- -g, --gff-file
GFF file with annotations
- -p, --vcf-file
Merged VCF file
- -a, --reference
FASTA file with the reference sequences for the GFF
- -m, --samples-id
the ids of the samples used in the analysis
- -c, --cov-suff
Per sample coverage suffix in the GFF
Default: “_cov”
- -s, --bcftools-vcf
bcftools call was used to produce the VCF file
Default: False
- -v, --verbose
more verbose - includes debug messages
Default: 20
- --quiet
less verbose - only error and critical messages
- --cite
Show citation for the framework
- --manual
Show the script manual
- --version
show program’s version number and exit
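To make the three filtering options listed above (`-q`, `-f` and `-r`) more concrete, the following sketch shows one way such thresholds could be combined. It is an illustration under stated assumptions, not the script's internal logic: the `SnpCall` class is hypothetical, the minimum number of reads is assumed to apply to the total depth at the position, and the allele frequency is assumed to be the fraction of reads supporting the alternative allele. A Phred score of 30 corresponds to an error probability of 1 in 1000.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical stand-in for one alternative allele in one sample."""
    qual: float       # Phred-scaled quality of the call
    alt_reads: int    # reads supporting the alternative allele
    total_reads: int  # reads covering the position


def passes_filters(call, min_qual=30, min_freq=0.01, min_reads=4):
    """Apply the three thresholds; the defaults mirror -q, -f and -r."""
    if call.qual < min_qual:
        return False
    # Assumption: the read threshold applies to the total depth
    if call.total_reads == 0 or call.total_reads < min_reads:
        return False
    frequency = call.alt_reads / call.total_reads
    return frequency >= min_freq
```

For example, `passes_filters(SnpCall(qual=50, alt_reads=5, total_reads=100))` is accepted, while the same call with `qual=20` is rejected by the quality threshold.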