snp_parser - SNPs analysis

Overview

[Workflow diagram. Nodes: Alignments, Assembly, VCF files, VCF Merge, SNPs Calling, Add Information, snp_parser, GFF]

The workflow starts with a number of alignments passed to the SNP calling software, which produces one VCF file per alignment/sample. SNPDat then uses these VCF files, along with a GTF file and the reference genome, to annotate the SNPs with synonymous/non-synonymous information.

All VCF files are merged into a single VCF that includes all the SNPs called across all samples. This merged VCF is passed, along with the results from SNPDat and the GFF file, to snp_parser.py, which integrates information from all data sources and outputs a file in a format that can later be used by the rest of the pipeline. 1

Note

The GFF file passed to the parser must have per sample coverage information.
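A quick way to check this requirement before running the parser is sketched below. It is only an illustration: it assumes that the coverage attributes are named as the sample ID followed by the coverage suffix (`_cov` by default, see the `-c` option) and that the attribute column uses `key=value` pairs separated by `;`. The function name `missing_coverage_keys` is hypothetical and not part of the script.

```python
def missing_coverage_keys(gff_line, sample_ids, cov_suffix="_cov"):
    """Return the per-sample coverage attribute names a GFF line lacks.

    Assumption: coverage is stored as '<sample_id><cov_suffix>=<value>'
    in the attribute (9th) column of the annotation line.
    """
    columns = gff_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("a GFF annotation line has 9 tab-separated columns")
    # Parse the ';'-separated key=value pairs of the attribute column
    attributes = {}
    for pair in columns[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip().strip('"')
    expected = [sample + cov_suffix for sample in sample_ids]
    return [key for key in expected if key not in attributes]


# Made-up annotation line with coverage for two of three samples
line = "contig1\tsrc\tCDS\t1\t90\t.\t+\t0\tgene_id=g1;sample1_cov=12;sample2_cov=7"
print(missing_coverage_keys(line, ["sample1", "sample2", "sample3"]))
# prints ['sample3_cov']
```

An empty list means every listed sample has a coverage attribute on that line.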

1

This step is done separately because it is time-consuming and because doing so helps to parallelise later steps.

Script Reference

Deprecated since version 0.5.7: This script is deprecated; use pnps-gen vcf instead.

Note

If you need to use the script, install HTSeq.

This script parses the results of a SNP analysis from any SNP calling tool 2 and integrates them into a format that can later be used by other scripts in the pipeline.

It integrates coverage, the expected number of synonymous/non-synonymous changes and taxonomy from a GFF file with SNP data from a VCF file.

Note

The script accepts gzipped VCF files.
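For readers who want to inspect such files themselves, the sketch below shows one common way to read a VCF file whether or not it is gzip-compressed. It is not the script's own implementation; the helper names `open_vcf` and `count_records` are hypothetical, and only the Python standard library is used.

```python
import gzip


def open_vcf(path):
    """Open a VCF file in text mode, gzip-compressed or not."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    # gzip streams start with the two bytes 0x1f 0x8b
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_records(path):
    """Count variant records, skipping header lines that start with '#'."""
    with open_vcf(path) as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )
```

Checking the leading bytes rather than the file extension means a compressed file is still read correctly if it was renamed without the `.gz` suffix.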

2

The GATK pipeline was tested, but samtools and bcftools can also be used.

Changes

Changed in version 0.2.1: added -s option for VCF files generated using bcftools

Changed in version 0.1.16: reworked internals and removed SNPDat; syn/nonsyn evaluation is now internal

Changed in version 0.1.13: reworked the internals and the classes used, including options -m and -s

Options

DEPRECATED, use pnps-gen vcf. SNPs analysis, requires a VCF file.

usage: snp_parser [-h] [-o OUTPUT_FILE] [-q MIN_QUAL] [-f MIN_FREQ] [-r MIN_READS] -g GFF_FILE -p VCF_FILE -a REFERENCE -m SAMPLES_ID [-c COV_SUFF] [-s]
                  [-v | --quiet] [--cite] [--manual] [--version]
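As an illustration of the usage line, the sketch below assembles a command line from Python without running it. The file names and the sample ID are placeholders, and the helper `build_snp_parser_command` is hypothetical. Only one sample ID is passed, because the usage line does not state how several IDs are supplied. The default values mirror those listed under the named arguments.

```python
import shlex


def build_snp_parser_command(gff_file, vcf_file, reference, sample_id,
                             output_file="snp_data.pickle", min_qual=30,
                             min_freq=0.01, min_reads=4, cov_suff="_cov",
                             bcftools_vcf=False):
    """Return the snp_parser invocation as a list of arguments."""
    command = [
        "snp_parser",
        "-o", output_file,
        "-q", str(min_qual),
        "-f", str(min_freq),
        "-r", str(min_reads),
        "-g", gff_file,
        "-p", vcf_file,
        "-a", reference,
        "-m", sample_id,
        "-c", cov_suff,
    ]
    # -s is a flag: add it only for VCF files produced with bcftools call
    if bcftools_vcf:
        command.append("-s")
    return command


# Placeholder file names, for illustration only
command = build_snp_parser_command(
    "annotations.gff", "merged.vcf.gz", "reference.fa", "sample1",
    bcftools_vcf=True,
)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once the script and its dependencies are installed.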

Named Arguments

-o, --output-file

Output file

Default: snp_data.pickle

-q, --min-qual

Minimum SNP quality (Phred score)

Default: 30

-f, --min-freq

Minimum allele frequency

Default: 0.01

-r, --min-reads

Minimum number of reads to accept the SNP

Default: 4

-g, --gff-file

GFF file with annotations

-p, --vcf-file

Merged VCF file

-a, --reference

FASTA file with the GFF reference

-m, --samples-id

The IDs of the samples used in the analysis

-c, --cov-suff

Per sample coverage suffix in the GFF

Default: “_cov”

-s, --bcftools-vcf

bcftools call was used to produce the VCF file

Default: False

-v, --verbose

more verbose - includes debug messages

Default: 20

--quiet

less verbose - only error and critical messages

--cite

Show citation for the framework

--manual

Show the script manual

--version

show program’s version number and exit
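To make the effect of the `-q`, `-f` and `-r` thresholds concrete, the sketch below applies their default values to a single SNP call. It is an assumption-laden illustration, not the script's code: which VCF fields the script compares against each threshold is not documented here. The sketch assumes that the read threshold applies to total depth and that the frequency is the alternate read count divided by depth. `SnpCall` and `passes_filters` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical minimal view of one SNP call in one sample."""
    qual: float        # Phred-scaled quality
    alt_reads: int     # reads supporting the alternate allele
    total_reads: int   # total depth at the position


def passes_filters(call, min_qual=30, min_freq=0.01, min_reads=4):
    """Apply the three thresholds; the defaults mirror -q, -f and -r."""
    # Assumption: the read threshold is compared with total depth
    if call.total_reads <= 0 or call.total_reads < min_reads:
        return False
    if call.qual < min_qual:
        return False
    # Assumption: allele frequency is alt_reads / total_reads
    return call.alt_reads / call.total_reads >= min_freq


print(passes_filters(SnpCall(qual=50.0, alt_reads=5, total_reads=20)))
# prints True
```

With these assumptions, a call supported by 1 read out of 200 is rejected because its frequency (0.005) is below the default minimum of 0.01.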