mgkit.workflow.snp_parser module

Deprecated since version 0.5.7: Use pnps-gen vcf instead.

Note

If you need to use the script, install HTSeq.

This script parses the results of SNP analysis from any SNP calling tool [1] and integrates them into a format that other scripts in the pipeline can use later.

It integrates coverage, the expected number of synonymous and non-synonymous changes, and taxonomy from a GFF file with SNP data from a VCF file.
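As an illustration of what a synonymous versus non-synonymous change means, the following minimal sketch classifies a single-base change inside a codon using the standard genetic code. It is not the script's internal implementation; the names `CODON_TABLE` and `is_synonymous` are hypothetical and introduced here only for the example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon, codon_pos, change):
    """Hypothetical helper: True if replacing the base at codon_pos (0-2)
    with `change` leaves the encoded amino acid unchanged."""
    mutated = codon[:codon_pos] + change + codon[codon_pos + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


# CTT and CTC both encode leucine, so this change is synonymous.
print(is_synonymous("CTT", 2, "C"))
# ATG (methionine) to ATA (isoleucine) is non-synonymous.
print(is_synonymous("ATG", 2, "A"))
```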

Note

The script accepts gzipped VCF files.
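One common way to accept both plain and gzipped input is to check the two-byte gzip magic number before opening the file. The sketch below uses only the Python standard library; `open_vcf` is a hypothetical helper, and how the script itself detects compression is not stated in this page.

```python
import gzip


def open_vcf(path):
    """Hypothetical helper: open a VCF file in text mode, handling gzip
    transparently by inspecting the first two bytes of the file."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "rt")
```

The returned handle can then be passed wherever a file handle to a VCF file is expected.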

[1] The GATK pipeline was tested, but samtools and bcftools can also be used.

Changes

Changed in version 0.2.1: added -s option for VCF files generated using bcftools

Changed in version 0.1.16: reworked internals and removed SNPDat, syn/nonsyn evaluation is internal

Changed in version 0.1.13: reworked the internals and the classes used, including options -m and -s

mgkit.workflow.snp_parser.check_snp_in_set(samples, snp_data, pos, change, annotations, seq)[source]

Used by parse_vcf() to check if a SNP

Parameters
  • samples (iterable) – list of samples that contain the SNP

  • snp_data (dict) – dictionary from init_count_set() with per-sample SNP information

mgkit.workflow.snp_parser.init_count_set(annotations)[source]
mgkit.workflow.snp_parser.main()[source]

Main function

mgkit.workflow.snp_parser.parse_vcf(vcf_file, snp_data, min_reads, min_af, min_qual, annotations, seqs, options, line_num=100000)[source]

Parses a VCF file and counts synonymous and non-synonymous SNPs

Parameters
  • vcf_file (file) – file handle to a VCF file

  • snp_data (dict) – dictionary from init_count_set() with per-sample SNP information

  • min_reads (int) – minimum number of reads to accept a SNP

  • min_af (float) – minimum allele frequency to accept a SNP

  • min_qual (int) – minimum quality (Phred score) to accept a SNP

  • annotations (dict) – annotations grouped by their reference sequence

  • seqs (dict) – reference sequences

  • line_num (int) – the interval in number of lines at which progress will be printed
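The three thresholds above can be read as a simple acceptance test applied to each SNP. The following is a minimal sketch of that idea, not the code used by parse_vcf(): the function `passes_filters`, its arguments `depth`, `allele_freq` and `qual`, and the use of inclusive comparisons are all assumptions made for illustration, as are the threshold values in the usage lines.

```python
def passes_filters(depth, allele_freq, qual, min_reads, min_af, min_qual):
    """Hypothetical check: accept a SNP only if read depth, allele
    frequency and Phred quality all reach their minimum thresholds
    (inclusive comparisons are an assumption)."""
    return (
        depth >= min_reads
        and allele_freq >= min_af
        and qual >= min_qual
    )


# Illustrative thresholds, chosen arbitrarily for the example.
thresholds = dict(min_reads=4, min_af=0.01, min_qual=30)

print(passes_filters(depth=10, allele_freq=0.25, qual=45, **thresholds))
print(passes_filters(depth=2, allele_freq=0.25, qual=45, **thresholds))
```

The first call meets every threshold; the second fails because the depth is below min_reads.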

mgkit.workflow.snp_parser.save_data(output_file, snp_data)[source]

Pickle data structures to the disk.

Parameters
  • output_file (str) – base name for pickle files

  • snp_data (dict) – dictionary from init_count_set() with per-sample SNP information
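For readers unfamiliar with pickling, the sketch below shows a round trip of a dictionary to disk with the standard `pickle` module. It is only an illustration: the helper names `save_snp_data` and `load_snp_data`, the `.pickle` suffix appended to the base name, and the example dictionary contents are assumptions, and the file names that save_data() actually produces are not documented here.

```python
import pickle


def save_snp_data(output_file, snp_data):
    """Hypothetical helper: pickle snp_data to '<output_file>.pickle'
    (the suffix is an assumption) and return the file name."""
    file_name = f"{output_file}.pickle"
    with open(file_name, "wb") as handle:
        pickle.dump(snp_data, handle)
    return file_name


def load_snp_data(file_name):
    """Load a dictionary previously written by save_snp_data()."""
    with open(file_name, "rb") as handle:
        return pickle.load(handle)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.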

mgkit.workflow.snp_parser.set_parser()[source]

Sets up the command line argument parser