mgkit.snps.classes module

Manage SNP data.

class mgkit.snps.classes.GeneSNP(gene_id='', taxon_id=0, exp_syn=0, exp_nonsyn=0, coverage=None, snps=None, uid=None, json_data=None)[source]

Bases: mgkit.snps.classes.RatioMixIn

New in version 0.1.13.

Class defining gene and synonymous/non-synonymous SNPs.

It defines background synonymous/non-synonymous attributes and currently has a single method, which calculates the pN/pS ratio. The method is added through a mixin object, so the ratio can be customised and shared with the old implementation.

uid

unique id for the isoform (to be referenced in a GFF file)

Type

str

gene_id

gene id

Type

str

taxon_id

gene taxon

Type

int

exp_syn

expected synonymous changes

Type

int

exp_nonsyn

expected non-synonymous changes

Type

int

coverage

gene coverage

Type

int

snps

list of SNPs associated with the gene. Each element is a tuple of three items: the position (relative to the gene start), the nucleotidic change and the aa SNP type as defined by SNPType.

Type

list

Note

The main difference from GeneSyn is that all SNPs are kept, and that syn and nonsyn are not attributes but properties that return the count of synonymous and non-synonymous SNPs in the snps list.

Warning

This class uses more memory than GeneSyn because it doesn’t use __slots__; this may change in later versions.

add(other)[source]

In-place addition of the values of another instance. There is no check that both instances refer to the same gene/taxon; it’s up to the user to check that they can be added together.

Parameters

other – instance of GeneSyn to add

add_snp(position, change, snp_type=<SNPType.unknown: 0>)[source]

Adds a SNP to the list

Parameters
  • position (int) – SNP position, relative to the gene start

  • change (str) – nucleotidic change

  • snp_type (enum) – one of the values defined in SNPType
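The (position, change, type) tuple layout described for snps, and the counting attributed to the syn and nonsyn properties, can be sketched without the library. In this minimal sketch, SNPKind, add_snp and count_by_type are hypothetical stand-ins rather than mgkit code, and representing change as a single-letter string is an assumption.

```python
from enum import Enum


class SNPKind(Enum):
    """Hypothetical stand-in mirroring the values listed for SNPType."""
    unknown = 0
    syn = 1
    nonsyn = 2


def add_snp(snps, position, change, snp_type=SNPKind.unknown):
    """Append a (position, change, type) tuple to the list of SNPs."""
    snps.append((position, change, snp_type))


def count_by_type(snps, snp_type):
    """Count the SNPs whose third element is the requested type."""
    return sum(1 for _, _, kind in snps if kind is snp_type)


snps = []
add_snp(snps, 12, "A", SNPKind.syn)
add_snp(snps, 57, "T", SNPKind.nonsyn)
add_snp(snps, 90, "G", SNPKind.nonsyn)
add_snp(snps, 101, "C")  # type left at the default, unknown

syn = count_by_type(snps, SNPKind.syn)        # observed synonymous SNPs
nonsyn = count_by_type(snps, SNPKind.nonsyn)  # observed non-synonymous SNPs
```

SNPs of unknown type stay in the list but contribute to neither count.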

coverage = None
exp_nonsyn = None
exp_syn = None
from_json(data)[source]

Instantiate the instance with values from a json definition

Parameters

data (str) – json representation, as returned by GeneSNP.to_json()

gene_id = None
property nonsyn

Returns the count of non-synonymous SNPs in the snps list

snps = None
property syn

Returns the count of synonymous SNPs in the snps list

taxon_id = None
to_json()[source]

Returns a json definition of the instance

Returns

json representation of the instance

Return type

str
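The JSON layout produced by to_json() is not described here. As a rough sketch of such a round trip, the example below assumes a hypothetical layout keyed by attribute name, with each SNP type stored as its integer value because enum members cannot be encoded by the json module directly. The helper names and the sample values are illustrative only.

```python
import json
from enum import Enum


class SNPKind(Enum):
    """Hypothetical stand-in mirroring the values listed for SNPType."""
    unknown = 0
    syn = 1
    nonsyn = 2


def gene_to_json(gene):
    """Serialise a dict describing a gene (the layout is an assumption)."""
    data = dict(gene)  # shallow copy, the input is left untouched
    # store each SNP type as its integer value so json can encode it
    data["snps"] = [[pos, change, kind.value]
                    for pos, change, kind in gene["snps"]]
    return json.dumps(data)


def gene_from_json(text):
    """Rebuild the dict, turning integer values back into enum members."""
    data = json.loads(text)
    data["snps"] = [(pos, change, SNPKind(value))
                    for pos, change, value in data["snps"]]
    return data


gene = {
    "uid": "example-uid",
    "gene_id": "K00001",
    "taxon_id": 2,
    "exp_syn": 4,
    "exp_nonsyn": 6,
    "coverage": 10,
    "snps": [(12, "A", SNPKind.syn), (57, "T", SNPKind.nonsyn)],
}
restored = gene_from_json(gene_to_json(gene))
```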

uid = None
class mgkit.snps.classes.RatioMixIn[source]

Bases: object

calc_pn()[source]

Method that returns only the pN part of the pN/pS ratio.

Returns

the pN value, unless self.nonsyn is 0, in which case numpy.nan is returned

Return type

float

calc_ps()[source]

Method that returns only the pS part of the pN/pS ratio.

Returns

the pS value, unless self.syn is 0, in which case numpy.nan is returned

Return type

float
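Reading pN as observed over expected non-synonymous changes, and pS as the same for synonymous ones (the form used in the pN/pS formula given for calc_ratio()), the two parts can be sketched as plain functions. The function names are hypothetical, and math.nan stands in for numpy.nan to keep the example free of dependencies.

```python
import math


def pn_part(nonsyn, exp_nonsyn):
    """pN = oN / eN, or NaN when no non-synonymous SNP was observed."""
    if nonsyn == 0:
        return math.nan
    return nonsyn / exp_nonsyn


def ps_part(syn, exp_syn):
    """pS = oS / eS, or NaN when no synonymous SNP was observed."""
    if syn == 0:
        return math.nan
    return syn / exp_syn


pn = pn_part(nonsyn=2, exp_nonsyn=6)
ps = ps_part(syn=1, exp_syn=4)
missing = ps_part(syn=0, exp_syn=4)  # NaN, no synonymous SNP observed
```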

calc_ratio(haplotypes=False)[source]

Changed in version 0.2.2: split the function to handle flag_value in another method

Calculate \(\frac {pN}{pS}\) for the gene.

\[\frac{pN}{pS} = \frac{oN / eN}{oS / eS} \tag{1}\]

Where:

  • oN (number of non-synonymous - nonsyn)

  • eN (expected number of non-synonymous - exp_nonsyn)

  • oS (number of synonymous - syn)

  • eS (expected number of synonymous - exp_syn)

Parameters
  • flag_value (bool) – when there’s no way to calculate the ratio, the possible cases are flagged with a negative number, which allows these values to be substituted later

  • haplotypes (bool) – if True, coverage information is not used, because the SNPs are assumed to come from an alignment of sequences that have haplotypes

Returns

the \(\frac {pN}{pS}\) for the gene.

Note

Because pN or pS can be 0, which would make the return value NaN, some special cases are taken into account. The default return value in these cases is numpy.nan.

  • Both synonymous and non-synonymous values are 0:

    • if both the syn and nonsyn attributes are 0 but there’s coverage for this gene, we return a 0, as there’s no evolution in this gene. Previously, this method checked the coverage against the passed min_cov parameter, which was equal to MIN_COV. Now it is up to the user to check the coverage, and the functions in mgkit.snps.conv_func do that. If enough coverage was achieved, the haplotypes parameter can be used to return a 0

All other cases return a NaN value

Return type

float
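The formula and the special cases described in the note can be put together in a short sketch. This is an illustration of the documented behaviour, not the library's code: the function name is hypothetical, the observed and expected counts are passed as arguments instead of being read from an instance, and math.nan is used in place of numpy.nan.

```python
import math


def pn_ps_ratio(syn, nonsyn, exp_syn, exp_nonsyn, haplotypes=False):
    """Return (oN / eN) / (oS / eS), with the special cases from the note."""
    if syn == 0 and nonsyn == 0:
        # no observed change: 0 only when the caller states, through
        # haplotypes, that coverage does not need to be considered
        return 0.0 if haplotypes else math.nan
    if syn == 0 or nonsyn == 0:
        # one of the two parts cannot be calculated
        return math.nan
    return (nonsyn / exp_nonsyn) / (syn / exp_syn)


# 2 observed of 6 expected non-synonymous, 1 observed of 4 expected synonymous
ratio = pn_ps_ratio(syn=1, nonsyn=2, exp_syn=4, exp_nonsyn=6)
no_change = pn_ps_ratio(syn=0, nonsyn=0, exp_syn=4, exp_nonsyn=6,
                        haplotypes=True)
```

With these numbers pN is 2/6 and pS is 1/4, so the ratio is 4/3.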

calc_ratio_flag()[source]

New in version 0.2.2.

Handles cases where it’s important to flag the returned value, as explained in GeneSNP.calc_ratio(). When both the number of synonymous and the number of non-synonymous SNPs are greater than 0, the pN/pS value is returned.

  • The number of non-synonymous is greater than 0 but the number of synonymous is 0:

    • if flag_value is True, the returned value is -1

  • The number of synonymous is greater than 0 but the number of non-synonymous is 0:

    • if flag_value is True, the returned value is -2

\(oS\)    \(oN\)    return value
>0        >0        pN/pS
0         0         -3
>0        0         -1
0         >0        -2
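The flagging scheme can be sketched as a plain function. The sketch reads the return values listed above as rows of (oS, oN, return value); where the bullet list and those rows differ on -1 and -2, it follows the rows, which is an assumption. The function name and the FLAG_LABELS mapping are hypothetical.

```python
def ratio_or_flag(syn, nonsyn, exp_syn, exp_nonsyn):
    """Return pN/pS when both counts are positive, else a negative flag."""
    if syn > 0 and nonsyn > 0:
        return (nonsyn / exp_nonsyn) / (syn / exp_syn)
    if syn == 0 and nonsyn == 0:
        return -3
    if nonsyn == 0:
        return -1  # oS > 0, oN = 0
    return -2      # oS = 0, oN > 0


# flags are negative, so they can be substituted after the calculation
FLAG_LABELS = {-1: "only synonymous", -2: "only non-synonymous",
               -3: "no SNPs"}

values = [
    ratio_or_flag(1, 2, 4, 6),
    ratio_or_flag(3, 0, 4, 6),
    ratio_or_flag(0, 2, 4, 6),
    ratio_or_flag(0, 0, 4, 6),
]
labelled = [FLAG_LABELS.get(value, value) for value in values]
```

Because a real ratio is never negative, a lookup like the one on the last line cannot mistake a calculated value for a flag.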

class mgkit.snps.classes.SNPType(value)[source]

Bases: enum.Enum

New in version 0.1.13.

Enum that defines SNP types. Supported at the moment:

  • unknown = 0

  • syn (synonymous) = 1

  • nonsyn (non-synonymous) = 2

Note

No support for indel mutations is planned at the moment

nonsyn = 2
syn = 1
unknown = 0
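Since SNPType is based on enum.Enum, members can be looked up by value or by name with the standard enum machinery. The sketch below uses a hypothetical stand-in, SNPKind, that carries the same three values, so it runs without the library installed.

```python
from enum import Enum


class SNPKind(Enum):
    """Hypothetical stand-in with the values listed for SNPType."""
    unknown = 0
    syn = 1
    nonsyn = 2


by_value = SNPKind(2)       # lookup by value
by_name = SNPKind["syn"]    # lookup by member name
names = [member.name for member in SNPKind]  # members in definition order
```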