Tech Specs

This page introduces the main functions of SNPAlyze.

Genotyping data import

SNPAlyze can import and analyze genotyping data files of the following types.

Available file types

  • Microsoft Excel files (xls/xlsx format)
  • TSV (Tab-Separated Values)/CSV (Comma-Separated Values) files
  • SNPAlyze Data file (slyz format)
  • Biotage PSQ96 export file
  • ABI PRISM7900 export file

For case-control studies, extra columns that distinguish cases from controls are required in addition to the genotyping data columns.

No extra columns are needed to distinguish groups for Linkage Disequilibrium Analysis or Haplotype Inference. However, if group information is provided, each group can be analyzed separately.


It is possible to open multiple datasheets in SNPAlyze.



Automatic selection of polymorphic markers

In Ver.5.0 or later, SNPAlyze can select the appropriate polymorphic markers automatically.

Markers are filtered by the Hardy-Weinberg equilibrium (HWE) test, minor allele frequency (MAF), and polymorphic marker type.


SNPAlyze can filter polymorphic markers by three criteria: HWE, MAF, and marker type. The filters can also be applied to individual registered groups. For example, when the case group does not satisfy HWE because of genetic bias but the control group is required to satisfy it, the HWE filter can be applied to the control group only.
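The per-group filtering described above can be sketched as follows. This is a minimal Python illustration, not SNPAlyze's implementation: the function names, the thresholds (`maf_min=0.05`, `hwe_alpha=0.001`) and the choice to compute MAF on all groups pooled are assumptions, and the marker-type filter is omitted. HWE is checked with a 1-degree-of-freedom chi-square test, only in the groups listed in `hwe_groups`.

```python
import math

def allele_freq(counts):
    """Frequency of allele A from genotype counts (n_AA, n_Aa, n_aa)."""
    n_AA, n_Aa, n_aa = counts
    return (2 * n_AA + n_Aa) / (2 * (n_AA + n_Aa + n_aa))

def hwe_chi2_pvalue(counts):
    """Chi-square goodness-of-fit test for HWE (1 degree of freedom)."""
    n = sum(counts)
    p = allele_freq(counts)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))

def keep_marker(counts_by_group, maf_min=0.05, hwe_alpha=0.001,
                hwe_groups=("control",)):
    """MAF filter on all groups pooled; HWE filter only in hwe_groups (hypothetical)."""
    pooled = tuple(sum(c[i] for c in counts_by_group.values()) for i in range(3))
    p = allele_freq(pooled)
    if min(p, 1 - p) < maf_min:
        return False
    return all(hwe_chi2_pvalue(counts_by_group[g]) >= hwe_alpha
               for g in hwe_groups)
```

For example, `keep_marker({"case": (50, 0, 50), "control": (36, 48, 16)})` returns `True`: the case group departs strongly from HWE, but only the control group is tested.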


Case-Control Study

Tabulation method of genotype data

Genotype data can be tabulated in the way that best suits your evaluation. The first tabulation method, "Automatically," defines the genotype groupings used in the statistical calculations automatically and creates contingency tables for the following four models:

  1. Genotype model
  2. Allele model
  3. Recessive model
  4. Dominant model

The second method, "User customize," lets you define the contingency table manually, selecting polymorphic markers at will.
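For a biallelic SNP with alleles A and a, the four automatic models can be illustrated by the tables built below. This sketch assumes that a is the allele of interest (so "dominant" and "recessive" refer to a); how SNPAlyze chooses the reference allele and orders the categories is not stated, and `contingency_tables` is a hypothetical name.

```python
def contingency_tables(case, control):
    """case, control: genotype counts (n_AA, n_Aa, n_aa).
    Returns a 2-row table (case row, control row) for each model;
    'a' is assumed to be the allele of interest."""
    def two_rows(split):
        return [split(*case), split(*control)]
    return {
        "genotype": two_rows(lambda AA, Aa, aa: [AA, Aa, aa]),              # 2 x 3
        "allele": two_rows(lambda AA, Aa, aa: [2 * AA + Aa, Aa + 2 * aa]),  # A vs a allele counts
        "dominant": two_rows(lambda AA, Aa, aa: [AA, Aa + aa]),             # AA vs carriers of a
        "recessive": two_rows(lambda AA, Aa, aa: [AA + Aa, aa]),            # non-aa vs aa
    }
```

Note that the allele model counts chromosomes rather than individuals, so its row totals are twice the number of samples.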


—————————————————–

Use of Chi-square test

SNPAlyze can apply the chi-square test and Fisher’s exact test to the constructed contingency table. For a 2×2 contingency table, the odds ratio can also be calculated.
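A minimal sketch of these calculations for a table `[[a, b], [c, d]]` (rows: case/control). Assumptions: the chi-square statistic is computed without Yates' continuity correction, its p-value is implemented only for 1 or 2 degrees of freedom (enough for the four models listed earlier), and the two-sided Fisher p-value sums every table with the same margins that is no more probable than the observed one. Whether SNPAlyze uses the same conventions is not stated, and the function names are hypothetical.

```python
import math

def chi2_test(table):
    """Pearson chi-square test of independence, without continuity correction.
    Returns (statistic, df, p); p is implemented for df = 1 or 2 only."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))
    elif df == 2:
        p = math.exp(-stat / 2)
    else:
        raise ValueError("sketch supports 2x2 and 2x3 tables only")
    return stat, df, p

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = math.comb(r1 + r2, c1)
    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(r1, x) * math.comb(r2, c1 - x) / total
    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))

def odds_ratio(a, b, c, d):
    """Cross-product ratio (a*d)/(b*c); undefined if b or c is zero."""
    return (a * d) / (b * c)
```

The small relative tolerance in the Fisher sum keeps tables whose probability equals the observed one from being dropped by floating-point rounding.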

—————————————————–

Use of AIC

SNPAlyze evaluates the association between each SNP and the disease using both the chi-square test and AIC (Akaike's information criterion). With AIC, this evaluation can be performed with higher accuracy than with the chi-square test.

For the contingency table created as described in the previous section, SNPAlyze fits an independent model and a dependent model and compares their AIC values, AIC (IM) and AIC (DM). Since the model with the smaller AIC is the better one,

AIC (IM) > AIC (DM) indicates that the SNP and the disease are dependent,

        while

AIC (IM) < AIC (DM) indicates that the SNP and the disease are independent.
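Under the usual formulation for an r × c table with counts n_ij and total N (assumed here; the page does not give SNPAlyze's exact formulas), the independent model estimates p_ij = (n_i. / N)(n_.j / N) with (r - 1) + (c - 1) free parameters, the dependent model estimates p_ij = n_ij / N with rc - 1 parameters, and AIC = -2 log L + 2k. A minimal sketch with a hypothetical function name:

```python
import math

def aic_independent_vs_dependent(table):
    """Return (AIC(IM), AIC(DM)) for an r x c contingency table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    r, c = len(rows), len(cols)
    # Independent model: p_ij = (row_i / n) * (col_j / n); 0 * log(0) is taken as 0.
    ll_im = sum(x * math.log(rows[i] * cols[j] / n ** 2)
                for i, row in enumerate(table) for j, x in enumerate(row) if x > 0)
    # Dependent (saturated) model: p_ij = n_ij / n.
    ll_dm = sum(x * math.log(x / n) for row in table for x in row if x > 0)
    aic_im = -2 * ll_im + 2 * ((r - 1) + (c - 1))
    aic_dm = -2 * ll_dm + 2 * (r * c - 1)
    return aic_im, aic_dm
```

With these definitions, AIC (IM) - AIC (DM) equals G² - 2(r - 1)(c - 1), where G² is the likelihood-ratio statistic; for a 2×2 table the SNP is therefore judged dependent when G² > 2.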


—————————————————–

Use of FDR

In case-control studies, SNPAlyze performs multiple-testing correction using the false discovery rate (FDR).
The FDR is the expected proportion of false positives among the tests whose null hypotheses are rejected. SNPAlyze calculates q-values from the distribution of p-values (the BH or bootstrap method is available).
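The BH q-value computation can be sketched as below: sort the p-values, scale the i-th smallest by m/i, and take running minimums from the largest down so that the q-values stay monotone. This is the standard Benjamini-Hochberg adjustment; the bootstrap option (which also estimates the proportion of true null hypotheses) is not shown, and the function name is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, returned in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

For example, `bh_qvalues([0.01, 0.04, 0.03, 0.5])` gives q-values of about 0.04, 0.0533, 0.0533 and 0.5.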


—————————————————–

Output of detailed information

Detailed information containing analysis results and settings is available in text format.



Linkage Disequilibrium Analysis

In this analysis, the LD coefficient is calculated from the difference between the haplotype frequency and the product of the allele frequencies at two arbitrary loci. SNPAlyze can output LD coefficients such as the D-value, D’-value, and r^2. It can also output chi-square and AIC values and display the results of the LD analysis graphically.
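As a sketch of the standard definitions (assumed to match SNPAlyze's output, which the page does not spell out): with haplotype frequency p_AB and allele frequencies p_A and p_B, D = p_AB - p_A p_B; D’ divides D by its largest possible magnitude given the allele frequencies; and r^2 = D^2 / (p_A q_A p_B q_B), where q = 1 - p. The function name is hypothetical.

```python
def ld_coefficients(p_AB, p_A, p_B):
    """Return (D, D', r^2) for two biallelic loci.
    p_AB: frequency of haplotype AB; p_A, p_B: allele frequencies (0 < p < 1)."""
    q_A, q_B = 1 - p_A, 1 - p_B
    D = p_AB - p_A * p_B
    # Largest |D| allowed by the allele frequencies, depending on the sign of D.
    d_max = min(p_A * q_B, q_A * p_B) if D >= 0 else min(p_A * p_B, q_A * q_B)
    d_prime = D / d_max          # signed; many tools report |D'|
    r2 = D * D / (p_A * q_A * p_B * q_B)
    return D, d_prime, r2
```

For example, p_AB = 0.4 with p_A = p_B = 0.5 gives D = 0.15, D’ = 0.6 and r^2 = 0.36.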

Display of the haplotype frequency, LD coefficients, and statistics.

—————————————————–

Graphical display of LD analysis results

The LD coefficients and statistics between multiple SNPs can be seen at a glance, so regions with strong LD are easy to identify. Three display settings are available in SNPAlyze:

  • Comparative display of the analysis results for two different groups (Figure 1)
  • Comparative display of the analysis results for two different LD coefficients and statistics (Figure 2)
  • Superimposed display of the analysis results for two different groups, LD map type of BMP only (Figure 3)



Superimposed display of the analysis results for two different groups (LD map type of BMP only).

For the comparative display of two different groups or two different LD coefficients, a grid display is also available. Each cell shows an LD coefficient or statistic and can be color-coded according to a preset threshold.



Comparative display of the analysis results for two different LD coefficients and statistics.
Comparative display of the analysis results for two different groups.

Haplotype Inference

Estimate haplotype frequency & tagSNP selection

SNPAlyze calculates the candidate haplotypes in a group and their frequencies. In addition, it can report for each sample the diplotype judged most likely by the EM algorithm.
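The EM estimation can be illustrated with the following minimal sketch for a few biallelic SNPs. Assumptions: genotypes are coded as 0/1/2 copies of allele 1, the population is in HWE (so a diplotype of distinct haplotypes h1 and h2 has probability 2·f1·f2, and a homozygous one f1²), and all phase assignments are enumerated, which is practical only for a small number of heterozygous sites. The E-step splits each sample among its compatible diplotypes; the M-step re-estimates haplotype frequencies from the expected counts. The function names are hypothetical, not SNPAlyze's.

```python
from itertools import product

def compatible_diplotypes(genotype):
    """Unordered haplotype pairs consistent with an unphased genotype.
    genotype: tuple of 0/1/2 (copies of allele 1) at each SNP."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for phase in product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
        h2 = list(h1)
        for pos, allele in zip(het, phase):
            h1[pos], h2[pos] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

def diplotype_weight(freq, pair):
    a, b = pair
    return freq[a] * freq[b] * (1 if a == b else 2)  # HWE diplotype probability

def em_haplotypes(genotypes, max_iter=500, tol=1e-10):
    dips = [compatible_diplotypes(g) for g in genotypes]
    haps = sorted({h for ds in dips for pair in ds for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}
    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        for ds in dips:
            w = [diplotype_weight(freq, pair) for pair in ds]
            total = sum(w)
            for (a, b), wi in zip(ds, w):   # E-step: expected haplotype counts
                counts[a] += wi / total
                counts[b] += wi / total
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}  # M-step
        converged = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if converged:
            break
    # Most likely diplotype per sample under the final frequencies.
    best = [max(ds, key=lambda pair: diplotype_weight(freq, pair)) for ds in dips]
    return freq, best
```

Each entry of `best` is the maximum-likelihood diplotype for that sample. For instance, with ten (2, 2) samples, ten (0, 0) samples and five double heterozygotes, the haplotypes (0, 0) and (1, 1) each converge to a frequency of 0.5 and the double heterozygotes are resolved as (0, 0)/(1, 1).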

—————————————————–

Estimation of diplotype distribution

SNPAlyze shows the diplotype distribution calculated during haplotype frequency estimation with the EM algorithm.

—————————————————–

Output of detailed information

htSNP (tagSNP) combinations are displayed in the "Haplotype detail information" window, together with haplotype frequencies and diplotype information.
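The page does not say how SNPAlyze chooses htSNPs. One common definition is the smallest set of SNPs whose alleles still distinguish every common haplotype; the brute-force sketch below follows that definition, with a hypothetical frequency cutoff `min_freq=0.05` and a hypothetical function name. It is exponential in the number of SNPs and is meant only to illustrate the idea.

```python
from itertools import combinations

def select_htsnps(haplotype_freqs, min_freq=0.05):
    """haplotype_freqs: {haplotype tuple: frequency}.
    Returns the indices of a smallest SNP subset that separates all
    haplotypes with frequency >= min_freq."""
    common = [h for h, f in haplotype_freqs.items() if f >= min_freq]
    if not common:
        return ()
    n_snps = len(common[0])
    for k in range(1, n_snps + 1):
        for subset in combinations(range(n_snps), k):
            patterns = {tuple(h[i] for i in subset) for h in common}
            if len(patterns) == len(common):  # every common haplotype is unique
                return subset
    return tuple(range(n_snps))
```

For common haplotypes 000, 111 and 001 (plus a rare 010 below the cutoff), SNPs 0 and 2 are enough to tell them apart.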



Hardy-Weinberg Equilibrium Test

SNPAlyze uses the chi-square test to evaluate the difference between the observed genotype counts at an SNP site and the counts expected under Hardy-Weinberg equilibrium. For cases where the chi-square test is unsuitable (for example, when expected counts are small), SNPAlyze also provides the exact test and the exact test by Monte Carlo simulation.
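An exact HWE test can be sketched by conditioning on the allele counts: enumerate every possible number of heterozygotes, compute its probability, and sum the probabilities that are no larger than that of the observed count. This is a minimal illustration of the standard exact test, not SNPAlyze's code; the Monte Carlo variant, which samples tables instead of enumerating them, is not shown, and the function name is hypothetical.

```python
import math

def hwe_exact_pvalue(n_AA, n_Aa, n_aa):
    """Exact HWE test conditioned on allele counts (two-sided)."""
    n = n_AA + n_Aa + n_aa
    n_a = n_Aa + 2 * n_aa        # copies of allele a
    n_A = 2 * n - n_a            # copies of allele A

    def log_fact(k):
        return math.lgamma(k + 1)

    def log_prob(het):
        # P(het | n, n_A, n_a) = n! 2^het n_A! n_a! / (hom_A! het! hom_a! (2n)!)
        hom_a = (n_a - het) // 2
        hom_A = (n_A - het) // 2
        return (log_fact(n) - log_fact(hom_A) - log_fact(het) - log_fact(hom_a)
                + het * math.log(2) + log_fact(n_A) + log_fact(n_a) - log_fact(2 * n))

    p_obs = math.exp(log_prob(n_Aa))
    # Heterozygote counts must have the same parity as n_a.
    probs = (math.exp(log_prob(h)) for h in range(n_a % 2, min(n_a, n_A) + 1, 2))
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

Working in log-factorials keeps the calculation stable for large sample sizes, where the factorials themselves would overflow.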

—————————————————–

Output of detailed information

Detailed information containing analysis results and settings is available in text format.

—————————————————–

Use of FDR

In the Hardy-Weinberg equilibrium test, SNPAlyze performs multiple-testing correction using the false discovery rate (FDR).
The FDR is the expected proportion of false positives among the tests whose null hypotheses are rejected. SNPAlyze calculates q-values from the distribution of p-values (the BH or bootstrap method is available).


Cochran-Armitage Trend Test

The Cochran-Armitage trend test investigates whether a gene is associated with a disease by comparing two groups: a patient (case) group and a non-patient (control) group. It tests for a linear trend between case-control status and allele count (the number of copies of an allele carried by each genotype).

—————————————————–

Definition of Contingency table

The case and control genotype counts are arranged in a 2 × 3 contingency table (two groups by three genotypes).
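Given that 2 × 3 table, the trend statistic can be sketched as below, using the textbook form of the test with genotype scores 0, 1, 2 (copies of allele a). The scores and the function name are assumptions; SNPAlyze may use other scores. The statistic has 1 degree of freedom.

```python
import math

def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Trend test on a 2 x 3 table. cases/controls: genotype counts (AA, Aa, aa).
    Returns (chi-square statistic with 1 df, two-sided p-value)."""
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]   # column totals
    t = weights
    T = sum(ti * (ri * S - si * R) for ti, ri, si in zip(t, cases, controls))
    # Variance of T under the null hypothesis of no trend.
    var = sum(t[i] ** 2 * n[i] * (N - n[i]) for i in range(3))
    var -= 2 * sum(t[i] * t[j] * n[i] * n[j]
                   for i in range(3) for j in range(i + 1, 3))
    var *= R * S / N
    chi2 = T * T / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

For cases (10, 20, 30) and controls (30, 20, 10), the statistic is 20; for groups with identical genotype proportions it is 0 and the p-value is 1.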

—————————————————–

Display of the results

The overall results and the statistics for each group are shown in the [Statistics] and [Detail information] windows. The output includes test statistics (chi-square, p-value, FDR q-value, and others) and locus information.

—————————————————–

Use of FDR

In the Cochran-Armitage trend test, SNPAlyze performs multiple-testing correction using the false discovery rate (FDR).
The FDR is the expected proportion of false positives among the tests whose null hypotheses are rejected. SNPAlyze calculates q-values from the distribution of p-values (the BH or bootstrap method is available).


Use of Bootstrap method

To estimate the statistical reliability of case-control studies, LD analysis, and haplotype inference, SNPAlyze resamples the data and calculates the deviation of each estimate and its confidence interval.
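A percentile bootstrap for one such estimate, the allelic odds ratio in a case-control study, might look like this. Individuals are resampled with replacement within each group, the statistic is recomputed for every replicate, and the standard deviation and the 2.5th/97.5th percentiles of the replicates give the deviation and the 95% confidence interval. The 0.5 cell correction, `n_boot=2000`, the fixed seed and the function names are assumptions for this sketch.

```python
import random
import statistics

def allelic_odds_ratio(case_genos, control_genos):
    """Allelic odds ratio for allele a; genotypes are copies of a (0/1/2).
    0.5 is added to every cell so that empty cells do not break the ratio."""
    a = sum(case_genos) + 0.5
    b = 2 * len(case_genos) - sum(case_genos) + 0.5
    c = sum(control_genos) + 0.5
    d = 2 * len(control_genos) - sum(control_genos) + 0.5
    return (a * d) / (b * c)

def bootstrap_or(case_genos, control_genos, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap: resample individuals within each group."""
    rng = random.Random(seed)
    reps = sorted(
        allelic_odds_ratio(rng.choices(case_genos, k=len(case_genos)),
                           rng.choices(control_genos, k=len(control_genos)))
        for _ in range(n_boot))
    alpha = 1 - level
    lo = reps[int(round(alpha / 2 * n_boot))]
    hi = reps[int(round((1 - alpha / 2) * n_boot)) - 1]
    return {"estimate": allelic_odds_ratio(case_genos, control_genos),
            "sd": statistics.stdev(reps), "ci": (lo, hi)}
```

Resampling individuals rather than alleles keeps the two alleles of each person together, which respects the sampling unit of the study.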


Case-Control Haplotype Analysis

SNPAlyze tests for differences among groups in the haplotype frequencies estimated at arbitrary SNP sites, and evaluates their significance with permutation tests.
The analysis reports the haplotype frequencies estimated by the EM algorithm together with the permutation test results.
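The permutation idea can be sketched as follows. For brevity, this sketch assumes that haplotypes are already phased for each individual; in SNPAlyze the frequencies come from the EM algorithm, which would be rerun on each permuted dataset. Case/control labels are shuffled across individuals, a chi-square statistic on the haplotype count table is recomputed each time, and the p-value is the fraction of permuted statistics at least as large as the observed one (with the usual +1 correction). The function names, `n_perm=1000` and the seed are hypothetical.

```python
import random
from collections import Counter

def haplotype_chi2(groups):
    """Chi-square statistic for a (groups x haplotypes) table of haplotype counts."""
    counts = [Counter(g) for g in groups]
    totals = [len(g) for g in groups]
    n = sum(totals)
    stat = 0.0
    for h in set().union(*counts):
        col = sum(c[h] for c in counts)
        for c, t in zip(counts, totals):
            e = t * col / n
            stat += (c[h] - e) ** 2 / e
    return stat

def _haplotypes(individuals):
    return [h for pair in individuals for h in pair]

def permutation_test(cases, controls, n_perm=1000, seed=1):
    """cases/controls: lists of individuals, each a pair of (phased) haplotypes.
    Returns (observed statistic, permutation p-value, permuted statistics)."""
    observed = haplotype_chi2([_haplotypes(cases), _haplotypes(controls)])
    pooled, k = list(cases) + list(controls), len(cases)
    rng = random.Random(seed)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        null_stats.append(haplotype_chi2([_haplotypes(pooled[:k]),
                                          _haplotypes(pooled[k:])]))
    extreme = sum(s >= observed - 1e-9 for s in null_stats)
    return observed, (extreme + 1) / (n_perm + 1), null_stats
```

The returned `null_stats` is the permutation distribution of the statistic, which can be drawn as a histogram with the observed value marked.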


SNPAlyze also outputs graphs of the frequency distribution of the statistics and of the distribution obtained from the permutation tests.



Haplotype Block Analysis

In Ver.5.0 or later, SNPAlyze can perform Haplotype Block Analysis, which identifies haplotype blocks using either of the following two methods:
  • Gabriel method (Gabriel et al., Science, 2002) *1
  • Four Gamete method (Wang et al., Am. J. Hum. Genet., 2002) *2
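Of the two methods, the Four Gamete rule is the simpler to illustrate. The sketch below assumes phased haplotypes, treats a gamete as present when its frequency reaches a hypothetical threshold `min_freq=0.01`, and partitions the SNPs greedily from left to right so that no pair inside a block shows all four gametes (a sign of historical recombination). Wang et al.'s exact procedure may differ in detail, the Gabriel method (based on confidence bounds for D’) is not shown, and the function names are hypothetical.

```python
def all_four_gametes(haplotypes, i, j, min_freq=0.01):
    """True if all four allele combinations at SNPs i and j reach min_freq."""
    counts = {}
    for h in haplotypes:
        key = (h[i], h[j])
        counts[key] = counts.get(key, 0) + 1
    n = len(haplotypes)
    return sum(1 for c in counts.values() if c / n >= min_freq) == 4

def four_gamete_blocks(haplotypes, min_freq=0.01):
    """Greedy left-to-right partition into blocks (start, end) of SNP indices."""
    n_snps = len(haplotypes[0])
    blocks, start = [], 0
    for end in range(1, n_snps):
        # Close the current block if the new SNP shows four gametes with any SNP in it.
        if any(all_four_gametes(haplotypes, i, end, min_freq)
               for i in range(start, end)):
            blocks.append((start, end - 1))
            start = end
    blocks.append((start, n_snps - 1))
    return blocks
```

With haplotypes 0000, 0011, 1100 and 1111, SNPs 0-1 and 2-3 each form a block, because every pair spanning the two halves shows all four gametes.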


The output items are as follows:
(1) Haplotype block candidates and their frequencies
(2) htSNPs
(3) Frequency of appearance between two haplotypes
(4) LD coefficient between two haplotype blocks (D’-value)
(5) p-value from the chi-square test

Function enhancement of Haplotype Block Analysis

The identified blocks can also be displayed graphically.



Cooperation with HealthSketch

HealthSketch is a multivariate analysis tool for clinical and/or lifestyle data. The following functions are available when SNPAlyze is used together with HealthSketch.

* These functions require purchase of the appropriate version of HealthSketch.

Data passing between SNPAlyze and HealthSketch


  • Combinational analysis of DNA polymorphism and clinical and/or lifestyle data.
    SNPAlyze passes the diplotype configuration of each sample (the one judged most likely by the EM algorithm) to HealthSketch, which can then use it in analyses such as logistic regression.

  • Use of classification results from clustering based on clinical information.
    The clustering function of HealthSketch classifies samples using clinical and/or lifestyle data. Analyzing DNA polymorphisms within groups of samples that have similar clinical and/or lifestyle data makes the analysis more effective.

* SNPAlyze Ver.5.0.2 (or later) and HealthSketch Ver.1.1 (or later) are required for data passing function.

—————————————————–

Logistic Regression Analysis

SNPAlyze performs logistic regression analysis for each SNP, calculating the odds ratio (OR), the 95% confidence interval of the OR, and the p-value of the likelihood ratio test under the dominant, recessive, and genotype models.
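For the dominant and recessive models, which reduce each SNP to a 0/1 covariate, the calculation can be sketched with a two-parameter logistic model fitted by Newton-Raphson. The OR is exp(b1), the 95% CI uses the Wald standard error, and the likelihood ratio test compares the fit with an intercept-only model (1 degree of freedom). The genotype model, which needs two indicator variables and a 2-degree-of-freedom test, is omitted; the coding, iteration count and function names are assumptions rather than SNPAlyze's implementation.

```python
import math

def _fit_stats(x, y, b0, b1):
    """Score vector, information matrix and log-likelihood at (b0, b1)."""
    g0 = g1 = h00 = h01 = h11 = ll = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return (g0, g1), (h00, h01, h11), ll

def logistic_snp(genotypes, status, model="dominant", n_iter=25):
    """genotypes: copies of allele a (0/1/2); status: 1 = case, 0 = control.
    Both coded groups must be present, otherwise the fit is undefined."""
    code = {"dominant": lambda g: 1.0 if g >= 1 else 0.0,
            "recessive": lambda g: 1.0 if g == 2 else 0.0}[model]
    x = [code(g) for g in genotypes]
    b0 = b1 = 0.0
    for _ in range(n_iter):  # Newton-Raphson: beta += I^-1 * score
        (g0, g1), (h00, h01, h11), _ = _fit_stats(x, status, b0, b1)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, (h00, h01, h11), ll1 = _fit_stats(x, status, b0, b1)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    p0 = sum(status) / len(status)   # intercept-only model
    ll0 = sum(yi * math.log(p0) + (1 - yi) * math.log(1 - p0) for yi in status)
    lrt = max(0.0, 2 * (ll1 - ll0))
    return {"OR": math.exp(b1),
            "CI95": (math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)),
            "p_LRT": math.erfc(math.sqrt(lrt / 2))}
```

With a single 0/1 covariate, the fitted OR equals the cross-product ratio of the corresponding 2×2 table, which is a convenient check: 30 carrier and 20 non-carrier cases against 20 carrier and 30 non-carrier controls give an OR of 2.25.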


* SNPAlyze Ver.7.0 (or later) and HealthSketch Ver.2.5 (or later) are required for Logistic Regression Analysis.


Handle genotyping data and all analysis data together

An SNPAlyze Data file stores the genotyping data together with all analysis data. When you open a file saved in this format, the genotyping data and all analysis data are restored.

You can continue your analysis, or share the genotyping data and all analysis data by distributing this file to other SNPAlyze users. (Note that the file contains the genotyping data itself.)



Principal Component Analysis

  • A scatter plot is drawn using the eigenvectors, with the first principal component on the horizontal axis and the second principal component on the vertical axis.

    Outlying samples can be identified on the plot.

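A PCA scatter plot of this kind can be sketched in pure Python: center each SNP column of a samples × SNPs genotype matrix (0/1/2), form the sample-by-sample matrix X·Xᵀ/m, and take its two leading eigenvectors by power iteration with deflation. Each sample is then plotted at (PC1, PC2). The centering and scaling choices and the function names are assumptions; SNPAlyze may normalize genotypes differently.

```python
import math
import random

def top_eigenvectors(matrix, k=2, n_iter=500, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    rng = random.Random(seed)
    vectors = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(n_iter):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
        vectors.append(v)
        # Deflate: remove the found component so the next pass finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vectors

def pca_coordinates(genotypes):
    """genotypes: samples x SNPs matrix of 0/1/2. Returns (PC1, PC2) per sample."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    gram = [[sum(x[i][t] * x[j][t] for t in range(m)) / m for j in range(n)]
            for i in range(n)]
    pc1, pc2 = top_eigenvectors(gram, k=2)
    return list(zip(pc1, pc2))
```

The sign of each eigenvector is arbitrary, so only the relative positions of samples on the plot are meaningful; two genetically distinct clusters of samples fall on opposite sides of PC1.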

Manhattan plot

  • p-values calculated in a case-control study using NGS data are shown on a Manhattan plot. The lower part of the display shows the statistics of the SNP selected on the Manhattan plot.
  • These values (p-value, chi-square, degrees of freedom, and effect size) are calculated for the genotype, allele, recessive, and dominant models.
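The x-axis of a Manhattan plot is usually the genomic position with chromosomes laid end to end, and the y-axis is -log10(p). The hypothetical helper below computes those coordinates from (chromosome, position, p-value) triples; offsetting each chromosome by the last position of the preceding ones is one simple layout choice, not necessarily SNPAlyze's.

```python
import math

def manhattan_points(snps):
    """snps: list of (chromosome, position, p_value), chromosomes as integers.
    Returns (cumulative_x, -log10 p, chromosome) tuples in plotting order."""
    by_chr = {}
    for chrom, pos, p in snps:
        by_chr.setdefault(chrom, []).append((pos, p))
    points, offset = [], 0
    for chrom in sorted(by_chr):
        entries = sorted(by_chr[chrom])
        for pos, p in entries:
            points.append((offset + pos, -math.log10(p), chrom))
        offset += entries[-1][0]  # next chromosome starts after this one's last SNP
    return points
```

For example, SNPs at chromosome 1 positions 100 and 200 and chromosome 2 position 50 are placed at x = 100, 200 and 250.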



*1 Reference
Stacey B. Gabriel, Stephen F. Schaffner, Huy Nguyen, Jamie M. Moore, Jessica Roy, Brendan Blumenstiel, John Higgins, Matthew DeFelice, Amy Lochner, Maura Faggart, Shau Neen Liu-Cordero, Charles Rotimi, Adebowale Adeyemo, Richard Cooper, Ryk Ward, Eric S. Lander, Mark J. Daly, David Altshuler. The Structure of Haplotype Blocks in the Human Genome. Science. 2002 Jun 21;296(5576):2225-9.

*2 Reference
Ning Wang, Joshua M. Akey, Kun Zhang, Ranajit Chakraborty, Li Jin. Distribution of Recombination Crossovers and the Origin of Haplotype Blocks: Interplay of Population History, Recombination, and Mutation. Am J Hum Genet. 2002 Nov;71(5):1227-34.