## Tech Specs

This section introduces the main functions of SNPAlyze.

**Genotyping data import**

SNPAlyze can import and analyze the following types of genotyping data files.

**Available file types**

- **Microsoft Excel files** (xls/xlsx format)
- **TSV** (Tab Separated Values) / **CSV** (Comma Separated Values) file
- **SNPAlyze Data file** (slyz format)
- **Biotage PSQ96 export file**
- **ABI PRISM7900 export file**

When case-control studies are analyzed, extra columns identifying cases and controls are required, in addition to the columns with genotyping data, in order to distinguish the groups.

No extra columns are needed to distinguish groups when Linkage Disequilibrium Analysis or Haplotype Inference is conducted. However, if group information is provided, these analyses can be run for each group separately.

It is possible to open multiple datasheets in SNPAlyze.

**Automatic selection of polymorphic markers**

In Ver.5.0 or later, SNPAlyze can select the appropriate polymorphic markers automatically.

Markers are filtered by three criteria: the Hardy-Weinberg equilibrium (HWE) test, minor allele frequency (MAF), and polymorphic marker type.

The filtering can be applied to individual registered groups. For example, when the case group is not expected to satisfy HWE because of genetic bias but the control group must satisfy it, the filter can be applied to the control group only.
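The group-specific filtering described above can be sketched as follows. This is a minimal illustration, not SNPAlyze's implementation: the function names, the dictionary layout, and the thresholds (`maf_min`, `hwe_p_min`) are assumptions, and the HWE p-values are treated as precomputed inputs for the control group.

```python
def minor_allele_frequency(n_AA, n_Aa, n_aa):
    """MAF from genotype counts at a biallelic marker."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    freq_A = (2 * n_AA + n_Aa) / total_alleles
    return min(freq_A, 1.0 - freq_A)


def select_markers(markers, maf_min=0.05, hwe_p_min=0.05):
    """Keep markers whose control-group MAF and HWE p-value pass both thresholds.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    kept = []
    for marker in markers:
        maf = minor_allele_frequency(*marker["control_counts"])
        if maf >= maf_min and marker["control_hwe_p"] >= hwe_p_min:
            kept.append(marker["name"])
    return kept


# Illustrative control-group data: (AA, Aa, aa) counts and rounded HWE p-values.
markers = [
    {"name": "snp1", "control_counts": (60, 35, 5), "control_hwe_p": 0.97},
    {"name": "snp2", "control_counts": (98, 2, 0), "control_hwe_p": 0.92},    # MAF too low
    {"name": "snp3", "control_counts": (50, 20, 30), "control_hwe_p": 5e-9},  # fails HWE
]
selected = select_markers(markers)
```

Only the control group is inspected here, which mirrors the example in which the case group is allowed to deviate from HWE.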

**Case-Control Study**

**Tabulation method of genotype data**

Genotype data can be tabulated in the way that best suits your evaluation. The first tabulation method, "Automatically," defines the categories for the statistical calculations automatically and creates a contingency table under each of the following four models:

- **Genotype model**
- **Allele model**
- **Recessive model**
- **Dominant model**

The second method, "User customize," lets you define the contingency table manually and select polymorphic markers at will.
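To make the four models concrete, the sketch below builds a case-control contingency table for each of them from genotype counts. It assumes a biallelic marker with genotypes AA, Aa and aa, and it assumes that `a` is the allele of interest, so the dominant model groups Aa with aa and the recessive model groups AA with Aa. The function name and data are illustrative only.

```python
def tabulate(case, control):
    """Build one contingency table per model from (AA, Aa, aa) counts.

    Each table has two rows: cases first, then controls.
    """
    def rows(columns):
        return [columns(case), columns(control)]

    return {
        # 2 x 3: AA / Aa / aa
        "genotype": rows(lambda g: [g[0], g[1], g[2]]),
        # 2 x 2: allele A / allele a (each individual carries two alleles)
        "allele": rows(lambda g: [2 * g[0] + g[1], g[1] + 2 * g[2]]),
        # 2 x 2: AA / (Aa + aa)
        "dominant": rows(lambda g: [g[0], g[1] + g[2]]),
        # 2 x 2: (AA + Aa) / aa
        "recessive": rows(lambda g: [g[0] + g[1], g[2]]),
    }


tables = tabulate(case=(30, 50, 20), control=(50, 40, 10))
```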

—————————————————–

**Use of Chi-square test**

SNPAlyze can apply the chi-square test and Fisher’s exact test to the constructed contingency table. For a 2×2 contingency table, the odds ratio can also be calculated.
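As a rough illustration of these statistics, the following sketch computes the Pearson chi-square statistic (without continuity correction) for any table, a p-value for 1 or 2 degrees of freedom, and the odds ratio of a 2×2 table. Whether SNPAlyze applies a continuity correction is not stated here, so treat the choice as an assumption; all function names are hypothetical.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_p_value(stat, df):
    """Upper-tail probability, using the closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch only handles df = 1 or 2")


def odds_ratio(table):
    """Odds ratio (a*d)/(b*c) of a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


table = [[30, 70], [50, 50]]  # illustrative counts: rows are cases and controls
stat, df = chi_square(table)
p_value = chi_square_p_value(stat, df)
ratio = odds_ratio(table)
```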

—————————————————–

**Use of AIC**

This software evaluates the relationship between individual SNPs and diseases using the chi-square test and AIC (Akaike Information Criterion). With AIC, this evaluation can be performed with higher accuracy than with the chi-square test.

The contingency table created as described in the previous section is fitted with two models, an independent model and a dependent model, and their AIC values, AIC (IM) and AIC (DM), are compared. Since the model with the smaller AIC value is the better one:

- AIC (IM) > AIC (DM) indicates that the SNP and the disease are dependent.
- AIC (IM) < AIC (DM) indicates that the SNP and the disease are independent.
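One common way to obtain the two AIC values is to treat the table as a multinomial sample: the dependent model estimates every cell probability freely, while the independent model uses the product of the row and column proportions. The sketch below follows that textbook formulation; it is an assumption that SNPAlyze uses the same parameterization, and the function name is hypothetical.

```python
import math


def aic_independent_and_dependent(table):
    """Return (AIC(IM), AIC(DM)) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    rows, cols = len(table), len(table[0])

    loglik_im = 0.0  # independent model: p_ij = p_i. * p_.j
    loglik_dm = 0.0  # dependent model: p_ij = n_ij / n
    for i in range(rows):
        for j in range(cols):
            count = table[i][j]
            if count > 0:  # empty cells contribute nothing to the log-likelihood
                loglik_dm += count * math.log(count / n)
                loglik_im += count * math.log(row_totals[i] * col_totals[j] / n ** 2)

    # AIC = -2 * log-likelihood + 2 * (number of free parameters)
    aic_im = -2 * loglik_im + 2 * ((rows - 1) + (cols - 1))
    aic_dm = -2 * loglik_dm + 2 * (rows * cols - 1)
    return aic_im, aic_dm


aic_im, aic_dm = aic_independent_and_dependent([[30, 70], [50, 50]])
snp_and_disease_dependent = aic_im > aic_dm
```

In this formulation the difference AIC (IM) minus AIC (DM) equals the likelihood ratio statistic minus twice the degrees of freedom of the table.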

—————————————————–

**Use of FDR**

In case-control studies, SNPAlyze performs multiple testing correction using the false discovery rate (FDR).

The FDR is the expected proportion of false positives among the tests in which the null hypothesis was rejected. SNPAlyze calculates q-values from the distribution of p-values (the BH or Bootstrap method is available).

SNPAlyze also performs FDR-based multiple testing correction in the Hardy-Weinberg Equilibrium test.
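A minimal sketch of q-value calculation, assuming that "BH" refers to the Benjamini-Hochberg step-up procedure. The Bootstrap method is not covered, and the function name is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues


qvalues = bh_qvalues([0.01, 0.04, 0.03, 0.5])
```

A marker would then be reported as significant when its q-value falls below the chosen FDR level.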

—————————————————–

**Output of detailed information**

Detailed information containing analysis results and settings is available in text format.

**Linkage Disequilibrium Analysis**

In this method, the linkage disequilibrium (LD) coefficient is calculated from the difference between the haplotype frequency and the product of the allele frequencies at two arbitrary loci. SNPAlyze can output LD coefficients such as the D-value, D’-value, and r^2. In addition, the software can output chi-square and AIC values, and it can display the result of the Linkage Disequilibrium Analysis graphically.
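The three coefficients can be computed from one haplotype frequency and two allele frequencies using their standard definitions. In the sketch below the argument names are assumptions, D’ is reported as an absolute value, and both loci are assumed to be polymorphic so that no denominator is zero.

```python
def ld_coefficients(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele a and allele b
    p_a, p_b: frequencies of allele a (locus 1) and allele b (locus 2)
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


d, d_prime, r_squared = ld_coefficients(p_ab=0.4, p_a=0.5, p_b=0.5)
```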

**Display of the haplotype frequency, LD coefficients, and statistics.**

—————————————————–

**Graphical analysis result of LD analysis.**

The LD coefficients and statistics between multiple SNPs can be seen at a glance, and regions with strong LD can be identified easily. The following three display settings are available in SNPAlyze:

- Comparative display of the analysis result for two different groups. (**Figure 1**)
- Comparative display of the analysis result for two different LD coefficients and statistics. (**Figure 2**)
- Superimposed display of the analysis result for two different groups (LD map type of BMP only). (**Figure 3**)

For the comparative display between two different groups or between two different LD coefficients, the following grid type is also available. The LD coefficient or statistic is displayed in each cell, and each cell can be color-coded according to a preset threshold value.

**Haplotype Inference**

**Estimate haplotype frequency & tagSNP selection**

The haplotype candidates in a group and their frequencies are estimated. In addition, you can obtain the diplotype of each individual sample, namely the configuration that the EM algorithm judges to have the maximum likelihood.
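The EM idea can be illustrated on the smallest possible case, two biallelic SNPs, where only double heterozygotes have an ambiguous phase. This is a simplified sketch under stated assumptions (genotypes coded as 0/1/2 copies of the second allele, no missing data, a fixed iteration count); SNPAlyze handles more loci and its implementation details are not described here.

```python
def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate frequencies of haplotypes [AB, Ab, aB, ab] from unphased data.

    genotypes: list of (g1, g2), the number of 'a' and 'b' alleles (0, 1 or 2).
    """
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    freqs = [0.25, 0.25, 0.25, 0.25]
    for _ in range(n_iter):
        counts = [0.0] * 4
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: split the double heterozygote between AB/ab and Ab/aB.
                cis = freqs[0] * freqs[3]
                trans = freqs[1] * freqs[2]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[0] += w
                counts[3] += w
                counts[1] += 1 - w
                counts[2] += 1 - w
            else:
                # Phase is unambiguous when at least one locus is homozygous.
                for k in range(2):
                    counts[2 * alleles[g1][k] + alleles[g2][k]] += 1
        # M-step: renormalize expected haplotype counts.
        total = 2 * len(genotypes)
        freqs = [c / total for c in counts]
    return freqs


genotypes = [(0, 0)] * 3 + [(2, 2)] + [(1, 1)] * 2
freqs = em_haplotype_frequencies(genotypes)
```

For each double heterozygote, the more probable of the two phasings under the final frequencies corresponds to the maximum-likelihood diplotype mentioned above.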

—————————————————–

**Estimation of diplotype distribution**

SNPAlyze shows the diplotype distribution calculated during the process of haplotype frequency estimation by using the EM algorithm.

—————————————————–

**Output of detailed information**

htSNP (tagSNP) combinations are displayed in a "Haplotype detail information" window. This window shows haplotype frequencies and diplotype information as well as htSNP combinations.

**Hardy-Weinberg Equilibrium Test**

The differences between the observed genotype counts at an SNP site and the counts expected under Hardy-Weinberg equilibrium are evaluated by the chi-square test. In addition, SNPAlyze can run the Exact test and the Exact test (Monte Carlo simulation) for cases where the chi-square test is unsuitable.
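A minimal sketch of the chi-square version of the test for a biallelic marker, using one degree of freedom. The exact tests are not covered, and the function name is hypothetical.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic and p-value (1 df) for deviation from HWE."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = [n_AA, n_Aa, n_aa]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = hwe_chi_square(50, 20, 30)  # illustrative counts
```

When expected counts are small, the chi-square approximation becomes unreliable, which is the situation the exact tests are meant to cover.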

—————————————————–

**Output of detailed information**

Detailed information containing analysis results and settings is available in text format.

—————————————————–

**Use of FDR**

As in case-control studies, SNPAlyze performs multiple testing correction using FDR in the Hardy-Weinberg Equilibrium test. Q-values are calculated from the distribution of p-values (the BH or Bootstrap method is available).

**Cochran-Armitage Trend Test**

The Cochran-Armitage Trend Test investigates whether a gene is associated with a disease by comparing two groups, a patient (case) group and a non-patient (control) group. It tests for a linear trend between case-control status and allele count.

—————————————————–

**Definition of Contingency table**

The distribution of case-control and genotype counts can be put in a 2 × 3 contingency table.
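The trend statistic for such a 2 × 3 table can be sketched as below. The additive weights (0, 1, 2) for the three genotypes are an assumption, since the weights SNPAlyze uses are not stated here, and the function name is hypothetical.

```python
import math


def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Trend chi-square (1 df) and p-value for a 2 x k case-control table."""
    n_case, n_control = sum(cases), sum(controls)
    n = n_case + n_control
    col_totals = [r + s for r, s in zip(cases, controls)]

    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * c for w, c in zip(weights, col_totals))
    sum_w2n = sum(w * w * c for w, c in zip(weights, col_totals))

    # T = N * sum(w_i * r_i) - R * sum(w_i * n_i); chi2 = T^2 / Var(T)
    t = n * sum_wr - n_case * sum_wn
    chi2 = n * t ** 2 / (n_case * n_control * (n * sum_w2n - sum_wn ** 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Illustrative genotype counts (AA, Aa, aa).
chi2_trend, p_trend = cochran_armitage(cases=(30, 50, 20), controls=(50, 40, 10))
```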

—————————————————–

**Display of the results**

The overall results and the statistics for each group are shown in the [Statistics] window and the [Detail information] window. SNPAlyze outputs the test statistics (chi-square, p-value, FDR q-value, and others) together with information on the loci.

—————————————————–

**Use of FDR**

As in case-control studies, SNPAlyze performs multiple testing correction using FDR in the Cochran-Armitage Trend Test. Q-values are calculated from the distribution of p-values (the BH or Bootstrap method is available).

**Use of Bootstrap method**

Deviations from the estimated value and the confidence interval are calculated by resampling the data, in order to estimate the statistical reliability of each of the following methods: **case-control studies**, **LD analysis** and **haplotype inference**.
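The resampling idea can be sketched with a percentile bootstrap. Everything below is an illustrative assumption: the statistic (an allele frequency), the number of resamples, the percentile interval, and the function names are not taken from SNPAlyze.

```python
import random


def allele_frequency(genotypes):
    """Frequency of the counted allele; genotypes are coded 0, 1 or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def bootstrap_ci(samples, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = sorted(
        statistic([samples[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


genotypes = [0] * 60 + [1] * 35 + [2] * 5  # illustrative sample of 100 individuals
estimate = allele_frequency(genotypes)
lower, upper = bootstrap_ci(genotypes, allele_frequency)
```

The same pattern applies to other statistics, such as an odds ratio or an LD coefficient, by passing a different `statistic` function.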

**Case-Control Haplotype Analysis**

This analysis determines the differences among several groups in the haplotype frequencies estimated at arbitrary SNP sites. The significance of these differences is evaluated by permutation tests.

The analysis provides each haplotype frequency estimated by the EM algorithm, together with the result of the permutation test.

In addition, graphs showing the frequency distribution of the statistics and the frequency distribution obtained from the permutation tests are output.
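A permutation test shuffles the case and control labels to build the null distribution of a statistic. To stay short, the sketch below substitutes a much simpler statistic, the absolute difference in allele frequency at one SNP, for the EM-estimated haplotype frequencies used by the actual analysis. The function names and the number of permutations are assumptions.

```python
import random


def allele_freq(genotypes):
    """Frequency of the counted allele; genotypes are coded 0, 1 or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def permutation_p_value(cases, controls, n_perm=999, seed=0):
    """Empirical p-value for the case-control difference in allele frequency."""
    rng = random.Random(seed)
    observed = abs(allele_freq(cases) - allele_freq(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(allele_freq(pooled[:n_cases]) - allele_freq(pooled[n_cases:]))
        if diff >= observed:
            hits += 1
    # Count the observed data as one permutation so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


cases = [2] * 30 + [1] * 15 + [0] * 5      # illustrative data
controls = [0] * 30 + [1] * 15 + [2] * 5
p_perm = permutation_p_value(cases, controls)
```

The list of permuted statistics is also what a frequency-distribution graph of the permutation results would be drawn from.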

**Haplotype Block Analysis**

In Ver.5.0 or later, SNPAlyze can execute "Haplotype Block Analysis." This analysis can identify haplotype blocks by the following two methods:

**Gabriel method** (Gabriel et al., Science, 2002) ***1**

**Four Gamete method** (Wang et al., Am. J. Hum. Genet., 2002) ***2**

The output items are as follows:

(1) Haplotype block candidate and frequency

(2) htSNP

(3) Frequency of appearance between two haplotypes

(4) LD coefficient between two haplotype blocks (D’-value)

(5) p-value from chi-square test

Furthermore, it is also possible to display the identified blocks visually.
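The idea behind the Four Gamete method can be sketched as follows: if all four allele combinations are observed at a pair of SNPs, a recombination is inferred between them, and the block is closed. This is a deliberately simplified illustration that assumes phased haplotypes given as strings of `0`/`1` and applies no frequency threshold to the fourth gamete; it is not the procedure of Wang et al. or of SNPAlyze in detail.

```python
def has_four_gametes(haplotypes, i, j):
    """True if all four allele combinations occur at SNP positions i and j."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def four_gamete_blocks(haplotypes):
    """Partition consecutive SNPs into blocks as (first_index, last_index) pairs."""
    n_snps = len(haplotypes[0])
    blocks = []
    start = 0
    for j in range(1, n_snps):
        # Close the block when SNP j shows four gametes with any SNP inside it.
        if any(has_four_gametes(haplotypes, i, j) for i in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    blocks.append((start, n_snps - 1))
    return blocks


haplotypes = ["0000", "0011", "1100", "1111"]  # illustrative phased haplotypes
blocks = four_gamete_blocks(haplotypes)
```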

**Cooperation with HealthSketch**

**HealthSketch** is a multivariate analysis tool for clinical and/or lifestyle data. The following functions are available when SNPAlyze is used together with HealthSketch.

* These functions require the purchase of the appropriate version of HealthSketch.

**Data passing between SNPAlyze and HealthSketch**

**Combinational analysis of DNA polymorphism and clinical and/or lifestyle data.**

SNPAlyze passes the diplotype configuration of each sample (the one judged to have the maximum likelihood by the EM algorithm) to HealthSketch. HealthSketch can then perform analyses such as logistic regression using the diplotype configuration.

**Use of classification results by clustering using clinical information.**

The Clustering function of HealthSketch classifies sample data by using clinical and/or lifestyle data. Analyzing groups of samples with similar clinical and/or lifestyle data makes effective DNA polymorphism analysis possible.

**data passing function**.

—————————————————–

**Logistic Regression Analysis**

SNPAlyze performs logistic regression analysis for each SNP. For each SNP, you can calculate the odds ratio (OR), the 95% confidence interval of the OR, and the p-value of the likelihood ratio test under the Dominant, Recessive and Genotype models.
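For a single binary predictor, such as carrier status under the Dominant or Recessive model, the logistic regression estimates have closed forms, so the quantities above can be sketched without an iterative fit. The sketch assumes no covariates, a Wald-type 95% interval, and non-zero counts in all four cells; the Genotype model and any details of the SNPAlyze and HealthSketch implementation are outside its scope.

```python
import math


def binary_logistic_summary(a, b, c, d):
    """OR, 95% CI and likelihood ratio p-value for one binary predictor.

    a, b: carrier and non-carrier counts among cases
    c, d: carrier and non-carrier counts among controls
    """
    log_or = math.log((a * d) / (b * c))      # equals the fitted slope
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))

    # Likelihood ratio statistic against the intercept-only model (1 df).
    n = a + b + c + d
    cells = ((a, a + b, a + c), (b, a + b, b + d),
             (c, c + d, a + c), (d, c + d, b + d))
    g2 = sum(2 * obs * math.log(obs * n / (row * col)) for obs, row, col in cells)
    p_value = math.erfc(math.sqrt(g2 / 2))
    return math.exp(log_or), ci, p_value


# Illustrative Dominant-model counts: 70/30 among cases, 50/50 among controls.
or_value, (ci_low, ci_high), lr_p = binary_logistic_summary(70, 30, 50, 50)
```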

* SNPAlyze Ver.7.0 (or later) and HealthSketch Ver.2.5 (or later) are required for **Logistic Regression Analysis**.

**Treat genotyping data and all analysis data collectively**

A SNPAlyze Data file contains the genotyping data and all analysis data together. When you open a file saved in this format, the genotyping data and all analysis data are restored.

You can continue your analysis, or share the genotyping data and all analysis data by distributing this file to other SNPAlyze users. **(Please note that this file includes genotyping data.)**

- The scatter plot is drawn using eigenvectors. The horizontal axis is the first principal component and the vertical axis is the second principal component.

You can use the plot to identify outlier samples.

**Manhattan plot**

- The p-value is calculated from a case-control study using NGS data and is shown on the Manhattan plot.
- The lower part of the display shows the statistics of the SNP selected on the Manhattan plot.
- These values (p-value, chi-square, degrees of freedom and effect size) are calculated from the Genotype, Allele, Recessive and Dominant models.

***1** Reference

Stacey B. Gabriel, Stephen F. Schaffner, Huy Nguyen, Jamie M. Moore, Jessica Roy, Brendan Blumenstiel, John Higgins, Matthew DeFelice, Amy Lochner, Maura Faggart, Shau Neen Liu-Cordero, Charles Rotimi, Adebowale Adeyemo, Richard Cooper, Ryk Ward, Eric S. Lander, Mark J. Daly, and David Altshuler. The Structure of Haplotype Blocks in the Human Genome. Science. 2002 Jun 21;296(5576):2225-9.

***2** Reference

Ning Wang, Joshua M. Akey, Kun Zhang, Ranajit Chakraborty, and Li Jin. Distribution of Recombination Crossovers and the Origin of Haplotype Blocks: Interplay of Population History, Recombination, and Mutation. Am J Hum Genet. 2002 Nov;71(5):1227-34.