The widespread use of SNP and microsatellite markers in both industrial and academic research has led to a growing demand for genotyping platforms. In addition, high-performance genotyping technology generates large quantities of SNP and microsatellite data. As a consequence, powerful software capable of performing association studies between massive amounts of genotyping data and clinical information is required.
Dynacom experts developed SNPAlyze to obtain genetic marker SNPs for genetic diseases and to identify disease susceptibility genes and drug susceptibility genes by evaluating their statistical significance from SNP and microsatellite data. It also offers a wide variety of options, including a massive data processing function. Further, this software can analyze genotyping data files in various formats and display them graphically.
Overview
From data import to a variety of analyses, everything is carried out with user-friendly functions.
SNPAlyze is efficient data mining software that extracts useful information from a massive amount of genotyping data.
This software can be used for various analyses such as Case-Control Study, Cochran-Armitage Trend Test, Linkage Disequilibrium (LD) Analysis, Haplotype Inference, Hardy-Weinberg Equilibrium test, Case-Control Haplotype Analysis, Haplotype Block Analysis, and Logistic Regression Analysis.
In addition, by introducing Akaike’s information criterion (AIC)*, this software eliminates most of the ambiguity and achieves much more precise analyses.
SNPAlyze data analysis flow chart
The data analysis flowchart is given below.
* Akaike’s information criterion (AIC)
AIC values are the criteria that indicate the degree to which the observed data corresponds to a model. The residual sum of squares (RSS) becomes smaller as the number of parameters included in a model increases.
Hence, SNPAlyze not only compares the size of the RSS but also considers the number of parameters. Consequently, a model that leads to the minimum AIC is considered the best.
AIC = -2 x (maximum log likelihood of the model) + 2 x (number of free parameters in the model)
The first term is based on the maximum log likelihood of the model and measures how well the model fits the data. The second term is based on the number of free parameters in the model and represents the penalty associated with adding parameters (and hence with model complexity).
The decision becomes more reliable as the absolute value of the AIC (in this case, the difference in AIC between a dependent and an independent model) increases. An absolute value of the AIC that is close to zero is considered equivalent to the 5% level of significance in the chi-square test, although this evaluation depends on the degrees of freedom of the contingency table.
Reference: Sakamoto Y. and Akaike H. (1978) Analysis of Cross Classified Data by AIC, Ann. Inst. Statist. Math., 30-1, pp. 185-197.
Features
High user-friendliness, similar to Excel
All data can be easily edited, similar to Excel. You can also use drag, copy, cut, and paste functions to transfer data directly from the Excel data sheet. The edited data can be saved as a text file (TSV format).
Easy data import
TSV (Tab Separated Values)/CSV (Comma Separated Values) files, Excel files (xls/xlsx format) and SNPAlyze data files (slyz format) can be easily imported by following the directions on the screen. In addition, export files from systems such as Biotage PSQ96 or ABI PRISM7900 can be imported.
The world’s fastest haplotype inference function *1
The Haplotype Inference function of SNPAlyze is the fastest in the world, about 1,000 times faster than Arlequin. *2
Multilocus haplotype inference
Ver. 5.0 Pro can analyze 40 SNPs at once. (In Standard, the maximum number of SNPs that can be analyzed at once is 30.)
Although the run time of Haplotype Inference depends on the number of samples, haplotypes of 40 SNPs can be inferred in several seconds.
Supports AIC evaluation besides chi-square and P values
SNPAlyze includes additional useful functions that calculate not only conventional chi-square or P values but also AIC values. Generally, a chi-square test contains “ambiguity” because the significance level is selected arbitrarily by the user.
However, the AIC eliminates the “ambiguity” observed in the chi-square test and can provide analytical results with much higher accuracy. SNPAlyze utilizes the AIC function for the following analyses:
- Case-Control Study
- Linkage Disequilibrium Analysis
- Case-Control Haplotype Analysis
Graphical display of Linkage Disequilibrium
The LD coefficient between multiple SNPs can be seen at a glance. The area with a strong LD coefficient can be easily specified.
- Comparative display of the analysis results for two different groups
- Comparative display of the analysis result for two different LD coefficients and statistics
- Superimposed display of the analysis result for two different groups
(LD map type of BMP only)
Supports LD coefficients calculated by Akaike’s information criterion (AIC) besides conventional LD coefficients (D, D’, r^2)
D, D’, and r^2 are known as measures of Linkage Disequilibrium. SNPAlyze can evaluate Linkage Disequilibrium with AIC in addition to these conventional measures.*3
SNPAlyze considers a model in which two SNP loci are assumed to be in linkage equilibrium (the independent model, AIC(IM)) and another model in which the two SNP loci are assumed to be in LD (the dependent model, AIC(DM)), and evaluates the difference between them. This evaluation is based on the following equation:
AIC(LD) = AIC(IM) - AIC(DM)
Estimation of diplotype distribution
Diplotype distribution can be estimated by using the EM algorithm. The combination of haplotypes that constitute each diplotype and the number of samples that correspond to identical diplotypes are displayed.
Use of permutation tests
SNPAlyze calculates the difference between the haplotype frequencies of multiple groups such as case groups or control groups. However, a large error may occur in the chi-square test when the haplotype frequency is extremely small.
Permutation tests are one method that effectively avoids this problem by providing an appropriate P value for a wide range of data. By using random numbers, this method approximates the exact probability value.*4
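The idea of a permutation test can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the SNPAlyze implementation: it treats each chromosome's haplotype as already known (SNPAlyze works with estimated haplotype frequencies), and the function names `chi_square`, `tabulate` and `permutation_p_value` are hypothetical.

```python
import random

def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def tabulate(labels, haplotypes, groups, kinds):
    """Count haplotypes per group: one row per group, one column per haplotype."""
    return [[sum(1 for l, h in zip(labels, haplotypes) if l == g and h == k)
             for k in kinds] for g in groups]

def permutation_p_value(labels, haplotypes, n_perm=2000, seed=0):
    """Empirical P value: share of label shuffles with a statistic >= the observed one."""
    rng = random.Random(seed)
    groups = sorted(set(labels))
    kinds = sorted(set(haplotypes))
    observed = chi_square(tabulate(labels, haplotypes, groups, kinds))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real association between group and haplotype
        if chi_square(tabulate(shuffled, haplotypes, groups, kinds)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids a P value of exactly 0
```

Because the P value comes from the shuffled data themselves, it does not rely on the chi-square approximation, which is what makes the approach attractive when some haplotype counts are very small.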
Processability of a massive amount of data (Only the Pro version)
A maximum of 10,000 samples can be analyzed. Massive data sets that cannot be processed in the Standard version can be analyzed. The number of data items that can be analyzed depends on the computer memory.
Supports microsatellite data
Compared with SNPs, microsatellites appear less frequently across the whole genome, but microsatellite polymorphisms are very diverse and contain much more information.
Therefore, efficient analysis is possible by combining SNP and microsatellite data.
htSNP identification
Identification of htSNPs (tagSNPs) is now possible. Efficient genotyping is possible by utilizing the htSNPs that represent a haplotype block.
If you perform multilocus haplotype inference, two or more combinations of htSNPs may exist. SNPAlyze outputs all possible sets.
Opening multiple datasheets
In Ver.5.0, multiple datasheets can be open in SNPAlyze at the same time. This makes the following possible:
Comparing the analysis results among different datasheets.
Referring to earlier analysis results whenever you want to check them.
Haplotype Block Analysis
SNPAlyze can construct haplotype blocks by the following two methods:
Gabriel method (Gabriel et al., Science, 2002) *5
Four Gamete method (Wang et al., Am. J. Hum. Genet., 2002) *6
It is also possible to run the Case-Control Haplotype Analysis directly on the constructed haplotype blocks.
Automatic selection of polymorphic markers
In Ver.5.0, SNPAlyze can select the appropriate polymorphic markers automatically. Markers are filtered by the Hardy-Weinberg Equilibrium test (HWE), minor allele frequency (MAF) and polymorphic marker type.
Cooperation with HealthSketch *7
HealthSketch is a multivariate analysis tool for clinical and/or lifestyle data. The following functions become available when SNPAlyze is used together with HealthSketch.
- Data passing between SNPAlyze and HealthSketch *7
- Combinational analysis of DNA polymorphism and clinical and/or lifestyle data.
- Use of classification result by clustering using clinical information
- Logistic Regression Analysis *8
Handles genotyping data and all analysis data collectively
The SNPAlyze Data file stores the genotyping data and all analysis data together. If you open a file saved in this format, the genotyping data and all analysis data will appear. You can continue your analysis, or share the genotyping data and all analysis data by distributing this file to other SNPAlyze users. (Please note that this file includes the genotyping data.)
Use of FDR
In case-control studies, SNPAlyze performs multiple testing correction using the false discovery rate (FDR). The FDR controls the proportion of errors among the test results for which the null hypothesis was rejected. SNPAlyze calculates q-values on the basis of the distribution of p-values. (The BH or Bootstrap method is available.)
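As an illustration of the BH option, the sketch below computes Benjamini-Hochberg adjusted p-values, which are commonly reported as q-values. It is a minimal Python example, not SNPAlyze's own code, and the function name `bh_q_values` is hypothetical. How SNPAlyze derives q-values with the Bootstrap option is not described here and is not covered.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

For example, calling `bh_q_values` on the p-values of all SNPs in a case-control study and keeping the SNPs with a q-value below 0.05 limits the expected share of false positives among the reported SNPs to about 5%.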
Cochran-Armitage Trend Test
SNPAlyze performs the Cochran-Armitage Trend Test for the Dominant, Recessive and Genotype models for each SNP.
The Cochran-Armitage Trend Test investigates whether genes are associated with a disease by comparing two groups, a patient group and a non-patient group. This analysis assesses the presence of a linear trend between case-control status and allele counts.
VCF file import
You can use a VCF file as an input file (VCF ver. 4.1 and 4.2 are supported).
However, the number of samples that can be imported differs between the Standard and Pro versions.
Please see the product comparison between the Standard and Pro versions for more information.
Effect size
- Effect size refers to the magnitude of the effect detected by a statistical test. Examples include the “Standardized difference between two groups” and “Correlation measures of effect size”.
The larger the absolute value, the larger the effect. Examples of correlation measures of effect size are phi (Φ) and Cramér’s V (V).
- The strength of the relationship between two variables (2 x 2) is derived from the chi-square test.
SNPAlyze calculates the effect size from each contingency table of 4 genetic models: Genotype, Allele, Recessive and Dominant.
Effect size is calculated from the chi-square value (χ2), the total number of samples in the case group and the control group (N), and the smaller of the number of rows and the number of columns of the contingency table (k).
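The equation itself is not reproduced in this text. As a hedged sketch, assuming it is the standard definition of Cramér's V, V = sqrt(χ2 / (N × (k - 1))), the calculation could look as follows in Python. The function name `cramers_v` is hypothetical and not part of SNPAlyze.

```python
import math

def cramers_v(chi2, n, k):
    """Cramér's V from a chi-square value.

    chi2: chi-square value of the contingency table
    n:    total number of samples (case group + control group)
    k:    the smaller of the number of rows and the number of columns
    """
    return math.sqrt(chi2 / (n * (k - 1)))
```

For a 2 x 2 table k is 2, so V reduces to phi (Φ) = sqrt(χ2 / N).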
*1 As of September 2004 (according to Dynacom’s own investigation)
*2 Limit to SNP data.
*3 Shimo-Onoda K, Tanaka T, Furushima K, Nakajima T, Toh S, Harata S, Yone K, Komiya S, Adachi H, Nakamura E, Fujimiya H, Inoue I. Akaike’s information criterion for a measure of linkage disequilibrium. J Hum Genet 2002; 47(12): 649-55.
*4 Fallin D, Cohen A, Essioux L, Chumakov I, Blumenfeld M, Cohen D, Schork NJ. Genetic analysis of case/control data using estimated haplotype frequencies: application to APOE locus variation and Alzheimer’s disease. Genome Res. 2001 Jan; 11: 143-151.
Good, P. Permutation Tests. A Practical Guide to Resampling Methods for Testing Hypotheses. Second Edition. New York: Springer-Verlag, 2000.
*5 Stacey B. Gabriel, Stephen F. Schaffner, Huy Nguyen, Jamie M. Moore, Jessica Roy, Brendan Blumenstiel, John Higgins, Matthew DeFelice, Amy Lochner, Maura Faggart, Shau Neen Liu-Cordero, Charles Rotimi, Adebowale Adeyemo, Richard Cooper, Ryk Ward, Eric S. Lander, Mark J. Daly, David Altshuler. The Structure of Haplotype Blocks in the Human Genome. Science. 2002 Jun 21;296(5576):2225-9.
*6 Ning Wang, Joshua M. Akey, Kun Zhang, Ranajit Chakraborty, and Li Jin. Distribution of Recombination Crossovers and the Origin of Haplotype Blocks: Interplay of Population History, Recombination, and Mutation. Am J Hum Genet. 2002 Nov;71(5):1227-34.
*7 SNPAlyze Ver.5.0.2 (or later) and HealthSketch Ver.1.1 (or later) are required for data passing function.
*8 SNPAlyze Ver.7.0 (or later) and HealthSketch Ver.2.5 (or later) are required for Logistic Regression Analysis.
Tech specs
Genotyping data import
SNPAlyze is able to import the following specific types of Genotyping data files and analyze them.
Available file types
- Microsoft Excel files (xls/xlsx format)
- TSV (Tab Separated Values)/CSV (Comma Separated Values) file
- SNPAlyze Data file (slyz format)
- Biotage PSQ96 export file
- ABI PRISM7900 export file
When case-control studies are analyzed, in addition to the columns with genotyping data, extra columns for cases and controls are necessary in order to distinguish the groups.
It is not necessary to provide any extra columns to distinguish groups when Linkage Disequilibrium Analysis or Haplotype Inference is conducted. However, if group information is provided, each group can be analyzed separately.
Automatic selection of polymorphic markers
In Ver.5.0 or later, SNPAlyze can select the appropriate polymorphic markers automatically.
Markers are filtered by three criteria: the Hardy-Weinberg equilibrium test (HWE), minor allele frequency (MAF), and polymorphic marker type.
This filtering can also be applied to individual registered groups. For example, when the case group does not satisfy HWE due to genetic bias but the control group has to satisfy HWE, the filtering can be applied to the control group only.
Case-Control Study
Tabulation method of genotype data
Genotype data can be tabulated for easy evaluation according to your preference. Two methods are available to tabulate genotype data. The first is termed “Automatically” because it automatically defines the following four models to perform statistical calculations and creates a contingency table for each:
- Genotype model
- Allele model
- Recessive model
- Dominant model
The second is termed “User customize.” With this method, you define the contingency table manually and select polymorphic markers at will.
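To make the four models concrete, the following Python sketch derives the four contingency tables from the genotype counts of one SNP. It is an illustration only: the function name `model_tables` is hypothetical, the genotype order (AA, Aa, aa) is an assumption, and treating "a" as the risk allele for the Dominant and Recessive models is also an assumption, since the text does not say how SNPAlyze chooses the reference allele.

```python
def model_tables(case, control):
    """Build contingency tables (rows: case, control) for the four genetic models.

    case, control: genotype counts in the order (AA, Aa, aa).
    Assumption: 'a' is the risk allele.
    """
    def allele(g):
        # Each individual carries two alleles: [count of A, count of a].
        return [2 * g[0] + g[1], 2 * g[2] + g[1]]

    def dominant(g):
        # One copy of 'a' is enough: [AA, Aa + aa].
        return [g[0], g[1] + g[2]]

    def recessive(g):
        # Two copies of 'a' are needed: [AA + Aa, aa].
        return [g[0] + g[1], g[2]]

    return {
        "genotype": [list(case), list(control)],             # 2 x 3 table
        "allele": [allele(case), allele(control)],           # 2 x 2 table
        "dominant": [dominant(case), dominant(control)],     # 2 x 2 table
        "recessive": [recessive(case), recessive(control)],  # 2 x 2 table
    }
```

Each of the resulting tables can then be passed to a chi-square test or, for the 2 x 2 tables, used to compute an odds ratio.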
Use of Chi-square test
SNPAlyze can apply the chi-square test and Fisher’s exact test to the constructed contingency table. For a 2×2 contingency table, the odds ratio can also be calculated.
Use of AIC
This software evaluates the relationship between individual SNPs and diseases by the chi-square test and AIC. With AIC, this evaluation can be performed with higher accuracy than with the chi-square test.
For the contingency table created as described in the previous section, an independent model (AIC(IM)) and a dependent model (AIC(DM)) are assumed, and independence or dependence is judged from the AIC values of both models. Since the model that leads to the minimum AIC value is the best,
AIC(IM) > AIC(DM) indicates that the SNP and the disease are dependent,
AIC(IM) < AIC(DM) indicates that the SNP and the disease are independent.
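The comparison of the two models can be sketched in Python as follows. This is a minimal illustration that assumes the usual multinomial formulation for cross-classified data, in which the independent model estimates only the row and column margins and the dependent model estimates every cell. The function name `aic_models` is hypothetical and the code is not taken from SNPAlyze.

```python
import math

def aic_models(table):
    """Return (AIC(IM), AIC(DM)) for a contingency table given as a list of rows."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)

    def term(count):
        # Contribution count * log(count / n); empty cells contribute nothing.
        return count * math.log(count / n) if count > 0 else 0.0

    # Independent model: cell probability = row probability * column probability.
    loglik_im = sum(term(r) for r in rows) + sum(term(c) for c in cols)
    k_im = (len(rows) - 1) + (len(cols) - 1)
    # Dependent model: every cell has its own probability.
    loglik_dm = sum(term(x) for row in table for x in row)
    k_dm = len(rows) * len(cols) - 1
    # AIC = -2 x (maximum log likelihood) + 2 x (number of free parameters)
    return -2 * loglik_im + 2 * k_im, -2 * loglik_dm + 2 * k_dm
```

With a balanced table such as `[[50, 50], [50, 50]]` both models fit equally well, so the dependent model only pays for its extra parameter and AIC(IM) is the smaller value. With `[[90, 10], [10, 90]]` the better fit of the dependent model outweighs the penalty and AIC(DM) is smaller.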
Use of FDR
In case-control studies, SNPAlyze performs multiple testing correction using FDR.
The FDR controls the proportion of errors among the test results for which the null hypothesis was rejected. SNPAlyze calculates q-values on the basis of the distribution of p-values. (The BH or Bootstrap method is available.)
In the Hardy-Weinberg Equilibrium test, SNPAlyze also performs multiple testing correction using FDR.
Output of detailed information
Detailed information containing analysis results and settings is available in text format.
Linkage Disequilibrium Analysis
In this analysis, the LD coefficient is calculated from the difference between the haplotype frequency and the product of the allele frequencies at two arbitrary loci. SNPAlyze can output LD coefficients such as the D-value, D’-value, and r^2. In addition, the software can output chi-square and AIC values, and it can display the result of the Linkage Disequilibrium Analysis graphically.
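The three conventional coefficients can be written down directly from their textbook definitions. The Python sketch below assumes two biallelic loci that are both polymorphic and takes the frequency of the A-B haplotype together with the frequencies of allele A and allele B; the function name `ld_coefficients` is hypothetical and the code is not SNPAlyze's implementation.

```python
def ld_coefficients(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus (0 < p_a < 1)
    p_b:  frequency of allele B at the second locus (0 < p_b < 1)
    """
    # D: haplotype frequency minus the value expected under linkage equilibrium.
    d = p_ab - p_a * p_b
    # D': D scaled by the largest magnitude it can take for these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2: squared correlation between the alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

In this sketch D' keeps the sign of D; some tools report its absolute value instead.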
Display of the haplotype frequency, LD coefficients, and statistics.
The LD coefficient and statistics between multiple SNPs can be seen at a glance. The area with a strong LD coefficient can be easily specified. The following three display settings are available in SNPAlyze:
- Comparative display of the analysis result for two different groups. -Figure 1
- Comparative display of the analysis result for two different LD coefficients and statistics. -Figure 2
- Superimposed display of the analysis result for two different groups (LD map type of BMP only). -Figure 3
In the case of the comparative display of the analysis result between two different groups or two different LD coefficients, the following grid type is also available. The LD coefficient or statistic is displayed in each cell, and each cell can be color-coded according to a preset threshold value.
Haplotype Inference
Estimate haplotype frequency & tagSNP selection
Haplotype candidates in a group and their frequencies are calculated. In addition, you can obtain the diplotype of each individual sample that is judged to have the maximum likelihood by the EM algorithm.
Estimation of diplotype distribution
SNPAlyze shows the diplotype distribution calculated during the process of haplotype frequency estimation by using the EM algorithm.
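How the EM algorithm resolves ambiguous phase can be shown on the smallest possible case, two biallelic SNPs. The Python sketch below is an illustration under stated assumptions and is far simpler than a multilocus implementation such as the one in SNPAlyze: genotypes are coded as the number of copies (0, 1 or 2) of allele "1" at each locus, and the function name `em_haplotype_frequencies` is hypothetical.

```python
def em_haplotype_frequencies(genotypes, n_iter=100):
    """Estimate two-locus haplotype frequencies with the EM algorithm.

    genotypes: list of (g1, g2), the number of copies of allele 1 at each locus.
    Returns a dict mapping haplotypes (a1, a2) to estimated frequencies.
    """
    haplotypes = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freqs = {h: 0.25 for h in haplotypes}  # start from equal frequencies
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haplotypes}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: a double heterozygote is either 00/11 or 01/10.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # Phase is unambiguous when at most one locus is heterozygous.
                counts[(int(g1 == 2), int(g2 == 2))] += 1
                counts[(int(g1 >= 1), int(g2 >= 1))] += 1
        # M-step: new frequencies are the expected haplotype counts per chromosome.
        freqs = {h: counts[h] / n_chrom for h in haplotypes}
    return freqs
```

The weight `w` computed in the E-step is the estimated probability that a double heterozygote carries the 00/11 diplotype, so the same quantity also yields the most likely diplotype of each sample.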
Output of detailed information
htSNP (tagSNP) combinations are displayed in the “Haplotype detail information” window. This window shows haplotype frequencies and diplotype information as well as htSNP combinations.
Hardy-Weinberg Equilibrium Test
The differences between the genotype counts actually observed and the counts expected under Hardy-Weinberg equilibrium at an SNP site are evaluated by the chi-square test. In addition, SNPAlyze can perform the Exact test and the Exact test (Monte Carlo simulation) for cases in which the chi-square test is unsuitable.
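The chi-square version of the test can be sketched in Python as below. This is the textbook calculation for a biallelic SNP with one degree of freedom, not SNPAlyze's code; the function name `hwe_chi_square` is hypothetical, and the exact tests mentioned above are not covered.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    n_aa, n_ab, n_bb: observed genotype counts (AA, AB, BB).
    Returns (chi-square value, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    observed = [n_aa, n_ab, n_bb]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A small p-value means the observed genotype counts deviate from the proportions p^2, 2pq and q^2 expected under equilibrium.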
Output of detailed information
Detailed information containing analysis results and settings is available in text format.
Use of FDR
In case-control studies, SNPAlyze performs multiple testing correction using FDR.
The FDR controls the proportion of errors among the test results for which the null hypothesis was rejected. SNPAlyze calculates q-values on the basis of the distribution of p-values. (The BH or Bootstrap method is available.)
In the Hardy-Weinberg Equilibrium test, SNPAlyze also performs multiple testing correction using FDR.
Cochran-Armitage Trend Test
The Cochran-Armitage Trend Test investigates whether genes are associated with a disease by comparing two groups, a patient group and a non-patient group. This analysis assesses the presence of a linear trend between case-control status and allele counts.
Definition of Contingency table
The distribution of case-control and genotype counts can be put in a 2 × 3 contingency table.
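A minimal Python sketch of the trend test on such a 2 × 3 table is given below. It uses the standard chi-square form of the Cochran-Armitage statistic with one degree of freedom. The function name `cochran_armitage` is hypothetical, and the default weights (0, 1, 2), which score each genotype by its allele count, are an assumption; the weights SNPAlyze uses for each model are not stated here.

```python
import math

def cochran_armitage(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2 x 3 table.

    cases, controls: genotype counts in the same order, e.g. (AA, Aa, aa).
    weights: score assigned to each genotype column.
    Returns (chi-square value, p-value with 1 degree of freedom).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    sum_t_r = sum(t * r for t, r in zip(weights, cases))
    sum_t_n = sum(t * m for t, m in zip(weights, totals))
    sum_t2_n = sum(t * t * m for t, m in zip(weights, totals))
    numerator = n * (n * sum_t_r - n_cases * sum_t_n) ** 2
    denominator = n_cases * (n - n_cases) * (n * sum_t2_n - sum_t_n ** 2)
    chi2 = numerator / denominator
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Other weightings are often used to express other models, for example (0, 1, 1) for a dominant and (0, 0, 1) for a recessive effect of the second allele.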
Display of the results
The overall results and the statistics of each group are shown in the [Statistics] window and the [Detail information] window. SNPAlyze outputs the test results, which consist of statistics (chi-square, p-value, FDR q-value, and others) and locus information.
Use of FDR
In case-control studies, SNPAlyze performs multiple testing correction using FDR.
The FDR controls the proportion of errors among the test results for which the null hypothesis was rejected. SNPAlyze calculates q-values on the basis of the distribution of p-values. (The BH or Bootstrap method is available.)
Use of Bootstrap method
Deviations from the estimated value and confidence intervals are calculated by resampling the data in order to estimate the statistical reliability of each of the following analyses:
case-control studies, LD analysis and haplotype inference.
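The resampling idea can be illustrated with a percentile bootstrap in Python. This is a generic sketch, not the procedure SNPAlyze uses: the function names `bootstrap_ci` and `allele_frequency` are hypothetical, and an allele frequency stands in for the statistics listed above.

```python
import random

def bootstrap_ci(samples, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(samples)."""
    rng = random.Random(seed)
    estimates = sorted(
        # Resample the individuals with replacement and recompute the statistic.
        statistic([rng.choice(samples) for _ in samples])
        for _ in range(n_boot)
    )
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

def allele_frequency(genotypes):
    """Frequency of an allele from genotypes coded as 0, 1 or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))
```

For example, `bootstrap_ci(genotypes, allele_frequency)` returns an approximate 95% interval that shows how much the estimated frequency would vary between samples of the same size.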
Case-Control Haplotype Analysis
Differences among several groups in the haplotype frequencies estimated at arbitrary SNP sites are determined. The significance of these differences is evaluated by permutation tests.
This analysis provides each haplotype frequency estimated by the EM algorithm and also the result of the permutation test.
Moreover, graphs that show the frequency distribution of the statistics and the frequency distribution acquired from the permutation tests are output.
Haplotype Block Analysis
In Ver.5.0 or later, SNPAlyze can execute “Haplotype Block Analysis.” This analysis can identify haplotype blocks by the following two methods:
Gabriel method (Gabriel et al., Science, 2002) *1
Four Gamete method (Wang et al., Am. J. Hum. Genet., 2002) *2
The output items are as follows:
- Haplotype block candidate and frequency
- htSNP
- Frequency of appearance between two haplotypes
- LD coefficient between two haplotype blocks (D’-value)
- p-value from chi-square test
Furthermore, it is also possible to display the identified blocks visually.
*1 Stacey B. Gabriel, Stephen F. Schaffner, Huy Nguyen, Jamie M. Moore, Jessica Roy, Brendan Blumenstiel, John Higgins, Matthew DeFelice, Amy Lochner, Maura Faggart, Shau Neen Liu-Cordero, Charles Rotimi, Adebowale Adeyemo, Richard Cooper, Ryk Ward, Eric S. Lander, Mark J. Daly, David Altshuler. The Structure of Haplotype Blocks in the Human Genome. Science. 2002 Jun 21;296(5576):2225-9.
*2 Ning Wang, Joshua M. Akey, Kun Zhang, Ranajit Chakraborty, and Li Jin. Distribution of Recombination Crossovers and the Origin of Haplotype Blocks: Interplay of Population History, Recombination, and Mutation. Am J Hum Genet. 2002 Nov;71(5):1227-34.
Cooperation with HealthSketch
HealthSketch is a multivariate analysis tool for clinical and/or lifestyle data. The following functions become available when SNPAlyze is used together with HealthSketch.
* These functions require the purchase of the appropriate version of HealthSketch.
Data passing between SNPAlyze and HealthSketch
- Combinational analysis of DNA polymorphism and clinical and/or lifestyle data.
SNPAlyze passes the diplotype configuration of each sample (judged to have the maximum likelihood by the EM algorithm) to HealthSketch. HealthSketch can then perform analyses such as logistic regression by using the diplotype configuration.
- Use of classification results by clustering using clinical information.
The Clustering function of HealthSketch classifies sample data by using clinical and/or lifestyle data. Grouping samples that have similar clinical and/or lifestyle data makes effective DNA polymorphism analysis possible.
*SNPAlyze Ver.5.0.2 (or later) and HealthSketch Ver.1.1 (or later) are required for data passing function.
Logistic Regression Analysis
SNPAlyze performs logistic regression analysis for each SNP. You can calculate the odds ratio (OR), the 95% confidence interval of the OR, and the p-value of the likelihood ratio test for the Dominant, Recessive and Genotype models for each SNP.
*SNPAlyze Ver.7.0 (or later) and HealthSketch Ver.2.5 (or later) are required for Logistic Regression Analysis.
Handle genotyping data and all analysis data collectively
The SNPAlyze Data file stores the genotyping data and all analysis data together. If you open a file saved in this format, the genotyping data and all analysis data will appear.
You can continue your analysis, or share the genotyping data and all analysis data by distributing this file to other SNPAlyze users. (Please note that this file includes the genotyping data.)
Principal Component Analysis
- The scatter plot is drawn using eigenvectors. The horizontal axis is the first principal component and the vertical axis is the second principal component.
You can identify outlier samples.
Manhattan plot
- p-values are calculated from a case-control study using NGS data and are shown on the Manhattan plot.
The lower part of the display shows the statistics of the SNP selected on the Manhattan plot.
- These values (p-value, chi-square, degrees of freedom and effect size) are calculated for the Genotype, Allele, Recessive and Dominant models.
System requirements
Item | Detail (Standard and Pro)
---|---
OS | Windows 8.1/10
RAM | 4GB or more
Disk space | 10GB or more *1
Other | CD-ROM drive (for software installation), USB port interface *2
*1 The C: drive requires free space corresponding to the size of the data to be analyzed.
*2 User management is handled by a protection key supplied with this product. Therefore, one USB port for connecting the protection key is required when the software is run.
Product grade comparison
Major functions | Item | Standard | Pro
---|---|---|---
NGS Data Analysis | Supported import samples (VCF file) | 1,000 samples | 5,000 samples
NGS Data Analysis | Principal Component Analysis | 50 samples | 3,000 samples
NGS Data Analysis | Case-Control Study | 50 samples | 3,000 samples
SNP and Haplotype Analysis | Supported import samples | 1,000 samples | 10,000 samples
SNP and Haplotype Analysis | Hardy Weinberg Equilibrium Test | 2,560 loci |
SNP and Haplotype Analysis | Case-Control Study | |
SNP and Haplotype Analysis | Cochran-Armitage Trend Test | |
SNP and Haplotype Analysis | Logistic Regression Analysis *1 | |
SNP and Haplotype Analysis | Linkage Disequilibrium Analysis | 1,000 loci |
SNP and Haplotype Analysis | Haplotype Inference *2 | 30 loci | 40 loci
SNP and Haplotype Analysis | Haplotype Block Analysis | 1,000 loci |
SNP and Haplotype Analysis | Case-Control Haplotype Analysis | 10 loci |
SNP and Haplotype Analysis | Bootstrap *3 | |
SNP and Haplotype Analysis | htSNP identification | |
SNP and Haplotype Analysis | Microsatellite Data *4 | |
SNP and Haplotype Analysis | Open and Save SNPAlyze Data file | |
The sample and locus counts are the maximum numbers that can be imported.
*1 HealthSketch Ver.2.5 (or later) must be installed in order to use this function.
*2 With this product, the EM algorithm is applied to all haplotype candidates. The maximum number of loci that can be analysed at once is 30 in Standard and 40 in Pro.
*3 Bootstrap can be used for Case-Control Study, Hardy Weinberg Equilibrium Test, Haplotype Inference, Linkage Disequilibrium Analysis and Cochran-Armitage Trend Test.
*4 Haplotype Block Analysis is not compatible with Microsatellite Data.
Update
- New protection key driver for Windows Vista released (April 1, 2008)
- New protection key driver for Windows Vista released (February 01, 2007)
Support
Thank you very much for choosing Dynacom’s Software. We are pleased to offer you complimentary support for one year following your purchase. Should you have any questions or require assistance, please do not hesitate to contact our support center as outlined below.
Developer
DYNACOM Co.,Ltd.
World Business Garden, Marive East 25F
2-6-1, Nakase, Mihama-ku, Chiba-shi,
Chiba, 261-7125, Japan
TEL: +81-43-213-8131
FAX: +81-43-213-8132