Animal Trait Correlation Database
Frequently Asked Questions
  1. What is correlation? How is it measured?
  2. What is variance? What are the various kinds of variance?
  3. What is heritability?
  4. Phenotypic variances are sometimes reported as genetic variances + environmental variances, and sometimes as genetic variances + residual variances. How are such data recorded consistently in the CorrDB?
  5. What is genomic heritability? SNP-based heritability?
  6. Is there a way to locate correlation information using the CorrID?
  1. What is correlation? How is it measured?

    Correlation is a statistical method that shows whether, and how strongly, pairs of variables (such as measurements of two animal traits) are related.

    The correlation coefficient (r) is a statistical parameter that describes how closely the pairs of variables are related.

    R-square: The square of the correlation coefficient (r², also known as the "coefficient of determination") equals the proportion of the variation in one variable that is related to the variation in the other:
    r² = Explained variation / Total variation
    While correlation coefficients (r) are normally reported as a value between -1 and +1, r-square always lies between 0 and 1, i.e. between 0 and 100% when expressed as a percentage. For example, an r of .5 means 25% of the variation is related (.5 squared = .25), and an r of .7 means 49% of the variation is related (.7 squared = .49).
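    A minimal Python sketch of the r-to-r-square arithmetic above. The function name is illustrative only.

```python
def percent_variance_explained(r: float) -> float:
    """Return r-square (the coefficient of determination) as a percentage."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    return r * r * 100.0


# The two examples from the text, plus a negative r to show that the sign drops out.
for r in (0.5, 0.7, -0.7):
    print(f"r = {r:+.1f} -> r-square = {percent_variance_explained(r):.0f}%")
```

    The loop prints 25% for r = .5 and 49% for both r = .7 and r = -.7, because squaring removes the sign of r.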

    For genetic analysis, geneticists partition correlation into phenotypic correlations and genetic correlations. The phenotypic correlation is the correlation between records of two traits on the same animal. It is usually estimated by the product-moment correlation statistic (the Pearson correlation coefficient, for short).
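    The product-moment (Pearson) correlation can be computed directly from paired records. The sketch below assumes complete, equally long lists of measurements taken on the same animals. The trait values are made up for illustration. Python 3.10+ also provides `statistics.correlation`, which computes the same quantity.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired records x[i] and y[i]."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 records")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a trait with no variation has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical records of two traits measured on the same five animals.
trait_1 = [510.0, 545.0, 498.0, 560.0, 530.0]
trait_2 = [1.2, 1.5, 1.1, 1.6, 1.3]
r = pearson_r(trait_1, trait_2)
print(f"phenotypic r = {r:.3f}, r-square = {r * r:.3f}")
```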

    The genetic correlation is the correlation between an animal's genetic value for one trait and the same animal's genetic value for the other trait.  
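    Genetic values are not observed directly. In practice, the genetic correlation is therefore computed from estimated genetic (co)variances, typically obtained jointly from a multiple-trait mixed-model analysis. The sketch below shows only the final step, r_G = Cov_G(X,Y) / sqrt(V_G(X) * V_G(Y)). The (co)variance values are hypothetical.

```python
from math import sqrt


def genetic_correlation(cov_g_xy: float, var_g_x: float, var_g_y: float) -> float:
    """r_G = Cov_G(X, Y) / sqrt(V_G(X) * V_G(Y)).

    cov_g_xy: covariance between the animals' genetic values for traits X and Y
    var_g_x, var_g_y: genetic variances of traits X and Y
    """
    if var_g_x <= 0 or var_g_y <= 0:
        raise ValueError("genetic variances must be positive")
    return cov_g_xy / sqrt(var_g_x * var_g_y)


# Hypothetical (co)variance estimates, e.g. from a two-trait mixed-model analysis.
r_g = genetic_correlation(cov_g_xy=12.0, var_g_x=40.0, var_g_y=9.0)
print(f"genetic correlation = {r_g:.2f}")
```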

  2. What is variance? What are the various kinds of variance?

    To estimate trait correlations correctly, it is necessary to understand the variation in the measurements of each trait (NB: singular, one trait at a time). Statistically, trait variation is measured by the variance. The variance is a numerical measure of how the data values are dispersed around the mean. In particular, the sample variance is defined as:
    s² = Σ (xᵢ − x̄)² / (n − 1)
    where xᵢ are the n observed values and x̄ is their mean.
    Standard Deviation (SD) is the square root of the variance. As a measure of spread, given the mean and SD of a normal distribution, it is possible to compute the percentile rank associated with a score.
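    A short sketch of the sample variance formula, the SD, and the percentile rank of a record under a normal-distribution assumption. The records are hypothetical. `statistics.NormalDist` (Python 3.8+) supplies the normal cumulative distribution function.

```python
import statistics
from math import sqrt
from typing import Sequence


def sample_variance(values: Sequence[float]) -> float:
    """s^2 = sum((x_i - mean)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def percentile_rank(score: float, mean: float, sd: float) -> float:
    """Percentile rank of a score, assuming the trait is normally distributed."""
    return 100.0 * statistics.NormalDist(mu=mean, sigma=sd).cdf(score)


# Hypothetical records of one trait measured on eight animals.
records = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sum(records) / len(records)
s2 = sample_variance(records)
sd = sqrt(s2)
print(f"mean = {mean:.2f}, variance = {s2:.3f}, SD = {sd:.3f}")
print(f"percentile rank of a record of 9.0: {percentile_rank(9.0, mean, sd):.1f}")
```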

       Variance Components of a Quantitative Trait in the eyes of geneticists:
    Phenotypic variance is simply the observed, measured variance in a trait. It is estimated as the sum of the total genetic variance, the non-genetic variance, and possibly the interaction between the two.
    VP = VG + VE + VGE
    where VP = total phenotypic variation
    VG = total genetic factor variation
    VE = total environmental factor variation
    VGE = genetic X environmental factor interaction variation

    Genetic variance = additive genetic variance
    + dominance genetic variance
    + epistatic genetic variance
    + interaction between/among all previous genetic variances

    Non-genetic variance = variance due to environmental factors + error.

       Sources of Genetic Variations:
    Genetic variation may come from additive genetic variation (VA), dominance variation (VD), and epistatic, or interaction, genetic variation (VI). VD and VI together are called non-additive genetic variation. Thus:

    VG = VA + VD + VI

    ∴  VP = VA + VD + VI + VE + VGE
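    Once the components have been estimated, the partition above is simple bookkeeping. A minimal sketch follows. The component values are hypothetical and chosen only so that the totals are easy to check.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    """Variance components of one quantitative trait (in squared trait units)."""
    va: float          # additive genetic variance, VA
    vd: float          # dominance variance, VD
    vi: float          # epistatic (interaction) genetic variance, VI
    ve: float          # environmental variance, VE
    vge: float = 0.0   # genotype x environment interaction variance, VGE

    @property
    def vg(self) -> float:
        # VG = VA + VD + VI
        return self.va + self.vd + self.vi

    @property
    def vp(self) -> float:
        # VP = VG + VE + VGE = VA + VD + VI + VE + VGE
        return self.vg + self.ve + self.vge


# Hypothetical component values, chosen only to illustrate the bookkeeping.
trait = VarianceComponents(va=30.0, vd=8.0, vi=2.0, ve=55.0, vge=5.0)
print(f"VG = {trait.vg}, VP = {trait.vp}")
```

    With these inputs, VG = 40.0 and VP = 100.0.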

       Variance Components of a Quantitative Trait in the eyes of statisticians:
    [Figure: Residual plot and graphical representation of r-squares (graph modified from Machine Learning Plus)]
    A residual is a statistical concept. It is the difference between the observed value of the dependent variable (y) and the predicted value (ŷ), i.e. Residual = Observed − Predicted. The Residual Sum of Squares (RSS), also called the sum of squared errors of prediction (SSE), measures the discrepancy between the data and an estimation model.
    Residual variance is also a statistical concept. It represents unexplained variation, as opposed to the explained variation attributable to additive, dominance, or epistatic genetic effects.

    In classical genetic analysis, the residual variance is often conveniently used to represent environmental variation, i.e. "everything else" left over after the explained variation.
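    To make "residual" concrete, the sketch below fits a straight line by ordinary least squares. The straight line is assumed here as a stand-in for any estimation model. The sketch reports the residuals, the RSS, r-square as explained over total variation, and a residual variance estimate with n − 2 degrees of freedom. The data are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y-hat = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def residual_summary(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[List[float], float, float, float]:
    """Residuals, RSS, r-square and residual variance of a straight-line fit."""
    a, b = fit_line(x, y)
    predicted = [a + b * xi for xi in x]
    residuals = [yi - yh for yi, yh in zip(y, predicted)]  # observed - predicted
    rss = sum(e * e for e in residuals)                      # residual sum of squares (SSE)
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)                # total variation
    r_square = 1.0 - rss / tss                               # explained / total (TSS = explained + RSS)
    residual_variance = rss / (len(y) - 2)                   # n - 2: two fitted parameters
    return residuals, rss, r_square, residual_variance


# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 3.9, 6.4, 7.7, 10.2, 11.8]
residuals, rss, r_square, residual_variance = residual_summary(x, y)
print("residuals:", [round(e, 3) for e in residuals])
print(f"RSS = {rss:.3f}, r-square = {r_square:.3f}, residual variance = {residual_variance:.3f}")
```

    For a straight-line fit with an intercept, this r-square equals the square of the Pearson correlation between x and y.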

    It is worth noting that, in a more recent study, Huang and Mackay (2016) presented evidence that variance component analysis should not be used to infer the genetic architecture of quantitative traits.

  3. What is heritability?

    Broad-sense heritability gives the proportion of observed variation (VP) that can be attributed to genetic causes (as opposed to the environment), i.e. the ratio of total genetic variance to phenotypic variance:
    H2 = VG / VP
    This is called heritability in the broad sense because it is a rather crude measure. It includes sources of genetic variation (dominance and epistatic effects) that are not necessarily passed on to the next generation.

    Narrow-sense heritability gives the ratio of additive genetic variance to phenotypic variance:

    h2 = VA / VP

    [Figure: graph modified from Nature Education]

    The additive genetic variance matters here because only alleles are passed on to the next generation (NOT the dominance interactions NOR the epistatic interactions). Allele combinations are formed anew in each generation. For example, in generation one, some offspring may carry alleles A1/A3 and B2/B4. These are combinations not seen in either parent, so their dominance and epistatic interactions will also be new. In general, the greater the additive genetic variance (VA) in a population, the greater its heritable diversity. This means greater selection potential (a greater narrow-sense heritability).
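    Both ratios are straightforward to compute once VG, VA and VP are available. A minimal sketch with hypothetical variance components:

```python
def broad_sense_heritability(vg: float, vp: float) -> float:
    """H2 = VG / VP: share of phenotypic variance due to all genetic effects."""
    if vp <= 0:
        raise ValueError("phenotypic variance must be positive")
    return vg / vp


def narrow_sense_heritability(va: float, vp: float) -> float:
    """h2 = VA / VP: share of phenotypic variance due to additive effects only."""
    if vp <= 0:
        raise ValueError("phenotypic variance must be positive")
    return va / vp


# Hypothetical components: VA = 30, VD = 8, VI = 2, and VE + VGE = 60.
va, vd, vi = 30.0, 8.0, 2.0
vg = va + vd + vi          # VG = VA + VD + VI = 40
vp = vg + 60.0             # VP = 100
print(f"H2 = {broad_sense_heritability(vg, vp):.2f}")  # H2 = 0.40
print(f"h2 = {narrow_sense_heritability(va, vp):.2f}")  # h2 = 0.30
```

    VA is only part of VG, so h2 cannot exceed H2 as long as the non-additive components are non-negative.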
  4. Phenotypic variances are sometimes reported as genetic variances + environmental variances, and sometimes as genetic variances + residual variances. How are such data curated consistently in the CorrDB?

    "Environmental variance" and "residual variance" can be confused because both serve as "the other", or "everything else": the less important variance component when a study focuses mostly on genetic variances. Although "environmental variance" and "residual variance" may largely overlap, they are not the same. "Environmental variance" is a genetic concept (a method of variance partitioning), whereas "residual variance" is a statistical concept (a method of variance partitioning).

    It is not uncommon to see publications that report only "genetic + environmental" variances, while others report "genetic + residual" variances. When these are curated into the CorrDB, we record them as they are reported, i.e. a "residual" variance goes into a "residual" field and an "environmental" variance into an "environment" field. It is up to users how these data are interpreted.
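    The "record as reported" rule can be pictured as a data structure with separate fields for environmental and residual variances, where neither is ever converted into the other. The field names and ID values below are hypothetical. This is not the actual CorrDB schema.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class VarianceRecord:
    """Hypothetical curation record (NOT the actual CorrDB schema)."""
    corr_id: int
    genetic_variance: Optional[float] = None
    environmental_variance: Optional[float] = None
    residual_variance: Optional[float] = None


def curate(corr_id: int, reported: Mapping[str, float]) -> VarianceRecord:
    """Store each variance under the label the publication used.

    A "residual" variance is never copied into the environmental field, or
    vice versa; a component the paper did not report stays None.
    """
    allowed = {"genetic", "environmental", "residual"}
    unknown = set(reported) - allowed
    if unknown:
        raise ValueError(f"unrecognized variance labels: {sorted(unknown)}")
    return VarianceRecord(
        corr_id=corr_id,
        genetic_variance=reported.get("genetic"),
        environmental_variance=reported.get("environmental"),
        residual_variance=reported.get("residual"),
    )


# Two hypothetical publications that report the partition differently.
paper_a = curate(101, {"genetic": 0.8, "environmental": 1.2})
paper_b = curate(102, {"genetic": 0.6, "residual": 1.5})
print(paper_a)
print(paper_b)
```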

  5. What is genomic heritability? SNP-based heritability?

    Genomic heritability is the proportion of variance of a trait that can be explained (in the population) by a linear regression on a set of markers. Depending on the type of marker used, there can be SNP-based or indel-based estimates. Depending on the method, there can be, for example, GCTA-based heritability estimates (GCTA: Genome-wide Complex Trait Analysis).

    SNP-based heritability (h²SNP) was initially defined as the proportion of phenotypic variance explained by all SNPs on a genotyping array. It therefore depends on the number of SNPs on the array. The term was later expanded to refer to the variance explained by any set of SNPs (Yang et al., 2017).

    One can estimate the relationships between individuals from their genotypes and use a linear mixed model to estimate the variance explained by the genetic markers. This gives a genomic heritability estimate based on the variance captured by common genetic variants. Other types of estimates include those using the GCTA approach, among others.
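    In such analyses, the genotype-based relationships are commonly summarized as a genomic relationship matrix (GRM). The sketch below builds one common form of the GRM from SNP genotypes coded 0/1/2, with allele frequencies estimated from the sample itself. The genotypes are hypothetical. The linear mixed-model (REML) step that would use this matrix to estimate the variance explained is not shown.

```python
from typing import List, Sequence


def genomic_relationship_matrix(genotypes: Sequence[Sequence[int]]) -> List[List[float]]:
    """Genomic relationship matrix from SNP genotypes coded 0/1/2.

    genotypes[j][i] is the number of copies of the counted allele carried by
    animal j at SNP i. Each entry is
        A[j][k] = (1/M) * sum_i (x_ij - 2 p_i) (x_ik - 2 p_i) / (2 p_i (1 - p_i))
    where p_i is the sample frequency of the counted allele at SNP i and M is
    the number of polymorphic SNPs used.
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    standardized = []  # one list of standardized genotypes per polymorphic SNP
    for i in range(n_snps):
        column = [genotypes[j][i] for j in range(n_animals)]
        p = sum(column) / (2 * n_animals)
        expected_var = 2 * p * (1 - p)
        if expected_var == 0:
            continue  # monomorphic SNP: carries no information and would divide by zero
        scale = expected_var ** 0.5
        standardized.append([(x - 2 * p) / scale for x in column])
    m = len(standardized)
    if m == 0:
        raise ValueError("no polymorphic SNPs to build the matrix from")
    return [
        [sum(z[j] * z[k] for z in standardized) / m for k in range(n_animals)]
        for j in range(n_animals)
    ]


# Hypothetical genotypes: 4 animals x 6 SNPs; the last SNP is monomorphic.
genotypes = [
    [0, 1, 2, 1, 0, 2],
    [1, 1, 2, 0, 0, 2],
    [2, 0, 1, 1, 1, 2],
    [0, 1, 2, 1, 0, 2],  # same genotypes as the first animal
]
grm = genomic_relationship_matrix(genotypes)
for row in grm:
    print(" ".join(f"{a:6.2f}" for a in row))
```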
  6. Is there a way to locate correlation information using the CorrID?

    Yes. In 2020, with Release 43, we introduced dbxref links to each QTL/association record in the CorrDB. The syntax for the specific URL link is in the form of[CorrID], where CorrID is a numeric stable ID for each correlation record in the CorrDB. This is often used by web tools, API tools, and database dbxref references. Refer to the DBxREF list for syntax definition details under "Animal CorrDB".

First draft: January 9, 2018
Revised: December 17, 2020

Last update: February 01 2021 09:45:03.

By Zhiliang Hu
Associate Scientist
Dept of Animal Science
Iowa State University

Douglas S. Falconer and Trudy F.C. Mackay (1996), Introduction to Quantitative Genetics. Pearson, Edinburgh Gate, Harlow, Essex CM20 2JE, England.

Wen Huang and Trudy F.C. Mackay (2016), "The Genetic Architecture of Quantitative Traits Cannot Be Inferred from Variance Component Analysis". PLoS Genet. 12(11).

Peter M. Visscher, William G. Hill and Naomi R. Wray (2008), "Heritability in the genomics era — concepts and misconceptions". Nat Rev Genet. 9(4):255-66.

Jian Yang, Jian Zeng, Michael E. Goddard, Naomi R. Wray and Peter M. Visscher (2017), "Concepts, estimation and interpretation of SNP-based heritability". Nat Genet. 49:1304–1310.

John Stanton-Geddes, Jeremy B. Yoder, Roman Briskine, Nevin D. Young and Peter Tiffin (2013), "Estimating heritability using genomic data". Methods Ecol Evol. 4:1151–1158.

© 2003-2021 Creative Commons licenses by NAGRP - Bioinformatics Coordination Program.
Contact: NAGRP Bioinformatics Team