### Sample Discussion

Subjects:
- Inbreeding coefficients

From erikscraggsgmail.com Thu May 30 18:46:07 2013
From: Erik Scraggs
Postmaster: submission approved
To: Multiple Recipients of
Subject: Inbreeding coefficients
Date: Thu, 30 May 2013 18:46:07 -0500

Dear all,

I would appreciate it if somebody in the community could provide me with some assistance. I am currently estimating inbreeding coefficients within a population of cattle. To do this I have been using the inbreeding coefficients option in PLINK (the --het option), which, given a large number of SNPs in a homogeneous sample, calculates inbreeding coefficients based on the observed versus expected number of homozygous genotypes.

I have run the program and posted a snapshot of my results below; this is where I would appreciate some clarity. Is it correct to assume that a negative value in the F column indicates that there is no inbreeding, and that it can therefore be set to 0?

| FID | IID | O(HOM) | E(HOM) | N(NM) | F |
|---|---|---|---|---|---|
| 0 | FB686 | 28313 | 2.58E+04 | 37177 | 0.2243 |
| 0 | FB1615 | 25773 | 2.60E+04 | 37510 | -0.01666 |
| 0 | FB2101 | 27566 | 2.58E+04 | 37287 | 0.1517 |
| 0 | FB2126 | 23992 | 2.60E+04 | 37494 | -0.1699 |
| 0 | FB2127 | 24348 | 2.57E+04 | 37125 | -0.1179 |
| 0 | FB2422 | 24469 | 2.60E+04 | 37497 | -0.1287 |
| 0 | FB2501 | 27209 | 2.58E+04 | 37317 | 0.1194 |
| 0 | FB2892 | 24327 | 2.60E+04 | 37504 | -0.1414 |

--
Erik Scraggs, PhD
Department of Animal Sciences
Washington State University
Pullman, WA, 99164-4236, USA
Tel: 509-288-2291

From gianolaansci.wisc.edu Thu May 30 20:06:15 2013
From: Daniel Gianola
Subject: Re: Inbreeding coefficients
Date: Thu, 30 May 2013 20:06:15 -0500

Since inbreeding coefficients cannot be negative, being probabilities, this indicates that PLINK (I have no idea what it does) does not use a good estimation procedure. With a good procedure, estimates must fall inside the permissible parameter space.

Regards,
Daniel

From steibeljmsu.edu Thu May 30 21:39:42 2013
From: MSU_JPS
Subject: Re: Inbreeding coefficients
Date: Thu, 30 May 2013 21:39:42 -0500

Hi Erik,

In the PLINK documentation there are two notes that may be worth considering. I copy them below. But what Daniel said is more important: understand estimates and their properties before using them, and be wary of results that are plain wrong.

Note: With whole genome data, it is probably best to apply this analysis to a subset that are pruned to be in approximate linkage equilibrium, say on the order of 50,000 autosomal SNPs.
Use the --indep-pairwise and --indep commands to achieve this, as described in the PLINK documentation.

Note: The estimate of F can sometimes be negative. Often this will just reflect random sampling error, but a result that is strongly negative (i.e. an individual has fewer homozygotes than one would expect by chance at the genome-wide level) can reflect other factors, e.g. sample contamination events perhaps.

Sincerely,
Juan P. Steibel

From ytutsunomiyagmail.com Thu May 30 21:40:42 2013
From: Yuri Tani Utsunomiya
Subject: Re: Inbreeding coefficients
Date: Thu, 30 May 2013 21:40:42 -0500

Dear Erik,

The inbreeding coefficient calculated by PLINK is equivalent to FIS in Wright's F-statistics [1].

In a structured population, the fixation index (F, the degree of reduction in heterozygosity relative to the Hardy-Weinberg expectation) can be partitioned into three levels: FIT, individual (I) relative to the total population (T); FIS, individual (I) relative to the subpopulation (S); and FST, subpopulation (S) relative to the total (T). Thus, Wright's FIS is often referred to as the inbreeding coefficient, and a simplistic definition is FIS = 1 - (HI/HS), where HI represents the individual's heterozygosity and HS the subpopulation's (or breed's) heterozygosity.

Looking at the definition proposed by Wright (1950) [1], F is better interpreted as a correlation between alleles in different 'partitions' of a structured population, rather than as a probability. This means that it can assume negative values. If HI = HS, then FIS = 0, and the individual has exactly the heterozygosity level expected for the subpopulation. If HI < HS, then FIS > 0, and the individual is less heterozygous than expected given the subpopulation's heterozygosity. The closer FIS gets to 1, the more inbred the individual is assumed to be.
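The link between the F column and FIS = 1 - (HI/HS) can be checked numerically. The sketch below assumes that the --het statistic is computed as (O(HOM) - E(HOM)) / (N(NM) - E(HOM)); the function names are hypothetical, and the three rows are copied from the posted snapshot. Because E(HOM) is rounded in that listing, the recomputed values only approximate the printed F column.

```python
# Rows copied from the posted --het snapshot: (IID, O(HOM), E(HOM), N(NM)).
# E(HOM) is rounded in the listing, so results only approximate the F column.
ROWS = [
    ("FB686", 28313, 2.58e4, 37177),
    ("FB1615", 25773, 2.60e4, 37510),
    ("FB2126", 23992, 2.60e4, 37494),
]


def f_from_homozygotes(o_hom, e_hom, n_nm):
    """F as excess homozygosity: (observed - expected) / (total - expected)."""
    return (o_hom - e_hom) / (n_nm - e_hom)


def f_from_heterozygosity(o_hom, e_hom, n_nm):
    """The same quantity written as 1 - HI/HS."""
    h_i = (n_nm - o_hom) / n_nm  # observed heterozygosity of the individual
    h_s = (n_nm - e_hom) / n_nm  # expected heterozygosity in the sample
    return 1.0 - h_i / h_s


for iid, o_hom, e_hom, n_nm in ROWS:
    f_hom = f_from_homozygotes(o_hom, e_hom, n_nm)
    f_het = f_from_heterozygosity(o_hom, e_hom, n_nm)
    print(f"{iid}: F = {f_hom:+.4f} (homozygote form), {f_het:+.4f} (1 - HI/HS form)")
```

Both forms give the same number for every animal, so a negative F simply means that the animal has more heterozygous genotypes than the sample-wide expectation.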
On the other hand, if HI > HS, then FIS < 0, so the individual is more heterozygous than expected given the subpopulation. Hence, negative values denote outbred individuals. I did not understand why you want to set the negative values to zero, as FIS is not a probability. In fact, if you want p-values, you can test the departure of individual heterozygosity from the expectation for the subpopulation with goodness-of-fit tests.

One observation worth noting: if the value is strongly negative, you may want to double-check the outbred sample for possible contamination (accidental mixing of two DNA sources, causing high sample heterozygosity).

Although FIS has been widely used in genetic diversity studies with microsatellites to quantify inbreeding and diversity loss, you may want to look at estimating inbreeding levels by means of runs of homozygosity (ROH). While FIS relies largely on identity by state, empirical data suggest that ROH better capture identity by descent, and ROH have been proposed as a suitable method to estimate autozygosity; some people would say that they should replace pedigree estimates. PLINK also has an implementation of the algorithm [2].

For those who are not familiar with PLINK [3], I suggest checking it out. It is elegantly written in C/C++ and is a pioneering software package for the analysis of SNP data. It remains one of the most complete toolsets available.

Yours sincerely,
Yuri

[1] http://onlinelibrary.wiley.com/...j.1469-1809.1949.tb02451.x/pdf
[2] http://pngu.mgh.harvard.edu/...urcell/plink/ibdibs.shtml#homo
[3] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1950838/

--
Yuri T. Utsunomiya
MSc student at São Paulo State University (UNESP - Brazil)
Laboratory of Animal Biochemistry and Molecular Biology - Araçatuba/SP
Mobile: +551881170036
Skype me: yuri.tani

"So I do dearly hope that the Genome Project does not give rise to some naive biological determinism that says we are nothing more than the sum of our genes. Geneticists don't believe that. Geneticists believe genes are an important part of the story. By understanding that part of the story, we're in a so-much better position to try to understand the rest of the story." - Prof. Eric Lander

From bmuirpurdue.edu Thu May 30 21:44:57 2013
From: "Muir, William M."
Subject: FW: Inbreeding coefficients
Date: Thu, 30 May 2013 21:44:57 -0500

Hi Erik,

A negative inbreeding coefficient means that there is an excess of heterozygosity compared to expectation. I am not familiar with the output of PLINK, but if the excess occurs at most loci, it is a signature of demography, indicating that you are most likely looking at a recent outcrossing event.

However, to correctly determine inbreeding from genotype data, the proportion of loci NOT segregating is also important. Inbreeding drives loci to homozygosity; thus, if you only consider those loci still segregating, you will observe an excess of heterozygosity compared to expected, as you did. The question then becomes how to determine which non-informative loci to include in the calculation. To do this, several breeds would have to be genotyped to determine the hypothetical ancestral population (HAP) allele frequencies. From the theory of drift, inbreeding does not change allele frequency across populations, but does within subpopulations, i.e. the rate of fixation and loss is directly proportional to the initial allele frequency in the HAP. Thus, if a random set of subpopulations were sampled, the average allele frequency at a locus across those subpopulations, including those in which the allele is fixed or lost, will estimate the HAP allele frequency (p) at that locus. From this allele frequency, the total expected heterozygosity (Ht) is determined as 2pq, summed across loci.
Next, to determine individual inbreeding, using those same loci in your population, including the ones fixed, determine Hi, the heterozygosity of the individual, as the sum of the segregating loci over the total number of loci originally segregating in the HAP. The ratio Hi/Ht = Hit is the amount of heterozygosity in the individual relative to the HAP. The total inbreeding coefficient of this individual is then Fx = 1 - Hit. Wright called this Fit, i.e. inbreeding of the individual relative to the total. If the expected heterozygosity were also computed for each subpopulation based on the subpopulation allele frequency, this heterozygosity is Hs, and Fst = 1 - Hs/Ht, which is the amount of drift that occurred between subpopulations.

However, to complicate things even further, a SNP chip has ascertainment bias, meaning that only SNPs that were informative in certain breeds were put on the chip. This results in the same problem and has to be corrected for; Andy Clark has a paper on how to do this. Sequencing data is much better for determining inbreeding coefficients, as it does not have ascertainment bias.

I am sure this is more than you wanted, but the bottom line is that it is difficult to get a true estimate of inbreeding without knowledge of the entire drift process and correct sampling of the genetic material. A quick reference would be Hartl and Clark's book on population genetics; they also reference the original works of Wright, Hill, and Weir (who also has a program to do this from genomic data; the book's title is Genetic Data Analysis, and the program is at http://www.eeb.uconn.edu/people/plewis/software.php). I also have a publication on the topic in chickens using a SNP chip, which I can share with you if interested.

Best Regards,
Bill

-----------------------------
William Muir, Ph.D.
Professor Genetics
Department of Animal Sciences
Purdue University
and
Department of Medicine
Indiana University
Room G406 Lilly Hall
915 West State Street
W. Lafayette, IN 47907
765-494-8032
https://ag.purdue.edu/...?strAlias=bmuir&intDirDeptID=8
http://medicine.iupui.edu/iarc/

From Andres.Legarratoulouse.inra.fr Fri May 31 07:49:42 2013
From: Andres Legarra
Subject: Re: Inbreeding coefficients
Date: Fri, 31 May 2013 07:49:42 -0500

Hi,

Inbreeding is the excess of homozygotes with respect to Hardy-Weinberg equilibrium (Falconer). Or, it is the correlation between uniting gametes (Wright). According to these definitions, it is NOT a probability and it can therefore be negative.
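The two definitions can be made concrete with the usual single-locus parameterisation of genotype frequencies. This is textbook background rather than part of the message; p is the frequency of allele A, and the symbols g_1 and g_2 are introduced here only for illustration.

$$
\begin{aligned}
P(AA) &= p^2 + pqF, \\
P(Aa) &= 2pq\,(1 - F), \\
P(aa) &= q^2 + pqF,
\end{aligned}
\qquad q = 1 - p .
$$

With F = 0 these are the Hardy-Weinberg proportions, F > 0 gives an excess of homozygotes, and F < 0 gives an excess of heterozygotes. Coding each uniting gamete as g = 1 for A and g = 0 for a, the covariance between the two gametes is P(AA) - p^2 = pqF and each gamete has variance pq, so the same F is also their correlation:

$$
\operatorname{corr}(g_1, g_2) = \frac{P(AA) - p^2}{pq} = F,
\qquad
F \ge -\min\!\left(\frac{p}{q}, \frac{q}{p}\right).
$$

The lower bound only keeps every genotype frequency non-negative; nothing in this parameterisation forces F to be at least 0.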
However, if you use a pedigree to estimate inbreeding, you are forced to assume that all founder alleles are different, and as a byproduct of this assumption, inbreeding is positive and is also a probability of identity by descent.

When constructing genomic relationship matrices (VanRaden, 2008; Yang et al., 2010; etc.) it is common to find negative values of inbreeding and also of relationships. These have to be interpreted as covariances and not as probabilities. Setting them to 0 creates havoc: you mess up the linear model and bias your results.

Andres

--
Andres Legarra
+33 561285182
INRA, UR 631 SAGA, 24 Chemin de Borde Rouge - Auzeville CS 52627
31326 Castanet Tolosan, France
http://genoweb.toulouse.inra.fr/~alegarra

From hsimiangwdg.de Fri May 31 08:31:11 2013
From: "Simianer, Henner"
Subject: AW: Inbreeding coefficients
Date: Fri, 31 May 2013 08:31:11 -0500

Hi Andres,

This is not the full story. The inbreeding coefficient F is also defined as the probability that the two homologous alleles at a random locus in one individual are identical by descent (Malecot) and, being a probability, is bounded between 0 and 1, regardless of your assumptions on the base population. Thus, as often in quantitative genetics, the same thing has different definitions with different implications. Obviously, estimates of F can be outside the interval (0,1) depending on the method you use.

Best wishes
Henner

_____________________________________
Dr. Henner Simianer
Professor of Animal Breeding and Genetics
Department of Animal Sciences
Georg-August-University Goettingen
Albrecht-Thaer-Weg 3, 37075 Goettingen
Tel.: +49-551-395604, Fax: +49-551-395587
Email: hsimiangwdg.de
http://www.uni-goettingen.de/tierzucht

From taylorjerrmissouri.edu Fri May 31 08:34:10 2013
From: "Taylor, Jerry F. (Animal Science)"
Subject: RE: Inbreeding coefficients
Date: Fri, 31 May 2013 08:34:10 -0500

Just a couple of other comments to add to the mix:

1. No matter how it is estimated/calculated/interpreted, there is generally an assumption of random mating and selective neutrality (an absence of direct selection on genotype) associated with the locus/loci, and these assumptions are generally violated in most populations due to drift and artificial selection. Thus, it is quite possible that you will observe a lower level of homozygosity within individuals than would be expected under these assumptions.

2. I am not sure how PLINK calculates the genomic relationship matrix, but if you read through PVR's great paper "Efficient methods to compute genomic predictions", J Dairy Sci. 2008 91(11):4414-23, you will see that the allele frequencies (AF) used to compute the GRM are those of the base generation. Most programs that compute GRMs from genotype data simply compute AF at each locus using all animals and use these to construct the GRM, and this is fine if the population is not subject to admixture, selection or drift.
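The dependence on allele frequencies can be seen in a small sketch. It assumes a VanRaden-style GRM in which genotypes are coded 0/1/2, centred by 2p and scaled by 2 Σ p(1 - p), with genomic inbreeding taken as the diagonal element minus 1. The toy genotypes, the "base generation" frequencies and the function names are all hypothetical.

```python
def allele_freqs(genotypes):
    """Frequency of the counted allele at each locus (genotypes coded 0/1/2)."""
    n_animals = len(genotypes)
    n_loci = len(genotypes[0])
    return [sum(g[j] for g in genotypes) / (2 * n_animals) for j in range(n_loci)]


def genomic_f(genotype, freqs):
    """Diagonal element of a VanRaden-style GRM, minus 1, for one animal."""
    num = sum((m - 2 * p) ** 2 for m, p in zip(genotype, freqs))
    den = 2 * sum(p * (1 - p) for p in freqs)
    return num / den - 1.0


# Toy data (hypothetical): 4 animals x 6 loci.
animals = [
    [0, 2, 0, 2, 0, 2],  # homozygous at every locus
    [1, 1, 1, 1, 1, 1],  # heterozygous at every locus
    [0, 1, 2, 1, 0, 1],
    [2, 1, 0, 1, 2, 1],
]

freqs_all = allele_freqs(animals)             # AF estimated from all animals
freqs_base = [0.3, 0.7, 0.3, 0.7, 0.3, 0.7]   # hypothetical base-generation AF

for label, freqs in (("all animals", freqs_all), ("base AF", freqs_base)):
    values = [round(genomic_f(g, freqs), 3) for g in animals]
    print(f"F using {label}: {values}")
```

In this toy example the fully heterozygous animal gets a negative F under both sets of frequencies, while the fully homozygous animal changes sign when the assumed frequencies change.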
So:

a) If your animals are crossbreds, you have a problem.
b) If your animals are stratified in time, you may have a problem.

I have found that the F coefficients for a population of 3570 registered Angus animals are quite sensitive to the AF estimates. If you estimate AF using all animals, these differ from AF estimated using the oldest 10% of animals, and the effect on estimates of F is quite considerable. Jared Decker describes the very strong selection occurring genome-wide in these animals in his paper "A novel analytical method, Birth Date Selection Mapping, detects response of the Angus (Bos taurus) genome to selection on complex traits", BMC Genomics. 2012 13:606. He also examines the relationship between genomic F and pedigree F in this paper.

Jerry

From bmuirpurdue.edu Fri May 31 08:34:17 2013
From: "Muir, William M."
Subject: FW: Inbreeding coefficients
Date: Fri, 31 May 2013 08:34:17 -0500

Hi,

Andres is of course correct, and if inbreeding coefficients are calculated using the genomic relationship matrix (GRM) approach, and scaled correctly, the inbreeding detected is that which has occurred within that breed or subpopulation. Because inbreeding is cumulative, it can be broken down into that which occurred prior to breed formation and that which occurred after. If one uses a single breed starting at some time after breed formation, the inbreeding detected is current inbreeding, not total inbreeding. So the more important question is what the inbreeding coefficient is being used for. If it is for within-breed comparisons, i.e. genomic selection, then current inbreeding is appropriate. If one wants to know how much diversity has been lost as a result of population subdivision (breed formation) as well as current inbreeding, then one essentially has to do an across-breed GRM.

In the first definition given by Andres, as deviation from HWE, the issue is what allele frequency to use as 'p' to calculate expected heterozygosity. If one uses p estimated within a breed, the inbreeding detected is local or current inbreeding. If p is estimated from the HAP, then the expected heterozygosity is that in the HAP and the inbreeding detected is total.
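A small numerical sketch of this distinction follows. It assumes heterozygosities are averaged over the same set of loci, and that the HAP frequency is approximated by the mean of the breed frequencies; all numbers and names are hypothetical. Note that the third locus is fixed in breed A but still counted, because it segregates in the HAP.

```python
def mean_expected_het(freqs):
    """Mean of 2pq over loci for a set of allele frequencies."""
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)


def observed_het(genotype):
    """Share of loci at which the individual is heterozygous (coded 1)."""
    return sum(1 for g in genotype if g == 1) / len(genotype)


# Hypothetical allele frequencies in two breeds at four loci.
p_breed_a = [0.9, 0.2, 1.0, 0.7]
p_breed_b = [0.3, 0.8, 0.4, 0.5]
# HAP frequencies approximated as the average across the sampled breeds.
p_hap = [(a + b) / 2 for a, b in zip(p_breed_a, p_breed_b)]

animal = [1, 0, 2, 2]  # one animal from breed A, genotypes coded 0/1/2

h_i = observed_het(animal)
h_s = mean_expected_het(p_breed_a)  # expectation using within-breed p
h_t = mean_expected_het(p_hap)      # expectation using HAP p

f_is = 1 - h_i / h_s  # local (current) inbreeding
f_it = 1 - h_i / h_t  # total inbreeding relative to the HAP
f_st = 1 - h_s / h_t  # drift between breed and HAP

print(f"Fis = {f_is:+.3f}, Fit = {f_it:+.3f}, Fst = {f_st:+.3f}")
```

With these numbers the same animal looks slightly outbred relative to its own breed (negative Fis) but clearly inbred relative to the HAP (positive Fit), and the three values satisfy (1 - Fit) = (1 - Fis)(1 - Fst) by construction.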
Bill

From gianolaansci.wisc.edu Fri May 31 08:56:13 2013
From: Daniel Gianola
Subject: Re: Inbreeding coefficients
Date: Fri, 31 May 2013 08:56:13 -0500

Points well taken, as I was assuming that this was based on pedigrees. We could certainly use negatively inbred individuals in some animal populations, e.g. dogs.

It would be useful to revisit Cockerham (1969, 1973), where he revisits Wright's indexes in terms of variance components, and these cannot be negative except when silly unbiased estimators are used.

Please take my comments in the light of my ignorance about PLINK.

Bill Muir's remarks are very useful as well.

Regards,
Dan

From ytutsunomiyagmail.com Fri May 31 08:59:13 2013
From: Yuri Tani Utsunomiya
Subject: Re: Inbreeding coefficients
Date: Fri, 31 May 2013 08:59:13 -0500

Thanks Bill and Andres for putting it in clearer words.

I concur with Daniel: Weir and Cockerham's [1] contribution in redefining F-statistics in a variance-components framework opened a window to a new series of estimators of genetic diversity and differentiation in geographically structured populations. I recommend reading [2] for a nice review of the subject. Although [3] is a review focusing on FST, I find it very pleasant reading for anybody who wants to study F-statistics.

Now, if your focus, Erik, is on individual autozygosity and potentially inbreeding depression, rather than population diversity, you may want to go beyond F-statistics and do ROH analysis.
Best,

Yuri

[1] http://www.jstor.org/stable/2408641
[2] http://www.annualreviews.org/...annurev.genet.36.050802.093940
[3] http://www.nature.com/...journal/v10/n9/pdf/nrg2611.pdf

On Fri, May 31, 2013 at 9:39 AM, Daniel Gianola wrote:
> Points well taken, as I was assuming that this was based on pedigrees. We
> could certainly use negatively inbred individuals in some animal
> populations, e.g., dogs.
>
> It would be useful to revisit Cockerham (1969, 1973), where he revisits
> Wright's indexes in terms of variance components, and these cannot be
> negative except when silly unbiased estimators are used.
>
> Please take my comments in the light of my ignorance about PLINK.
>
> Bill Muir's remarks are very useful as well.
>
> Regards,
>
> Dan

From ydaumn.edu Fri May 31 09:03:38 2013
From: Yang Da
Subject: Re: Inbreeding coefficents
Postmaster: submission approved
To: Multiple Recipients of
Date: Fri, 31 May 2013 09:03:38 -0500

I think the main point is the distinction between the pedigree inbreeding coefficient, which is a function of IBD probability and is non-negative, and the genomic inbreeding coefficient, which can be negative for the main reasons given in other posts.

Yang Da, Ph.D.
Department of Animal Science
University of Minnesota

On Fri, May 31, 2013 at 8:31 AM, Simianer, Henner wrote:
> Hi Andres,
>
> This is not the full story.
>
> The inbreeding coefficient F is also defined as the probability that the
> two homologous alleles at a random locus in one individual are identical
> by descent (Malecot) and, being a probability, is bounded between 0 and 1,
> regardless of your assumptions on the base population. Thus, as often in
> quantitative genetics, the same thing has different definitions with
> different implications. Obviously estimates of F can be outside the
> interval (0,1) depending on the method you use.
>
> Best wishes
>
> Henner
>
> _____________________________________
> Dr. Henner Simianer
> Professor of Animal Breeding and Genetics
> Department of Animal Sciences
> Georg-August-University Goettingen
> Albrecht-Thaer-Weg 3, 37075 Goettingen
> Tel.: +49-551-395604, Fax: +49-551-395587
> Email: hsimiangwdg.de
> http://www.uni-goettingen.de/tierzucht
>
> -----Original Message-----
> From: Andres Legarra [mailto:Andres.Legarratoulouse.inra.fr]
> Sent: Friday, 31 May 2013 14:50
> To: Multiple Recipients of
> Subject: Re: Inbreeding coefficents
>
> Hi,
>
> inbreeding is the excess of homozygotes relative to Hardy-Weinberg
> equilibrium (Falconer). Or, it is the correlation between uniting gametes
> (Wright). According to these definitions, it is NOT a probability and it
> can therefore be negative.
>
> However, if you use pedigree to estimate inbreeding, you are forced to
> assume that all founder alleles are different, and as a byproduct of this
> assumption, inbreeding is positive and is also a probability of identity
> by descent.
>
> When constructing genomic relationship matrices (VanRaden, 2008; Yang et
> al., 2010; etc.) it is common to find negative values of inbreeding and
> also of relationships. These have to be interpreted as covariances and not
> as probabilities. Setting them to 0 creates havoc: you mess up the linear
> model and bias your results.
>
> Andres

From ytutsunomiyagmail.com Fri May 31 09:50:29 2013
From: Yuri Tani Utsunomiya
Subject: Re: Inbreeding coefficents
Postmaster: submission approved
To: Multiple Recipients of
Date: Fri, 31 May 2013 09:50:29 -0500

Just to foster the discussion with some extra useful info, I believe PLINK calculates F as follows [note that these are LaTeX equations - to see how they look, you can copy and paste them here: http://www.codecogs.com/latex/eqneditor.php]:

F_{i} = \frac{O_{i} - E_{i}}{L_{i} - E_{i}}

where O_{i} is the observed homozygosity (the number of homozygous genotypes), L_{i} is the number of SNPs measured in individual i, and

E_{i} = \sum\limits_{j=1}^{L_{i}} \left(1 - 2p_{j}(1-p_{j})\frac{n_{j}}{n_{j}-1}\right)

where n_{j} and p_{j} are the number of measured genotypes and the reference allele frequency at locus j, respectively.

I may be wrong, but PLINK implements F as a descriptive statistic that can be used for three main purposes:

1) to identify contaminated samples;
2) to identify excess X chromosome heterozygosity in samples declared to be male;
3) as the Wright inbreeding coefficient.

Other usage must be carefully assessed and interpreted. As said before in the discussion, 'inbreeding coefficients' have a handful of different definitions and contexts, but this one in particular is a variant of the FIS (IBS-based) measure defined by Wright in the analysis of structured populations.

Yuri

On Fri, May 31, 2013 at 11:07 AM, Baumung, Roswitha (AGAG) wrote:
> Dear colleagues,
>
> You might find the following publication interesting: Inbreeding: one
> word, several meanings, much confusion. Templeton AR, Read B. Source:
> Department of Biology, Washington University, St. Louis, MO 63130.
>
> Abstract
>
> Because conservation biologists must frequently deal with small
> populations, inbreeding (a frequent consequence of small population size)
> has played a central role in many genetic management programs. However,
> the word "inbreeding" has several, often contradictory meanings, and a
> failure to distinguish among these meanings has caused much
> misunderstanding on the role of inbreeding in genetic management. Three
> different biological meanings of inbreeding are discussed in this paper:
> (1) inbreeding as a measure of shared ancestry in the paternal and
> maternal lineages of an individual; (2) inbreeding as a measure of genetic
> drift in a finite population, and (3) inbreeding as a measure of system of
> mating in a reproducing population. The distinction and use of these
> different measures of inbreeding are discussed and illustrated with a
> worked example, the North American captive population of Speke's gazelle
> (Gazella spekei). It is shown that these different meanings of the word
> inbreeding must be kept separated, otherwise erroneous management
> recommendations and evaluations can occur. On the positive side, the
> different measures of inbreeding when used jointly can be a powerful
> management tool precisely because they measure different biological
> phenomena.
>
> Kind regards, Roswitha
>
> Roswitha Baumung
> Animal Production Officer
> Animal Genetic Resources Branch
> Animal Production and Health Division
> FAO - Food and Agriculture Organization of the United Nations
> Viale delle Terme di Caracalla
> 00153 Rome Italy
> Tel. +39 06 57052158

From erikscraggsgmail.com Fri May 31 10:04:48 2013
From: Erik Scraggs
Subject: Re: Inbreeding coefficents
Postmaster: submission approved
To: Multiple Recipients of
Date: Fri, 31 May 2013 10:04:48 -0500

Dear all,

I greatly appreciate your help. Thank you for taking the time to provide me with such detailed explanations. It is great to have the help of the community.

Many thanks

Erik

On Fri, May 31, 2013 at 6:34 AM, Taylor, Jerry F. (Animal Science) <taylorjerrmissouri.edu> wrote:
> Just a couple of other comments to add to the mix:
>
> 1. No matter how it is estimated/calculated/interpreted, there is
> generally an assumption of random mating and selective neutrality (an
> absence of direct selection on genotype) associated with the locus/loci,
> and these assumptions are generally violated in most populations due to
> drift and artificial selection. Thus, it is quite possible that you will
> observe a lower level of homozygosity within individuals than would be
> expected under these assumptions.
>
> 2. I am not sure how PLINK calculates the genomic relationship matrix, but
> if you have a read through PVR's great paper "Efficient methods to compute
> genomic predictions." J Dairy Sci. 2008 91(11):4414-23 you will see that
> the allele frequencies (AF) that are used to compute the GRM are for the
> base generation.
> Most programs that compute GRMs from genotype data simply compute AF at
> the locus using all animals and use this to construct the GRM, and this is
> fine if the population is not subject to admixture, selection or drift.
> So:
>    a) If your animals are crossbreds - you have a problem
>    b) If your animals are stratified in time - you may have a problem
>
> I have found that the F coefficients for a population of 3570 registered
> Angus animals are quite sensitive to the AF estimates. If you estimate AF
> using all animals, these differ from AF estimates obtained using the
> oldest 10% of animals, and the effect on estimates of F is quite
> considerable.
>
> Jared Decker describes the very strong selection occurring genome-wide in
> these animals in his paper "A novel analytical method, Birth Date
> Selection Mapping, detects response of the Angus (Bos taurus) genome to
> selection on complex traits" BMC Genomics. 2012 13:606. He also examines
> the relationship between genomic F and pedigree F in this paper.
>
> Jerry

From Roger.VallejoARS.USDA.GOV Fri May 31 10:22:49 2013
From: "Vallejo, Roger"
Subject: RE: Inbreeding coefficents
Postmaster: submission approved
To: Multiple Recipients of
Date: Fri, 31 May 2013 10:22:49 -0500

I think the basic questions are not being answered. Are Wright's F-statistics correlations or probabilities? Once that is settled, you can decide how to treat these F-statistics.

Let me add this. The F-statistic model is a hierarchical model with genes stratified at three levels: individuals (I), within subdivisions (S) and within the total population (T). It has three main parameters: FIT is the correlation of uniting gametes relative to those of the total population; FIS is the average over all subdivisions of the correlation of uniting gametes relative to the gametes of the subdivision; and FST is the correlation of random gametes within subdivisions relative to the total population. The three F-statistics are interrelated as (1 - FIT) = (1 - FST) (1 - FIS). A variety of derivations of this basic relationship are available (Wright 1951, 1965; Cockerham 1969).

It is clear from Wright's formulation of the F-statistic model that the parameters FIS and FIT are free to take either positive or negative values depending on whether there is a deficit or excess of heterozygotes; it is also clear from Wright's work that the parameter FST is necessarily positive (JC Long, Genetics 1986).

I hope this helps some on this very interesting issue.

Roger

Roger L. Vallejo, Ph.D.
U.S. Department of Agriculture, ARS, NCCCWA
Voice: (304) 724-8340 Ext. 2141
Email: roger.vallejoars.usda.gov
http://www.ars.usda.gov/...ople/people.htm?personid=37662

From gianolaansci.wisc.edu Fri May 31 10:50:17 2013
From: Daniel Gianola
Subject: RE: Inbreeding coefficents
Postmaster: submission approved
To: Multiple Recipients of
Date: Fri, 31 May 2013 10:50:17 -0500

Roger: Well put.

Daniel
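
As a rough check on the formulas discussed in this thread, the sketch below recomputes F for the eight animals in the original question, once as (O - E)/(N - E) and once as FIS = 1 - (HI/HS), taking HI = (N - O)/N and HS = (N - E)/N. The names `HetRow`, `f_from_counts` and `f_from_heterozygosity` are made up for this illustration and are not part of PLINK. The E(HOM) values are used as printed, i.e. rounded to three significant figures, so the recomputed F should only be expected to match the reported F column approximately.

```python
from dataclasses import dataclass


@dataclass
class HetRow:
    """One row of the table posted in the original question (hypothetical container)."""
    iid: str
    o_hom: float       # O(HOM): observed number of homozygous genotypes
    e_hom: float       # E(HOM): expected number, as printed (rounded)
    n_nm: int          # N(NM): number of non-missing genotypes
    f_reported: float  # F column as reported


def f_from_counts(o_hom: float, e_hom: float, n_nm: float) -> float:
    """F = (O - E) / (N - E), the form quoted in the thread."""
    return (o_hom - e_hom) / (n_nm - e_hom)


def f_from_heterozygosity(o_hom: float, e_hom: float, n_nm: float) -> float:
    """FIS = 1 - HI/HS, with HI and HS expressed as heterozygous proportions."""
    h_i = (n_nm - o_hom) / n_nm  # individual's observed heterozygosity
    h_s = (n_nm - e_hom) / n_nm  # heterozygosity expected from allele frequencies
    return 1.0 - h_i / h_s


ROWS = [
    HetRow("FB686", 28313, 2.58e4, 37177, 0.2243),
    HetRow("FB1615", 25773, 2.60e4, 37510, -0.01666),
    HetRow("FB2101", 27566, 2.58e4, 37287, 0.1517),
    HetRow("FB2126", 23992, 2.60e4, 37494, -0.1699),
    HetRow("FB2127", 24348, 2.57e4, 37125, -0.1179),
    HetRow("FB2422", 24469, 2.60e4, 37497, -0.1287),
    HetRow("FB2501", 27209, 2.58e4, 37317, 0.1194),
    HetRow("FB2892", 24327, 2.60e4, 37504, -0.1414),
]

if __name__ == "__main__":
    for row in ROWS:
        f_counts = f_from_counts(row.o_hom, row.e_hom, row.n_nm)
        f_het = f_from_heterozygosity(row.o_hom, row.e_hom, row.n_nm)
        print(f"{row.iid}: reported={row.f_reported:+.4f} "
              f"counts={f_counts:+.4f} heterozygosity={f_het:+.4f}")
```

The two forms are algebraically identical, and the negative values arise simply because O(HOM) is smaller than E(HOM) for those animals, which is the excess heterozygosity described in the replies. Nothing in the arithmetic bounds the estimate at zero.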

 © 2003-2021 Creative Commons licenses by NAGRP - Bioinformatics Coordination Program. Contact: NAGRP Bioinformatics Team Helpdesk