AnGenMap

Sample Discussion

Subjects:
- Inbreeding coefficients

From erikscraggsgmail.com  Thu May 30 18:46:07 2013
From: Erik Scraggs <erikscraggsgmail.com>
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Subject: Inbreeding coefficients
Date: Thu, 30 May 2013 18:46:07 -0500

Dear all,

I would appreciate it if somebody in the community could provide me
with some assistance. I am currently estimating inbreeding coefficients
in a population of cattle. To do this I have been using the inbreeding
coefficient option in PLINK (--het). Given a large number of SNPs in a
homogeneous sample, this option calculates inbreeding coefficients from
the observed versus expected number of homozygous genotypes.

I have run the program and posted a snapshot of my results below. This is
where I would appreciate some clarity. Is it correct to assume that a
negative value in the F column indicates that there is no inbreeding,
and that it can therefore be set to 0?


FID  IID     O(HOM)  E(HOM)    N(NM)   F
0    FB686   28313   2.58E+04  37177   0.2243
0    FB1615  25773   2.60E+04  37510  -0.01666
0    FB2101  27566   2.58E+04  37287   0.1517
0    FB2126  23992   2.60E+04  37494  -0.1699
0    FB2127  24348   2.57E+04  37125  -0.1179
0    FB2422  24469   2.60E+04  37497  -0.1287
0    FB2501  27209   2.58E+04  37317   0.1194
0    FB2892  24327   2.60E+04  37504  -0.1414
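
As a rough check on where the sign of F comes from, the sketch below recomputes F from the other columns, assuming PLINK's documented relation F = (O(HOM) - E(HOM)) / (N(NM) - E(HOM)). The function name het_f is made up for this illustration, and because E(HOM) is printed to only three significant figures the recomputed values agree with the reported ones only approximately.

```python
# Rows copied from the --het output above: (IID, O(HOM), E(HOM), N(NM), reported F).
ROWS = [
    ("FB686", 28313, 2.58e4, 37177, 0.2243),
    ("FB1615", 25773, 2.60e4, 37510, -0.01666),
    ("FB2101", 27566, 2.58e4, 37287, 0.1517),
    ("FB2126", 23992, 2.60e4, 37494, -0.1699),
    ("FB2127", 24348, 2.57e4, 37125, -0.1179),
    ("FB2422", 24469, 2.60e4, 37497, -0.1287),
    ("FB2501", 27209, 2.58e4, 37317, 0.1194),
    ("FB2892", 24327, 2.60e4, 37504, -0.1414),
]


def het_f(o_hom, e_hom, n_nm):
    """Method-of-moments F: excess homozygosity relative to expectation."""
    return (o_hom - e_hom) / (n_nm - e_hom)


for iid, o_hom, e_hom, n_nm, f_reported in ROWS:
    f_calc = het_f(o_hom, e_hom, n_nm)
    print(f"{iid:8s} recomputed F = {f_calc:+.4f}   reported F = {f_reported:+.4f}")
```

F is negative exactly when an animal has fewer homozygous genotypes than expected, so nothing in the estimator itself keeps it at or above zero.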

--
Erik Scraggs, PhD
Department of Animal Sciences
Washington State University
Pullman, WA, 99164-4236, USA
Tel: 509-288-2291

From gianolaansci.wisc.edu  Thu May 30 20:06:15 2013
From: Daniel Gianola <gianolaansci.wisc.edu>
Subject: Re: Inbreeding coefficients
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Date: Thu, 30 May 2013 20:06:15 -0500

Since inbreeding coefficients, being probabilities, cannot be negative, this 
indicates that PLINK (I have no idea what it does) does not use a good 
estimation procedure. With a good procedure, estimates must fall inside the 
permissible parameter space.

Regards,

Daniel




From steibeljmsu.edu  Thu May 30 21:39:42 2013
From: MSU_JPS <steibeljmsu.edu>
Subject: Re: Inbreeding coefficients
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Date: Thu, 30 May 2013 21:39:42 -0500

Hi Erik 

In the plink documentation there are two notes that may be worth considering. 
I copy them below.  But what Daniel said is more important: understand 
estimates and their properties before using them and be wary of results that 
are plain wrong.

Note: With whole-genome data, it is probably best to apply this analysis to 
a subset of SNPs pruned to be in approximate linkage equilibrium, say on 
the order of 50,000 autosomal SNPs. Use the --indep-pairwise and --indep 
commands to achieve this.

Note: The estimate of F can sometimes be negative. Often this will just 
reflect random sampling error, but a result that is strongly negative 
(i.e. an individual has fewer homozygotes than one would expect by chance 
at the genome-wide level) can reflect other factors, e.g. sample 
contamination events perhaps.
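
The pruning step in the first note could be scripted roughly as below. This is only a sketch that builds the two PLINK argument lists without running them. The file prefixes ("cattle", "cattle_pruned") are hypothetical, and the window, step and r^2 values (50, 5, 0.2) are illustrative assumptions, not recommendations from the PLINK documentation or from anyone in this thread.

```python
def plink_het_commands(bfile, out_prefix, window=50, step=5, r2=0.2):
    """Return two PLINK argument lists: LD pruning, then --het on the pruned SNPs."""
    prune = [
        "plink", "--cow", "--bfile", bfile,
        "--indep-pairwise", str(window), str(step), str(r2),
        "--out", out_prefix,
    ]
    het = [
        "plink", "--cow", "--bfile", bfile,
        # --indep-pairwise writes the retained SNP ids to <out>.prune.in
        "--extract", out_prefix + ".prune.in",
        "--het", "--out", out_prefix + "_het",
    ]
    return prune, het


# Print the commands; pass each list to subprocess.run() to execute them.
for cmd in plink_het_commands("cattle", "cattle_pruned"):
    print(" ".join(cmd))
```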

Sincerely,

Juan P. Steibel



From ytutsunomiyagmail.com  Thu May 30 21:40:42 2013
From: Yuri Tani Utsunomiya <ytutsunomiyagmail.com>
Subject: Re: Inbreeding coefficients
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Date: Thu, 30 May 2013 21:40:42 -0500

Dear Erik,

The inbreeding coefficient calculated by PLINK is equivalent to FIS in
Wright's F-statistics [1].

In a structured population, the fixation index (represented by F = the
degree of reduction in heterozygosity relative to Hardy-Weinberg
expectation) can be partitioned into three levels: FIT - individual (I)
relative to the total population (T); FIS - individual (I) relative to the
subpopulation (S); and FST - subpopulation (S) relative to the total (T).
Thus, Wright's FIS is often referred to as the inbreeding coefficient, and a
simplistic definition is FIS = 1 - (HI/HS), where HI represents the
individual's heterozygosity, and HS the subpopulation's (or breed's)
heterozygosity.

Looking at the definition proposed by Wright (1950) [1], F is better
interpreted as a correlation measure between alleles in different
'partitions' of a structured population, rather than a probability. This
means that it can assume negative values. If HI = HS, then FIS = 0, and
the individual has exactly the heterozygosity level expected for the
subpopulation. If HI < HS, then FIS > 0, and the individual is less
heterozygous than expected given the subpopulation's heterozygosity. The
closer FIS gets to 1, the more inbred the individual is assumed to be. On
the other hand, if HI > HS, then FIS < 0, so the individual is more
heterozygous than expected given the subpopulation. Hence, negative values
denote outbred individuals.
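
To make the three cases concrete, here is a minimal sketch of FIS = 1 - HI/HS. It also shows that, when HI and HS are taken as observed and expected heterozygosity over the same SNPs, the ratio form is algebraically the same as (O(HOM) - E(HOM)) / (N(NM) - E(HOM)). The function names are invented for this example, and the counts are those of animal FB686 from the original post.

```python
def fis(h_ind, h_sub):
    """Wright's FIS written as 1 - HI/HS."""
    return 1.0 - h_ind / h_sub


def fis_from_counts(o_hom, e_hom, n_nm):
    """Same quantity, starting from homozygote counts over n_nm genotyped SNPs."""
    h_ind = (n_nm - o_hom) / n_nm   # observed heterozygosity of the individual
    h_sub = (n_nm - e_hom) / n_nm   # heterozygosity expected in the subpopulation
    return fis(h_ind, h_sub)


print(fis(0.30, 0.30))   # HI = HS
print(fis(0.20, 0.40))   # HI < HS, positive FIS
print(fis(0.40, 0.30))   # HI > HS, negative FIS
print(fis_from_counts(28313, 2.58e4, 37177))
```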

I did not understand why you want to set the negative values to zero, as
FIS is not a probability. In fact, you can test departure of individual
heterozygosity from the expectation for the subpopulation by performing
tests for goodness of fit if you want p-values...
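
One possible form of such a test is a one-degree-of-freedom chi-square on homozygous versus heterozygous counts, sketched below. It treats SNPs as independent, which linkage disequilibrium violates, so the p-values will generally be too small. Take it as an illustration of the idea rather than a recommended procedure. The function name is an assumption of this example.

```python
import math


def het_goodness_of_fit(o_hom, e_hom, n_nm):
    """Chi-square (1 df) comparing observed and expected hom/het counts."""
    o_het = n_nm - o_hom
    e_het = n_nm - e_hom
    chi2 = (o_hom - e_hom) ** 2 / e_hom + (o_het - e_het) ** 2 / e_het
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Animal FB1615 from the original post (F = -0.01666).
chi2, p_value = het_goodness_of_fit(25773, 2.60e4, 37510)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```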

One observation worth noting: if the value is strongly negative, you
may want to double-check the outbred sample for possible contamination
(accidental mixing of two DNA sources, causing high sample
heterozygosity). Although FIS has been widely used in genetic diversity
studies with microsatellites to quantify inbreeding and diversity loss, you
may also want to look at estimating inbreeding from runs of
homozygosity (ROH). While FIS relies largely on identity by state,
empirical data suggest that ROH better capture identity by
descent, and ROH have been proposed as a suitable way to estimate
autozygosity. Some people would say that they should replace pedigree
estimates. PLINK also implements an ROH detection algorithm [2].
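
The idea behind an ROH-based coefficient can be sketched as follows: find stretches of consecutive homozygous SNPs and divide their total length by the genome length covered. This is a deliberately simplified toy, not PLINK's algorithm, which as far as I know scans with a sliding window and tolerates some heterozygous or missing calls. The function names, the min_snps threshold and the data are all assumptions made for illustration.

```python
def find_roh(genotypes, positions, min_snps=3):
    """Return (start_bp, end_bp) for each run of >= min_snps homozygous SNPs.

    genotypes: allele counts per SNP (0 or 2 = homozygous, 1 = heterozygous).
    positions: base-pair positions in the same order, on one chromosome.
    """
    runs, start = [], None
    for i, g in enumerate(genotypes):
        if g != 1:
            if start is None:
                start = i                      # a new homozygous run begins
        else:
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1]))
            start = None                       # a heterozygote ends the run
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1]))
    return runs


def f_roh(runs, genome_length_bp):
    """Fraction of the genome covered by runs of homozygosity."""
    return sum(end - start for start, end in runs) / genome_length_bp


genotypes = [0, 2, 2, 0, 1, 2, 1, 0, 0, 0, 2]
positions = [100 * (i + 1) for i in range(len(genotypes))]
runs = find_roh(genotypes, positions)
print(runs, f_roh(runs, 2000))
```

Unlike the heterozygosity-based F, this quantity is a proportion and cannot be negative.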

For those who are not familiar with PLINK [3], I suggest checking it out. It
is elegantly written in C/C++, and is a pioneering software package for the
analysis of SNP data. It remains one of the most complete toolsets available
out there.

Yours sincerely,

Yuri


[1] http://onlinelibrary.wiley.com/...j.1469-1809.1949.tb02451.x/pdf
[2] http://pngu.mgh.harvard.edu/...urcell/plink/ibdibs.shtml#homo
[3] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1950838/



-- 
Yuri T. Utsunomiya
MSc student at São Paulo State University (UNESP - Brazil)
Laboratory of Animal Biochemistry and Molecular Biology - Araçatuba/SP
Mobile: +551881170036
Skype me: yuri.tani

"So I do dearly hope that the Genome Project does not give rise to some
naive biological determinism that says we are nothing more than the sum of
our genes. Geneticists don't believe that. Geneticists believe genes are an
important part of the story. By understanding that part of the story, we're
in a so-much better position to try to understand the rest of the story."
- Prof. Eric Lander

From bmuirpurdue.edu  Thu May 30 21:44:57 2013
From: "Muir, William M." <bmuirpurdue.edu>
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Subject: FW: Inbreeding coefficients
Date: Thu, 30 May 2013 21:44:57 -0500

Hi Erik,

A negative inbreeding coefficient means that there is an excess of 
heterozygosity compared to expectation.  I am not familiar with the output of 
PLINK, but if the excess occurs at most loci, it is a signature of demography, 
indicating that you are most likely looking at a recent outcrossing event. 

However, to determine inbreeding correctly from genotype data, the 
proportion of loci NOT segregating is also important.  Inbreeding drives loci 
to homozygosity, so if you consider only those loci that are still segregating, 
you will observe an excess of heterozygosity compared to expectation, as you did.

The question then becomes how to determine which non-informative loci to include 
in the calculation.  In order to do this several breeds would have to be 
genotyped to determine the hypothetical ancestral population (HAP) allele 
frequencies.  From the theory of drift, inbreeding does not change allele 
frequency across populations, but does within sub-populations, i.e. the rate of 
fixation and loss is directly proportional to the initial allele frequency in 
the HAP.  Thus if a random set of subpopulations were sampled, the average 
allele frequency at that locus across those subpopulations, including those 
fixed and lost within some subpopulations, will estimate the HAP allele 
frequency (p) at that locus.  From this allele frequency, the total expected 
heterozygosity (Ht) is determined as 2pq, and summed across loci.

Next, to determine individual inbreeding, using those same loci in your 
population, including the ones fixed, determine Hi, heterozygosity of the 
individual, as the sum of the segregating loci over the total number of loci 
originally segregating in the HAP.

The ratio Hi/Ht = Hit is the amount of heterozygosity in the individual 
relative to the HAP.  The total inbreeding coefficient of this individual is 
then Fx = 1 - Hit.  Wright called this Fit, i.e. inbreeding of the individual 
relative to the total.  If the expected heterozygosity is also computed for each 
subpopulation from the subpopulation allele frequencies, this heterozygosity 
is Hs, and Fst = 1 - Hs/Ht is the amount of drift that has occurred between 
subpopulations. 
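
A small numerical sketch of this partition, under made-up data: four loci, hypothetical HAP and breed allele frequencies (one locus fixed in the breed), and one animal. The helper names and all numbers are assumptions chosen only to show how Hi, Hs and Ht combine, and that (1 - Fit) = (1 - Fis)(1 - Fst).

```python
def expected_het(freqs):
    """Mean Hardy-Weinberg heterozygosity 2pq across loci."""
    return sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)


def individual_het(genotypes):
    """Fraction of loci at which the animal is heterozygous (allele counts 0/1/2)."""
    return sum(1 for g in genotypes if g == 1) / len(genotypes)


hap_freqs = [0.5, 0.5, 0.4, 0.6]     # hypothetical ancestral (HAP) frequencies
breed_freqs = [0.5, 1.0, 0.2, 0.8]   # second locus has gone to fixation in the breed
animal = [1, 2, 0, 2]                # fixed locus is kept in the calculation

h_t = expected_het(hap_freqs)
h_s = expected_het(breed_freqs)
h_i = individual_het(animal)

f_it = 1.0 - h_i / h_t   # individual relative to total (HAP)
f_st = 1.0 - h_s / h_t   # breed relative to total: drift since subdivision
f_is = 1.0 - h_i / h_s   # individual relative to its own breed

print(f"Fit = {f_it:.4f}, Fst = {f_st:.4f}, Fis = {f_is:.4f}")
```

In this toy case Fis is much smaller than Fit, because using within-breed frequencies hides the heterozygosity already lost through breed formation.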

However, to complicate things even further, a SNP chip has ascertainment bias, 
meaning that only SNPs that were informative in certain breeds were put on the 
chip.  This results in the same problem and has to be corrected for; Andy Clark 
has a paper on how to do this.  Sequencing data are much better for determining 
inbreeding coefficients because they do not have ascertainment bias.

I am sure this is more than you wanted, but the bottom line is that it is 
difficult to get a true estimate of inbreeding without knowledge of the entire 
drift process and correct sampling of the genetic material.

A quick reference would be Hartl and Clark's book on population genetics, 
which also references the original works of Wright, Hill, and Weir (who also 
has a program to do this from genomic data; his book's title is Genetic Data 
Analysis, and the program is at 
http://www.eeb.uconn.edu/people/plewis/software.php).  I also have a 
publication on the topic in chickens using a SNP chip, which I can share with 
you if you are interested.


Best Regards, Bill

-----------------------------
William Muir, Ph.D.
Professor Genetics
Department of Animal Sciences
Purdue University and
Department of Medicine 
Indiana University
Room G406 Lilly Hall
915 West State Street
W. Lafayette, IN 47907
765-494-8032
https://ag.purdue.edu/...?strAlias=bmuir&intDirDeptID=8
http://medicine.iupui.edu/iarc/



From Andres.Legarratoulouse.inra.fr  Fri May 31 07:49:42 2013
From: Andres Legarra <Andres.Legarratoulouse.inra.fr>
Subject: Re: Inbreeding coefficients
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Date: Fri, 31 May 2013 07:49:42 -0500

Hi,

Inbreeding is an excess of homozygotes relative to Hardy-Weinberg 
equilibrium (Falconer). Alternatively, it is the correlation between uniting 
gametes (Wright). Under these definitions it is NOT a probability, and it 
can therefore be negative.

However, if you use pedigree to estimate inbreeding, you are forced to 
assume that all founder alleles are different, and as a byproduct of 
this assumption, inbreeding is positive and is also a probability of 
identity by descent.

When constructing genomic relationship matrices (VanRaden, 2008; Yang et 
al., 2010; etc.) it is common to find negative values of inbreeding and 
also of relationships. These have to be interpreted as covariances, 
not as probabilities. Setting them to 0 creates havoc: you mess up the 
linear model and bias your results.
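
A toy genomic relationship matrix shows how such negative values arise. The sketch assumes the commonly quoted form of VanRaden's first method, G = ZZ' / (2 * sum p(1-p)) with Z the 0/1/2 genotypes centred on 2p, and takes genomic inbreeding as the diagonal minus 1. The genotypes are invented, and the allele frequencies are estimated from the sample itself, which is a simplification.

```python
def vanraden_grm(genotypes, freqs):
    """G = ZZ' / (2 * sum p(1-p)); genotypes are allele counts coded 0/1/2."""
    z = [[g - 2.0 * p for g, p in zip(row, freqs)] for row in genotypes]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(zi, zj)) / scale for zj in z] for zi in z]


genotypes = [
    [1, 1, 1, 1],   # heterozygous at every locus
    [0, 2, 0, 2],   # homozygous at every locus
    [2, 0, 2, 0],   # homozygous, opposite alleles to the animal above
    [1, 1, 1, 1],
]
# Allele frequencies taken from the sample (all 0.5 here).
freqs = [sum(col) / (2.0 * len(genotypes)) for col in zip(*genotypes)]

grm = vanraden_grm(genotypes, freqs)
for i, row in enumerate(grm):
    print(f"animal {i}: genomic F = {row[i] - 1.0:+.2f}")
print("relationship between animals 1 and 2:", grm[1][2])
```

Truncating the negative entries at 0 would change the matrix, so it would no longer be the covariance structure that the linear model assumes.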

Andres




-- 
Andres Legarra
+33 561285182
INRA, UR 631 SAGA, 24 Chemin de Borde Rouge - Auzeville
CS 52627
31326 Castanet Tolosan, France
http://genoweb.toulouse.inra.fr/~alegarra
From hsimiangwdg.de  Fri May 31 08:31:11 2013
From: "Simianer, Henner" <hsimiangwdg.de>
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Subject: AW: Inbreeding coefficients
Date: Fri, 31 May 2013 08:31:11 -0500

Hi Andres,

This is not the full story.

The inbreeding coefficient F is also defined as the probability that the two 
homologous alleles at a random locus in one individual are identical by 
descent (Malecot) and, being a probability, it is bounded between 0 and 1, 
regardless of your assumptions about the base population. Thus, as often 
happens in quantitative genetics, the same thing has different definitions 
with different implications. Obviously, estimates of F can fall outside the 
interval [0, 1], depending on the method you use. 

Best wishes

Henner



_____________________________________
Dr. Henner Simianer
Professor of Animal Breeding and Genetics
Department of Animal Sciences
Georg-August-University Goettingen
Albrecht-Thaer-Weg 3, 37075 Goettingen
Tel.: +49-551-395604, Fax: +49-551-395587
Email: hsimiangwdg.de
http://www.uni-goettingen.de/tierzucht


From taylorjerrmissouri.edu  Fri May 31 08:34:10 2013
From: "Taylor, Jerry F. (Animal Science)" <taylorjerrmissouri.edu>
Subject: RE: Inbreeding coefficients
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Date: Fri, 31 May 2013 08:34:10 -0500

Just a couple of other comments to add to the mix:

1. No matter how it is estimated/calculated/interpreted there is generally 
an assumption of random mating and selective neutrality (an absence of direct 
selection on genotype) associated with the locus/loci and these assumptions 
are generally violated in most populations due to drift and artificial 
selection. Thus, it is quite possible that you will observe a lower level of 
homozygosity within individuals than would be expected under these 
assumptions.

2. I am not sure how PLINK calculates the genomic relationship matrix, but 
if you have a read through PVR's great paper "Efficient methods to compute 
genomic predictions." J Dairy Sci. 2008 91(11):4414-23 you will see that the 
allele frequencies (AF) that are used to compute the GRM are for the base 
generation. Most programs that compute GRMs from genotype data simply compute 
AF at the locus using all animals and use this to construct the GRM and this 
is fine if the population is not subject to admixture, selection or drift. 
So:
	a) If your animals are crossbreds - you have a problem
	b) If your animals are stratified in time - you may have a problem

I have found that the F coefficients for a population of 3570 registered 
Angus animals are quite sensitive to the AF estimates. AF estimated from 
all animals differ from AF estimated from the oldest 10% of animals, and 
the effect on the estimates of F is quite considerable. 
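
The sensitivity is easy to reproduce on invented numbers. The sketch below computes F = (O - E) / (N - E) for one animal with expected homozygosity taken from two different sets of allele frequencies. Both frequency sets and the genotypes are hypothetical, and the simple E(HOM) = sum of (1 - 2pq) used here may differ in detail from what PLINK computes (for example in small-sample corrections).

```python
def f_from_freqs(genotypes, freqs):
    """F = (O - E) / (N - E), with E(HOM) = sum over loci of 1 - 2pq."""
    n = len(genotypes)
    o_hom = sum(1 for g in genotypes if g != 1)
    e_hom = sum(1.0 - 2.0 * p * (1.0 - p) for p in freqs)
    return (o_hom - e_hom) / (n - e_hom)


animal = [0, 2, 1, 2, 0, 1, 2, 2, 1, 0]   # 10 loci, 3 heterozygous

# Hypothetical frequencies: the current population has drifted towards
# fixation, while the oldest animals still show intermediate frequencies.
af_all_animals = [0.2, 0.8, 0.3, 0.9, 0.1, 0.3, 0.8, 0.9, 0.3, 0.2]
af_oldest = [0.4, 0.6, 0.5, 0.7, 0.3, 0.5, 0.6, 0.7, 0.5, 0.4]

f_all = f_from_freqs(animal, af_all_animals)
f_old = f_from_freqs(animal, af_oldest)
print(f"F using all animals:    {f_all:.3f}")
print(f"F using oldest animals: {f_old:.3f}")
```

The same genotypes give a much larger F against the older frequencies, because more heterozygosity was expected in that reference population.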

Jared Decker describes the very strong selection occurring genome-wide in 
these animals in his paper "A novel analytical method, Birth Date Selection 
Mapping, detects response of the Angus (Bos taurus) genome to selection on 
complex traits." BMC Genomics. 2012 13:606. He also examines the relationship 
between genomic F and pedigree F in this paper.

Jerry


From bmuirpurdue.edu  Fri May 31 08:34:17 2013
From: "Muir, William M." <bmuirpurdue.edu>
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Subject: FW: Inbreeding coefficients
Date: Fri, 31 May 2013 08:34:17 -0500

Hi,

Andres is of course correct. If inbreeding coefficients are calculated with 
the genomic relationship matrix (GRM) approach and scaled correctly, the 
inbreeding detected is that which has occurred within that breed or 
sub-population.  Because inbreeding is cumulative, it can be broken down into 
that which occurred before breed formation and that which has occurred since.  
If one uses a single breed starting at some time after breed formation, the 
inbreeding detected is current inbreeding, not total inbreeding.  So the more 
important question is: what is the inbreeding coefficient being used for?  If 
it is for within-breed comparisons, i.e. genomic selection, then current 
inbreeding is appropriate.  If one wants to know how much diversity has been 
lost as a result of population subdivision (breed formation) as well as 
current inbreeding, then one essentially has to compute an across-breed GRM.

In the first definition given by Andres as deviation from HWE, the issue is 
what allele frequency to use as 'p' to calculate expected heterozygosity.  If 
one uses p estimated within a breed, the inbreeding detected is local or 
current inbreeding.  If p is estimated from the HAP, then the expected 
heterozygosity is that in the HAP and inbreeding detected is total.

Bill


From gianolaansci.wisc.edu  Fri May 31 08:56:13 2013
From: Daniel Gianola <gianolaansci.wisc.edu>
Subject: Re: Inbreeding coefficients
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Date: Fri, 31 May 2013 08:56:13 -0500

Points well taken, as I was assuming that this was based on pedigrees. 
We could certainly use negatively inbred individuals in some animal 
populations, e.g., dogs.

It would be useful to revisit Cockerham (1969, 1973), where he reformulates 
Wright's indexes in terms of variance components, and these cannot be 
negative except when silly unbiased estimators are used.

Please take my comments in the light of my ignorance about PLINK.

Bill Muir's remarks are very useful as well.

Regards,

Dan



-----Original message-----
.From: Yuri Tani Utsunomiya <ytutsunomiyagmail.com>
.To: Multiple Recipients of <AnGenMapanimalgenome.org>
.Sent: Fri, May 31, 2013 02:42:10 GMT+00:00
.Subject: Re: Inbreeding coefficents

Dear Erik,

The inbreeding coefficient calculated by PLINK is equivalent to FIS in
Wright's F-statistics [1].

In a structured population, the fixation index (represented by F = the
degree of reduction in heterozygosity relative to Hardy-Weinberg
expectation) can be partitioned into three levels: FIT - individual (I)
relative to the total population (T); FIS - individual (I) relative to the
subpopulation (S); and FST - subpopulation (S) relative to the total (T).
Thus, Wright's FIS is often referred as the inbreeding coefficient, and a
simplistic definition is FIS = 1 - (HI/HS), where HI represents the
individual's heterozygosity, and HS the subpopulation's (or breed)
heterozygosity.

Looking at the definition proposed by Wright (1950) [1], F is better
interpreted as a correlation measure between alleles in different
'partitions' of a structured population, rather than a probability. This
means that it does assume negative values. If HI = HS, then FIS = 0, and
the individual has the exactly expected heterozygosity level for the
subpopulation. If HI < HS, then FIS > 0, and the individual is less
heterozygous than expected given the subpopulation's heterozygosity. The
closer to 1 FIS gets, the more inbred the individual is assumed to be. On
the other hand, if HI > HS, then F < 0, so the individual is more
heterozygous then expected given the subpopulation. Hence, negative values
denote outbred individuals.

I did not understand why you want to set the negative values to zero, as
FIS is not a probability. In fact, you can test departure of individual
heterozygosity from the expectation for the subpopulation by performing
tests for goodness of fit if you want p-values...

One observation worth noting: if F is strongly negative, you may want to
double-check the outbred sample for possible contamination (accidental
mixing of two DNA sources, causing high sample heterozygosity). Although
FIS has been widely used in genetic diversity studies with microsatellites
to quantify inbreeding and diversity loss, you may want to look at
estimating inbreeding levels by means of runs of homozygosity (ROH). While
FIS relies largely on identity by state, empirical data suggest that ROH
better capture identity by descent, and ROH have been proposed as a suitable
way to estimate autozygosity; some people would say that they should replace
pedigree estimates. PLINK also has an implementation of the algorithm [2].
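
To make the ROH idea concrete, here is a deliberately naive sketch: it
scans one chromosome for stretches of consecutive homozygous SNPs and
reports the fraction of the covered length that falls in such stretches.
It is not PLINK's sliding-window procedure; the function names, the
thresholds and the toy data are all assumptions, and missing genotypes or
tolerated heterozygous calls are not handled.

```python
def find_roh(genotypes, positions, min_snps=3, min_length_bp=1):
    """Return (start_bp, end_bp) for each run of consecutive homozygous SNPs.

    genotypes: 0/1/2 copies of the reference allele (1 = heterozygous).
    positions: base-pair positions in ascending order, same length.
    """
    runs, start = [], None
    for idx, g in enumerate(genotypes):
        if g != 1:  # homozygous call extends (or opens) a run
            if start is None:
                start = idx
        elif start is not None:  # heterozygous call closes the current run
            runs.append((start, idx - 1))
            start = None
    if start is not None:
        runs.append((start, len(genotypes) - 1))
    return [
        (positions[a], positions[b])
        for a, b in runs
        if b - a + 1 >= min_snps and positions[b] - positions[a] >= min_length_bp
    ]


def f_roh(genotypes, positions, **kwargs):
    """Fraction of the covered span that lies inside the detected runs."""
    segments = find_roh(genotypes, positions, **kwargs)
    covered = positions[-1] - positions[0]
    return sum(end - start for start, end in segments) / covered


toy_genotypes = [0, 2, 2, 1, 0, 0, 0, 0, 1, 2]
toy_positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print(find_roh(toy_genotypes, toy_positions), f_roh(toy_genotypes, toy_positions))
```

Unlike F from --het, this quantity cannot go below zero, which is one
reason it is closer in spirit to a pedigree-style autozygosity measure.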

For those who are not familiar with PLINK [3], I suggest checking it out. It
is elegantly written in C/C++ and is a pioneering software package for the
analysis of SNP data. It remains one of the most complete toolsets
available.

Yours sincerely,

Yuri


[1] http://onlinelibrary.wiley.com/...j.1469-1809.1949.tb02451.x/pdf
[2] http://pngu.mgh.harvard.edu/...urcell/plink/ibdibs.shtml#homo
[3] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1950838/

--
*Yuri T. Utsunomiya*
MSc student at São Paulo State University (UNESP - Brazil)
Laboratory of Animal Biochemistry and Molecular Biology - Araçatuba/SP
Mobile: * +551881170036 * Skype me: yuri.tani *

"So I do dearly hope that the Genome Project does not give rise to some
naive biological determinism that says we are nothing more than the sum of
our genes. Geneticists don't believe that. Geneticists believe genes are an
important part of the story. By understanding that part of the story, we're
in a so-much better position to try to understand the rest of the story."
- Prof. Eric Lander

From ytutsunomiyagmail.com  Fri May 31 08:59:13 2013
From: Yuri Tani Utsunomiya <ytutsunomiyagmail.com>
Subject: Re: Inbreeding coefficents
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Date: Fri, 31 May 2013 08:59:13 -0500

Thanks Bill and Andres for putting it down in clearer words.

I concur with Daniel: Weir and Cockerham's [1] contribution in redefining
F-statistics in a variance-components framework opened a window to a new
series of estimators for genetic diversity and differentiation in
geographically structured populations. I recommend reading [2] for a nice
review of the subject. Although [3] is a review focusing on FST, I find it
a very pleasant read for anybody who wants to study F-statistics.

Now, if your focus, Erik, is on individual autozygosity and potentially
inbreeding depression, rather than population diversity, you may want to go
beyond F-statistics and do ROH analysis.

Best,

Yuri


[1] http://www.jstor.org/stable/2408641
[2] http://www.annualreviews.org/...annurev.genet.36.050802.093940
[3] http://www.nature.com/...journal/v10/n9/pdf/nrg2611.pdf



On Fri, May 31, 2013 at 9:39 AM, Daniel Gianola <gianolaansci.wisc.edu>wrote:

> Points well taken, as I was assuming that this was based on pedigrees. We
> could certainly use negatively inbred individuals in some animal
> populations, e.g., dogs.
>
> It would be useful to revisit Cockerham (1969, 1973) where he revisits
> Wright's indexes in terms of variance components, and these cannot be
> negative except when silly unbiased estimators are used.
>
> Please take my comments in the light of my ignorance about PLINK.
>
> Bill Muir's remarks are very useful as well.
>
> Regards,
>
> Dan

From ydaumn.edu  Fri May 31 09:03:38 2013
From: Yang Da <ydaumn.edu>
Subject: Re: Inbreeding coefficents
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Date: Fri, 31 May 2013 09:03:38 -0500

I think the main point is the distinction between the pedigree inbreeding
coefficient, which is a function of IBD probability and is non-negative, and
the genomic inbreeding coefficient, which can be negative for the reasons
given in other posts.

Yang Da, Ph. D.
Department of Animal Science
University of Minnesota


On Fri, May 31, 2013 at 8:31 AM, Simianer, Henner <hsimiangwdg.de> wrote:

> Hi Andres,
>
> This is not the full story.
>
> The inbreeding coefficient F is also defined as probability that the two
> homologous alleles at a random locus in one individual are identical by
> descent (Malecot) and, being a probability, is bounded between 0 and 1,
> regardless of your assumptions on the base population. Thus, as often in
> quantitative genetics, the same thing has different definitions with
> different implications. Obviously estimates of F can be outside the
> interval (0,1) depending on the method you use.
>
> Best wishes
>
> Henner
>
>
> _____________________________________
> Dr. Henner Simianer
> Professor of Animal Breeding and Genetics
> Department of Animal Sciences
> Georg-August-University Goettingen
> Albrecht-Thaer-Weg 3, 37075 Goettingen
> Tel.: +49-551-395604, Fax: +49-551-395587
> Email: hsimiangwdg.de
> http://www.uni-goettingen.de/tierzucht
>
>
> -----Original Message-----
> From: Andres Legarra [mailto:Andres.Legarratoulouse.inra.fr]
> Sent: Friday, 31 May 2013 14:50
> To: Multiple Recipients of
> Subject: Re: Inbreeding coefficents
>
> Hi,
>
> Inbreeding is an excess of homozygotes relative to Hardy-Weinberg
> equilibrium (Falconer). Or, it is the correlation between uniting gametes
> (Wright). According to these definitions, it is NOT a probability and can
> therefore be negative.
>
> However, if you use a pedigree to estimate inbreeding, you are forced to
> assume that all founder alleles are different and, as a byproduct of this
> assumption, inbreeding is non-negative and is also a probability of
> identity by descent.
>
> When constructing genomic relationship matrices (VanRaden, 2008; Yang et
> al., 2010; etc.) it is common to find negative values of inbreeding and
> also of relationships. These have to be interpreted as covariances and not
> as probabilities. Setting them to 0 creates havoc: you mess up the linear
> model and bias your results.
>
> Andres
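
Andres's point about genomic relationship matrices can be illustrated with
a small sketch in the style of VanRaden's (2008) first method,
G = ZZ' / (2 * sum_j p_j (1 - p_j)), where genomic inbreeding is read off as
G_ii - 1. The function name and toy genotypes are hypothetical, allele
frequencies are taken from the sample itself rather than from a base
generation, and missing genotypes are not handled.

```python
def vanraden_grm(genotypes):
    """genotypes: rows = individuals, columns = SNPs, coded 0/1/2 (no missing)."""
    n_ind, n_loci = len(genotypes), len(genotypes[0])
    # Allele frequencies estimated from the genotyped sample (an assumption).
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n_ind) for j in range(n_loci)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    # Center each genotype by twice the allele frequency at its locus.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(n_loci)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(zi, zk)) / scale for zk in centered]
        for zi in centered
    ]


M = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]  # three animals, three SNPs (toy data)
G = vanraden_grm(M)
genomic_f = [G[i][i] - 1.0 for i in range(len(M))]
print(genomic_f)
print(G[0][1])
```

In this toy case two animals get negative genomic inbreeding and one pair
gets a negative relationship, so the entries behave like covariances around
the sample mean rather than like probabilities.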


From ytutsunomiyagmail.com  Fri May 31 09:50:29 2013
From: Yuri Tani Utsunomiya <ytutsunomiyagmail.com>
Subject: Re: Inbreeding coefficents
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Date: Fri, 31 May 2013 09:50:29 -0500

Just to foster the discussion with some extra useful info, I believe PLINK
calculates F as:

[Note that these are LaTeX equations; to see how they render, you can copy
and paste them here: http://www.codecogs.com/latex/eqneditor.php]

F_{i} = \frac{O_{i} - E_{i}}{L_{i} - E_{i}}

where O_{i} is the observed number of homozygous genotypes in individual i,
L_{i} is the number of SNPs genotyped in individual i, and

E_{i} = \sum\limits_{j=1}^{L_{i}}
\left(1 - 2p_{j}(1-p_{j})\frac{n_{j}}{n_{j}-1}\right)

where n_{j} and p_{j} are the number of measured genotypes and the reference
allele frequency at locus j, respectively.
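
A minimal sketch of this calculation on a toy genotype matrix follows. It
is not PLINK's code: the function name and data are hypothetical, the
small-sample factor is taken as n_j / (n_j - 1) (a ratio that must exceed 1
for the correction to make sense), and n_j follows the wording above as the
number of non-missing genotypes at locus j; whether it should instead count
allele copies is not settled in this thread.

```python
def het_style_f(genotypes):
    """Per-individual (O, E, L, F) from rows of 0/1/2 calls; None = missing.

    Assumes every locus has at least two non-missing genotypes and that
    L - E is non-zero for every individual.
    """
    n_loci = len(genotypes[0])
    exp_hom = []
    for j in range(n_loci):
        calls = [row[j] for row in genotypes if row[j] is not None]
        n_j = len(calls)
        p_j = sum(calls) / (2.0 * n_j)  # reference allele frequency
        exp_hom.append(1.0 - 2.0 * p_j * (1.0 - p_j) * n_j / (n_j - 1))
    results = []
    for row in genotypes:
        typed = [j for j in range(n_loci) if row[j] is not None]
        o_i = sum(1 for j in typed if row[j] != 1)  # observed homozygotes
        e_i = sum(exp_hom[j] for j in typed)        # expected homozygotes
        l_i = len(typed)                            # SNPs genotyped
        results.append((o_i, e_i, l_i, (o_i - e_i) / (l_i - e_i)))
    return results


toy = [
    [0, 1, 2, 1],
    [2, 1, 1, None],
    [1, 0, 2, 2],
    [0, 2, 0, 1],
]
for o_i, e_i, l_i, f_i in het_style_f(toy):
    print(o_i, round(e_i, 4), l_i, round(f_i, 4))
```

Even in this tiny example the second animal has fewer homozygous calls than
expected, so its F is negative without anything being wrong with the
estimator.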

I may be wrong, but PLINK implements F as a descriptive statistic that can
be used by the user for three main purposes: 1) identify contaminated
samples; 2) identify excess of X chromosome heterozygosity in samples
declared to be male; 3) as the Wright inbreeding coefficient. Other usage
must be carefully assessed and interpreted. As said before in the
discussion, 'inbreeding coefficients' have a handful of different
definitions and contexts, but this in particular is a variant of the FIS
(IBS-based) measure defined by Wright in the analysis of structured
populations.

Yuri




On Fri, May 31, 2013 at 11:07 AM, Baumung, Roswitha (AGAG) 
 <Roswitha.Baumungfao.org> wrote:

> Dear colleagues,
>
> You might find the following publication interesting: Inbreeding: one word,
> several meanings, much confusion. Templeton AR, Read B. Source Department of
> Biology, Washington University, St. Louis, MO 63130.
>
> Abstract
>
> Because conservation biologists must frequently deal with small populations,
> inbreeding (a frequent consequence of small population size) has played a
> central role in many genetic management programs. However, the word
> "inbreeding" has several, often contradictory meanings, and a failure to
> distinguish among these meanings has caused much misunderstanding on the role
> of inbreeding in genetic management. Three different biological meanings of
> inbreeding are discussed in this paper: (1) inbreeding as a measure of shared
> ancestry in the paternal and maternal lineages of an individual; (2)
> inbreeding as a measure of genetic drift in a finite population, and (3)
> inbreeding as a measure of system of mating in a reproducing population. The
> distinction and use of these different measures of inbreeding are discussed
> and illustrated with a worked example, the North American captive population
> of Speke's gazelle (Gazella spekei). It is shown that these different meanings
> of the word inbreeding must be kept separated, otherwise erroneous management
> recommendations and evaluations can occur. On the positive side, the different
> measures of inbreeding when used jointly can be a powerful management tool
> precisely because they measure different biological phenomena.
>
> Kind regards, Roswitha
>
>
> Roswitha Baumung
> Animal Production Officer
> Animal Genetic Resources Branch
> Animal Production and Health Division
> FAO - Food and Agriculture Organization of the United Nations
> Viale delle Terme di Caracalla
> 00153 Rome Italy
> Tel. +39 06 57052158
>

From erikscraggsgmail.com  Fri May 31 10:04:48 2013
From: Erik Scraggs <erikscraggsgmail.com>
Subject: Re: Inbreeding coefficents
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Date: Fri, 31 May 2013 10:04:48 -0500

Dear all,

I greatly appreciate your help, thank you for taking the time to provide me
such detailed explanations. It is great to have the help of the community.

Many thanks

Erik


On Fri, May 31, 2013 at 6:34 AM, Taylor, Jerry F. (Animal Science) <
taylorjerrmissouri.edu> wrote:

> Just a couple of other comments to add to the mix:
> 
> 1. No matter how it is estimated/calculated/interpreted there is generally 
> an assumption of random mating and selective neutrality (an absence of direct 
> selection on genotype) associated with the locus/loci and these assumptions 
> are generally violated in most populations due to drift and artificial 
> selection. Thus, it is quite possible that you will observe a lower level of 
> homozygosity within individuals than would be expected under these 
> assumptions.
> 
> 2. I am not sure how PLINK calculates the genomic relationship matrix, but 
> if you have a read through PVR's great paper "Efficient methods to compute 
> genomic predictions." J Dairy Sci. 2008 91(11):4414-23 you will see that the 
> allele frequencies (AF) that are used to compute the GRM are for the base 
> generation. Most programs that compute GRMs from genotype data simply compute 
> AF at the locus using all animals and use this to construct the GRM and this 
> is fine if the population is not subject to admixture, selection or drift. 
> So:
>     a) If your animals are crossbreds - you have a problem
>     b) If your animals are stratified in time - you may have a problem
> 
> I have found that the F coefficients for a population of 3570 registered 
> Angus animals are quite sensitive to the AF estimates. If you estimate AF 
> using all animals these differ from AF estimates estimated using the oldest 
> 10% of animals and the effect on estimates of F is quite considerable. 
> 
> Jared Decker describes the very strong selection occurring genome-wide in 
> these animals in his paper "A novel analytical method, Birth Date Selection 
> Mapping, detects response of the Angus (Bos taurus) genome to selection on 
> complex traits" BMC Genomics. 2012 13:606. He also examines the relationship 
> between genomic F and pedigree F in this paper.
> 
> Jerry

-- 
Erik Scraggs, PhD
Department of Animal Sciences
Washington State University
Pullman, WA, 99164-4236, USA
Tel: 509-288-2291

From Roger.VallejoARS.USDA.GOV  Fri May 31 10:22:49 2013
From: "Vallejo, Roger" <Roger.VallejoARS.USDA.GOV>
Subject: RE: Inbreeding coefficents
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Date: Fri, 31 May 2013 10:22:49 -0500

I think the basic questions are not being answered. Are Wright's
F-statistics correlations or probabilities? Once that is settled, you can
decide how to treat these F-statistics. Let me add this.

The F-statistic model is a hierarchical model with genes stratified at three 
levels: Individuals (I), within subdivisions (S) and within the total population 
(T). It has three main parameters: FIT is the correlation of uniting gametes 
relative to those of the total population; FIS is the average over all 
subdivisions of the correlation of uniting gametes relative to the gametes of 
the subdivision; and FST is the correlation of random gametes within 
subdivisions relative to the total population. The three F-statistics are 
interrelated as (1 - FIT) = (1 - FST) (1 - FIS). A variety of derivations of 
this basic relationship are available (Wright 1951, 1965; Cockerham 1969). It is 
clear from WRIGHT's formulation of the F-statistic model that the parameters FIS 
and FIT are free to take either positive or negative values depending on whether 
there is a deficit or excess of heterozygotes; it is also clear from WRIGHT's 
work that the parameter FST is necessarily positive (JC Long, Genetics 1986).
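
The identity (1 - FIT) = (1 - FST) (1 - FIS) can be checked numerically
with a few lines. The function names and input values below are purely
illustrative and are not estimates from any population discussed here.

```python
def f_it(f_is, f_st):
    """FIT implied by FIS and FST through (1 - FIT) = (1 - FST)(1 - FIS)."""
    return 1.0 - (1.0 - f_st) * (1.0 - f_is)


def f_is(f_it_value, f_st):
    """FIS recovered from FIT and FST (requires FST < 1)."""
    return 1.0 - (1.0 - f_it_value) / (1.0 - f_st)


# Heterozygote excess within subdivisions (FIS < 0) with mild differentiation.
print(f_it(-0.10, 0.05))
# Heterozygote deficit within subdivisions (FIS > 0), same differentiation.
print(f_it(0.20, 0.05))
```

With a positive FST, a negative FIS can still leave FIT negative, which is
consistent with FIS and FIT being free to take either sign.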

I hope this helps some on this very interesting issue.

Roger


Roger L. Vallejo, Ph.D.
U.S. Department of Agriculture, ARS, NCCCWA
Voice:  (304) 724-8340 Ext. 2141
Email:  roger.vallejoars.usda.gov
http://www.ars.usda.gov/...ople/people.htm?personid=37662

From gianolaansci.wisc.edu  Fri May 31 10:50:17 2013
From: Daniel Gianola <gianolaansci.wisc.edu>
Subject: RE: Inbreeding coefficents
Postmaster: submission approved
To: Multiple Recipients of <angenmapanimalgenome.org>
Date: Fri, 31 May 2013 10:50:17 -0500

Roger:

Well put.

Daniel



-----Original message-----
.From: "Vallejo, Roger" <Roger.VallejoARS.USDA.GOV>
.To: Multiple Recipients of <AnGenMapanimalgenome.org>
.Sent: Fri, May 31, 2013 15:24:19 GMT+00:00
.Subject: RE: Inbreeding coefficents

I think the basic questions are not being answered. Are the Wright's
F-statistics correlations or probabilities? Then, you can decide on how to
treat these F-statistics. Let me add this.

The F-statistic model is a hierarchical model with genes stratified at three
levels: Individuals (I), within subdivisions (S) and within the total population
(T). It has three main parameters: FIT is the correlation of uniting gametes
relative to those of the total population; FIS is the average over all
subdivisions of the correlation of uniting gametes relative to the gametes of
the subdivision; and FST is the correlation of random gametes within
subdivisions relative to the total population. The three F-statistics are
interrelated as (1 - FIT) = (1 - FST) (1 - FIS). A variety of derivations of
this basic relationship are available (Wright 1951, 1965; Cockerham 1969). It is
clear from WRIGHT's formulation of the F-statistic model that the parameters FIS
and FIT are free to take either positive or negative values depending on whether
there is a deficit or excess of heterozygotes; it is also clear from WRIGHT's
work that the parameter FST is necessarily positive (JC Long, Genetics 1986).

I hope this helps some on this very interesting issue.

Roger


Roger L. Vallejo, Ph.D.
U.S. Department of Agriculture, ARS, NCCCWA
Voice:  (304) 724-8340 Ext. 2141
Email:  roger.vallejoars.usda.gov
http://www.ars.usda.gov/...ople/people.htm?personid=37662

-----Original Message-----
.From: Daniel Gianola [mailto:gianolaansci.wisc.edu]
.Sent: Friday, May 31, 2013 9:56 AM
.To: Multiple Recipients of <AnGenMapanimalgenome.org>
.Subject: Re: Inbreeding coefficents

Points well taken, as I was assuming that this was based on pedigrees.
We could certainly use negatively inbred individuals in some animal
populations, e.g., dogs.

It would be useful to revisit Cockerham (1969, 1973) where he revisits Wright's
indexes in terms of variance components, and these cannot be negative except
when silly unbiased estimators are used.

Please take my comments in the light of my ignorance about PLINK.

Bill Muir's remarks are very useful as well.

Regards,

Dan


-----Original message-----
.From: Yuri Tani Utsunomiya <ytutsunomiyagmail.com>
.To: Multiple Recipients of <AnGenMapanimalgenome.org>
.Sent: Fri, May 31, 2013 02:42:10 GMT+00:00
.Subject: Re: Inbreeding coefficents

Dear Erik,

The inbreeding coefficient calculated by PLINK is equivalent to FIS in Wright's
F-statistics [1].

In a structured population, the fixation index (represented by F = the degree of
reduction in heterozygosity relative to Hardy-Weinberg expectation) can be
partitioned into three levels: FIT - individual (I) relative to the total
population (T); FIS - individual (I) relative to the subpopulation (S); and FST
- subpopulation (S) relative to the total (T).  Thus, Wright's FIS is often
referred to as the inbreeding coefficient, and a simplistic definition is
FIS = 1 - (HI/HS), where HI represents the individual's heterozygosity, and HS
the subpopulation's (or breed's) heterozygosity.

Looking at the definition proposed by Wright (1950) [1], F is better interpreted
as a correlation measure between alleles in different 'partitions' of a
structured population, rather than a probability. This means that it can take
negative values. If HI = HS, then FIS = 0, and the individual has exactly the
heterozygosity expected for the subpopulation. If HI < HS, then FIS > 0,
and the individual is less heterozygous than expected given the subpopulation's
heterozygosity. The closer FIS gets to 1, the more inbred the individual is
assumed to be. On the other hand, if HI > HS, then FIS < 0, so the individual is
more heterozygous than expected given the subpopulation. Hence, negative values
denote outbred individuals.
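
To connect this definition to the columns in Erik's output, here is a minimal
sketch. It assumes, as I understand the PLINK documentation, that the F column
of --het is (O(HOM) - E(HOM)) / (N(NM) - E(HOM)), which is algebraically the
same as 1 - HI/HS when HI and HS are taken as observed and expected
heterozygous counts over the non-missing SNPs. The function names are my own,
and the two rows are copied from the table in the original post.

```python
def f_from_counts(obs_hom: float, exp_hom: float, n_nonmissing: float) -> float:
    """F = (O(HOM) - E(HOM)) / (N(NM) - E(HOM))."""
    return (obs_hom - exp_hom) / (n_nonmissing - exp_hom)


def f_from_heterozygosity(obs_hom: float, exp_hom: float, n_nonmissing: float) -> float:
    """The same quantity written as FIS = 1 - HI/HS."""
    h_i = (n_nonmissing - obs_hom) / n_nonmissing  # individual's heterozygosity
    h_s = (n_nonmissing - exp_hom) / n_nonmissing  # expected heterozygosity
    return 1.0 - h_i / h_s


# IID, O(HOM), E(HOM), N(NM) as posted; E(HOM) is rounded in the post.
rows = [("FB686", 28313, 2.58e4, 37177), ("FB2126", 23992, 2.60e4, 37494)]
for iid, o, e, n in rows:
    print(iid, round(f_from_counts(o, e, n), 4), round(f_from_heterozygosity(o, e, n), 4))
```

The two functions agree with each other, and the values land close to the
posted F column (0.2243 and -0.1699) but not exactly on it, because E(HOM) was
printed to only three significant figures.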

I did not understand why you want to set the negative values to zero, as FIS is
not a probability. In fact, you can test departure of individual heterozygosity
from the expectation for the subpopulation by performing tests for goodness of
fit if you want p-values...
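
One way such a test could look is sketched below: a chi-square goodness-of-fit
test with 1 degree of freedom on the homozygous/heterozygous counts. This is my
own simplification and not something PLINK reports. It treats SNPs as
independent and ignores that expected homozygosity differs from SNP to SNP;
with linked markers the p-values will tend to be too small, so take them as a
rough screen only. The function name is hypothetical.

```python
import math


def homozygosity_chi2(obs_hom: float, exp_hom: float, n_nonmissing: float):
    """Chi-square (1 df) comparing observed and expected hom/het counts."""
    obs_het = n_nonmissing - obs_hom
    exp_het = n_nonmissing - exp_hom
    chi2 = ((obs_hom - exp_hom) ** 2 / exp_hom
            + (obs_het - exp_het) ** 2 / exp_het)
    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# FB1615 from the original post: O(HOM) = 25773, E(HOM) = 2.60E+04, N(NM) = 37510.
chi2, p = homozygosity_chi2(25773, 2.60e4, 37510)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```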

One observation worth noting: if the value is strongly negative, then you may
want to double check the outbred sample for the possibility of contamination
(incidental mixing of two DNA sources, causing high sample heterozygosity).
Although FIS has been widely used in genetic diversity studies using
microsatellites to quantify inbreeding/diversity loss, you may want to have a
look at estimating inbreeding levels from runs of homozygosity (ROH).
While FIS largely relies on identity by state, empirical data suggest that ROH
better capture information on identity by descent, and ROH have been proposed
as a suitable method to estimate autozygosity - some people would say that it
should replace the pedigree estimates. PLINK also has an implementation of the
algorithm [2].
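
To make the ROH idea concrete, here is a deliberately naive sketch for a single
chromosome. It is not PLINK's algorithm (PLINK's --homozyg option uses a
sliding-window scan with allowances for heterozygous and missing calls); it
just collects uninterrupted stretches of homozygous calls and expresses their
total length as a fraction of the genome length covered. The function names,
the genotype coding (0/2 homozygous, 1 heterozygous, None missing), the
thresholds and the toy data are all assumptions for illustration.

```python
def find_roh(genotypes, positions, min_snps=50, min_length_bp=1_000_000):
    """Return (start_bp, end_bp, n_snps) for each run of homozygous calls."""
    runs, current = [], []
    for g, pos in zip(genotypes, positions):
        if g is None:       # missing call: skipped, does not break the run
            continue
        if g in (0, 2):     # homozygous call extends the current run
            current.append(pos)
            continue
        if current:         # heterozygous call closes the current run
            runs.append(current)
        current = []
    if current:
        runs.append(current)
    return [(r[0], r[-1], len(r)) for r in runs
            if len(r) >= min_snps and r[-1] - r[0] >= min_length_bp]


def f_roh(runs, genome_length_bp):
    """Fraction of the covered genome length that lies inside ROH."""
    return sum(end - start for start, end, _ in runs) / genome_length_bp


# Toy example with tiny thresholds so the result can be checked by eye.
genotypes = [0, 2, 2, 1, 0, 0, 2, 2, 0, 1, 2]
positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100]
runs = find_roh(genotypes, positions, min_snps=3, min_length_bp=200)
print(runs)
print(f_roh(runs, genome_length_bp=1200))
```

For real data you would want to process each chromosome separately and pick the
thresholds according to SNP density.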

For those who are not familiar with PLINK [3], I suggest checking it out. It is
elegantly written in C/C++, and is a pioneering piece of software for the
analysis of SNP data. It remains one of the most complete toolsets available.

Yours sincerely,

Yuri

[1] http://onlinelibrary.wiley.com/...j.1469-1809.1949.tb02451.x/pdf
[2] http://pngu.mgh.harvard.edu/...urcell/plink/ibdibs.shtml#homo
[3] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1950838/


© 2003-2021 Creative Commons licenses by NAGRP - Bioinformatics Coordination Program.