Pedigree

Introduction

In an 'animal model' or 'sire model' genetic analysis we have data on a set of animals that are genetically linked via a pedigree. The genetic effects are therefore correlated and, assuming normal modes of inheritance, the correlation expected from additive genetic effects can be derived from the pedigree, provided all the genetic links are in the pedigree. The additive genetic relationship matrix (sometimes called the numerator relationship matrix) can be calculated from the pedigree, but it is actually the inverse relationship matrix that ASReml forms for analysis. Users new to this subject might find notes by Julius van der Werf helpful: Mixed_Models_for_Genetic_analysis.pdf. For the more general situation where the pedigree-based inverse relationship matrix is not the appropriate or required matrix, the user can provide a general inverse variance (GIV) matrix explicitly in a .giv file. In this chapter we consider data presented in Harvey (1977) using the command file harvey.as.
 Pedigree file example
  animal     !P
  sire       !A
  dam
  lines       2
  damage
  adailygain
 harvey.ped !ALPHA
 harvey.dat
 adailygain ~ mu lines,  !r animal 0.25

Pedigree factor type

In ASReml the !P data field qualifier indicates that the corresponding data field has an associated pedigree. The file containing the pedigree (harvey.ped in the example) for animal is specified after all field definitions and before the data file definition. See below for the first 20 lines of harvey.ped together with the corresponding lines of the data file harvey.dat. All individuals appearing in the data file must appear in the pedigree file. When all the pedigree information (individual, male_parent, female_parent) appears as the first three fields of the data file, the data file can double as the pedigree file. In this example the line harvey.ped !ALPHA could be replaced with harvey.dat !ALPHA. Typically additional individuals providing additional genetic links are present in the pedigree file.

The pedigree file

The pedigree file is used to define the genetic relationships for fitting a genetic animal model and is required if the !P qualifier is associated with a data field. The pedigree file
  • has three fields: the identities of an individual, its sire and its dam (or maternal grand sire if the !MGS qualifier is specified), in that order,
  • uses identity 0 or * for unknown parents,
  • is sorted so that the line giving the pedigree of an individual appears before any line where that individual appears as a parent,
  • is read free format; it may be the same file as the data file if the data file is free format and has the necessary identities in the first three fields, see below,
  • is specified on the line immediately preceding the data file line in the command file.
     harvey.ped              harvey.dat
     101 SIRE_1 0            101 SIRE_1 0 1 3 192 390 2241
     102 SIRE_1 0            102 SIRE_1 0 1 3 154 403 2651
     103 SIRE_1 0            103 SIRE_1 0 1 4 185 432 2411
     104 SIRE_1 0            104 SIRE_1 0 1 4 183 457 2251
     105 SIRE_1 0            105 SIRE_1 0 1 5 186 483 2581
     106 SIRE_1 0            106 SIRE_1 0 1 5 177 469 2671
     107 SIRE_1 0            107 SIRE_1 0 1 5 177 428 2711
     108 SIRE_1 0            108 SIRE_1 0 1 5 163 439 2471
     109 SIRE_2 0            109 SIRE_2 0 1 4 188 439 2292
     110 SIRE_2 0            110 SIRE_2 0 1 4 178 407 2262
     111 SIRE_2 0            111 SIRE_2 0 1 5 198 498 1972
     112 SIRE_2 0            112 SIRE_2 0 1 5 193 459 2142
     113 SIRE_2 0            113 SIRE_2 0 1 5 186 459 2442
     114 SIRE_2 0            114 SIRE_2 0 1 5 175 375 2522
     115 SIRE_2 0            115 SIRE_2 0 1 5 171 382 1722
     116 SIRE_2 0            116 SIRE_2 0 1 5 168 417 2752
     117 SIRE_3 0            117 SIRE_3 0 1 3 154 389 2383
     118 SIRE_3 0            118 SIRE_3 0 1 4 184 414 2463
     119 SIRE_3 0            119 SIRE_3 0 1 5 174 483 2293
     120 SIRE_3 0            120 SIRE_3 0 1 5 170 430 2303
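The layout rules listed above (three leading fields, parents before offspring) can be checked before running an analysis. The following is a rough sketch only: check_pedigree is a hypothetical helper, not part of ASReml, and it assumes free format means whitespace-separated fields.

```python
def check_pedigree(lines, unknown=("0", "*")):
    """Return a list of problems found in free-format pedigree lines."""
    problems = []
    used_as_parent = set()  # identities already seen in a parent field
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 3:
            problems.append(f"line {lineno}: fewer than three fields")
            continue
        ident, sire, dam = fields[:3]
        # An individual's own line must come before any use as a parent.
        if ident in used_as_parent:
            problems.append(
                f"line {lineno}: {ident} appears as a parent before its own line"
            )
        for parent in (sire, dam):
            if parent not in unknown:
                used_as_parent.add(parent)
    return problems


# Lines in the style of harvey.ped: sires have no line of their own.
print(check_pedigree(["101 SIRE_1 0", "102 SIRE_1 0"]))
# A sire line placed after its progeny is reported.
print(check_pedigree(["101 SIRE_1 0", "SIRE_1 0 0"]))
```

Only the first three fields are inspected, so the same check can be applied to a data file that doubles as the pedigree file.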
    

Reading in the pedigree file

The syntax for specifying a pedigree file in the ASReml command file is
    pedigree_file [qualifiers]
  • the qualifiers are listed below,
  • the identities (individual, male_parent, female_parent) are merged into a single list and the inverse relationship matrix is formed before the data file is read,
  • when the data file is read, data fields with the !P qualifier are recoded according to the combined identity list,
  • the inverse relationship matrix is automatically associated with factors coded from the pedigree file unless some other covariance structure is specified; the inverse relationship matrix is specified with the variance model name AINV,
  • the inverse relationship matrix is written to ainverse.bin,
  • if ainverse.bin already exists ASReml assumes it was formed in a previous run and has the correct inverse; ainverse.bin is read, rather than the inverse being reformed (unless !MAKE is specified); this saves time when performing repeated analyses based on a particular pedigree; delete ainverse.bin or specify !MAKE if the pedigree is changed between runs,
  • identities are printed in the .sln file,
  • identities should be whole numbers less than 200,000,000 unless !ALPHA is specified,
  • pedigree lines for parents must precede their progeny,
  • unknown parents should be given the identity number 0,
  • if an individual appearing as a parent does not appear in the first column, it is assumed to have unknown parents, that is, parents with unknown parentage do not need their own line in the file,
  • identities may appear as both male and female parents, for example, in forestry.
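To make the merge-and-recode steps in the list above concrete, here is a small sketch. It is illustrative only: the function names are hypothetical, and numbering identities in order of first appearance is an assumption, not a statement about the order ASReml actually uses.

```python
def build_identity_list(pedigree_rows, unknown=("0", "*")):
    """Merge individual, male_parent and female_parent into one coded list."""
    codes = {}
    for row in pedigree_rows:
        for ident in row[:3]:
            if ident not in unknown and ident not in codes:
                codes[ident] = len(codes) + 1  # assumed: order of appearance
    return codes


def recode(data_ids, codes):
    """Recode a !P data field; every identity must be in the pedigree."""
    missing = [ident for ident in data_ids if ident not in codes]
    if missing:
        raise ValueError(f"not in pedigree: {missing}")
    return [codes[ident] for ident in data_ids]


ped = [("101", "SIRE_1", "0"), ("102", "SIRE_1", "0"), ("109", "SIRE_2", "0")]
codes = build_identity_list(ped)
print(codes)
print(recode(["102", "109"], codes))
```

Note that SIRE_1 and SIRE_2 receive codes even though they have no line of their own, matching the rule that such parents are treated as having unknown parentage.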

Pedigree file qualifiers

  • !ALPHA indicates that the identities are alphanumeric with up to 20 characters; otherwise by default they are numeric whole numbers <200,000,000.
  • !DIAG causes the pedigree identifiers, the diagonal elements of the inverse relationship matrix and the inbreeding coefficients for the individuals (calculated as the diagonal of A - I) to be written to AINVERSE.DIA.
  • !GIV instructs ASReml to write out the A-inverse in the format of .giv files.
  • !GROUPS g includes genetic groups in the pedigree. The first g lines of the pedigree identify genetic groups (with zero in both the sire and dam fields). All other lines must specify one of the gen
  • !INBRED generates pedigree for inbred lines. Each cross is assumed to be selfed several times to stabilize as an inbred line as is usual for cereals, before being evaluated or crossed with another line. Since inbreeding is usually associated with strong selection, it is not obvious that a pedigree assumption of covariance of 0.5 between parent and offspring actually holds. Do not use the !INBRED qualifier with the !MGS or !SELF qualifiers.
  • !MAKE tells ASReml to make the A-inverse (rather than trying to retrieve it from the ainverse.bin file).
  • !MGS indicates that the third identity is the sire of the dam rather than the dam.
  • !REPEAT tells ASReml to ignore repeat occurrences of lines in the pedigree file. Use of this option will avoid the check that animals occur in chronological order, but chronological order is still required.
  • !SELF s allows partial selfing when the third field is unknown. It indicates that progeny from a cross where the second parent (male_parent) is unknown are assumed to be from selfing with probability s and from outcrossing with probability (1-s). This is appropriate in some forestry tree breeding studies where seed collected from a tree may have been pollinated by the mother tree or by some other tree. Do not use the !SELF qualifier with the !INBRED or !MGS qualifiers.
  • !SKIP n allows you to skip n header lines at the top of the file.
  • !SORT causes ASReml to sort the pedigree into an acceptable order, that is parents before offspring, before forming the A-Inverse. The sorted pedigree is written to a file whose name has .srt appended to its name.
A pdf file, pedigree.pdf, contains details of these options.
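The ordering that !SORT is described as producing, parents before offspring, amounts to a topological ordering of the pedigree. The sketch below shows one way to obtain such an order with a depth-first traversal; sort_pedigree is a hypothetical helper written for illustration and makes no claim about how ASReml itself sorts.

```python
def sort_pedigree(rows, unknown=("0", "*")):
    """Reorder (individual, sire, dam) rows so parents precede offspring."""
    by_id = {row[0]: row for row in rows}
    ordered = []
    state = {}  # 1 = being visited, 2 = already placed in the output

    def visit(ident):
        if state.get(ident) == 2:
            return
        if state.get(ident) == 1:
            raise ValueError(f"pedigree loop involving {ident}")
        state[ident] = 1
        row = by_id[ident]
        for parent in row[1:3]:
            # Parents without their own line need no placement.
            if parent not in unknown and parent in by_id:
                visit(parent)
        state[ident] = 2
        ordered.append(row)

    for row in rows:
        visit(row[0])
    return ordered


rows = [("103", "101", "102"), ("101", "0", "0"), ("102", "0", "0")]
print(sort_pedigree(rows))
```

The recursion is kept for brevity; a very deep pedigree would call for an iterative traversal instead.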

Genetic groups

If all individuals belong to one genetic group, then use 0 as the identity of the parents of base individuals. However, if base individuals belong to various genetic groups this is indicated by the !GROUP qualifier and the pedigree file must begin by identifying these groups. All base individuals should have group identifiers as parents. In this case the identity 0 will only appear on the group identity lines, as in the following example where three sire lines are fitted as genetic groups.
     Genetic group example
      animal  !P
      sire  9 !A
      dam
      lines  2
      damage
      adailygain
     harveyg.ped  !ALPHA !MAKE !GROUP 3
     harvey.dat
     adailygain ~ mu !r animal 0.25 !GU
    

     G1 0 0
     G2 0 0
     G3 0 0
     SIRE_1 G1 G1
     SIRE_2 G1 G1
     SIRE_3 G1 G1
     SIRE_4 G2 G2
     SIRE_5 G2 G2
     SIRE_6 G3 G3
     SIRE_7 G3 G3
     SIRE_8 G3 G3
     SIRE_9 G3 G3
     101 SIRE_1 G1
     102 SIRE_1 G1
     103 SIRE_1 G1
      ...
     163 SIRE_9 G3
     164 SIRE_9 G3
     165 SIRE_9 G3
    
It is usually appropriate to allocate a genetic group identifier where the parent is unknown.
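The two layout rules for a grouped pedigree (the first g lines define the groups with zero parents, and 0 appears nowhere else) are easy to verify mechanically. A minimal sketch follows, assuming rows have already been split into three identities; check_groups is a hypothetical name, not an ASReml facility.

```python
def check_groups(rows, g):
    """Check a pedigree that starts with g genetic group lines.

    rows: list of (individual, sire, dam) identities in file order.
    Returns a list of problem descriptions (empty if none are found).
    """
    problems = []
    for lineno, (ident, sire, dam) in enumerate(rows, start=1):
        if lineno <= g:
            # Group lines carry zero in both the sire and dam fields.
            if sire != "0" or dam != "0":
                problems.append(f"line {lineno}: group {ident} needs 0 0 as parents")
        elif "0" in (sire, dam):
            # Elsewhere an unknown parent should name a genetic group.
            problems.append(f"line {lineno}: {ident} has parent 0 instead of a group")
    return problems


rows = [
    ("G1", "0", "0"),
    ("G2", "0", "0"),
    ("G3", "0", "0"),
    ("SIRE_1", "G1", "G1"),
    ("101", "SIRE_1", "G1"),
]
print(check_groups(rows, 3))
```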
