A Scroll-through Tutorial:

Using CRIMAP to Perform Linkage Analysis
for Genetic Map Construction

This tutorial serves as complementary material to the CRIMAP manual ("Documentation for CRI-MAP, version 2.4 (3/26/90)" by Phil Green, Kathy Falls, and Steve Crooks), walking beginners step by step through running the CRIMAP program. Users are assumed to already know the basics of linkage analysis and to understand the theory behind it. For the concepts of linkage and LOD scores, please refer to this link.

This tutorial uses PiGMaP family genotype data as an example. By the end of the tutorial, users should be able to run a linkage analysis of their own data set against the existing PiGMaP data for mapping purposes.



1. The data structure and data preprocessing:

The data structure has the following format:
FamID  ChildreNum
PigID  CrimapCode  DameID  SireID  Sex  Allele 1  Allele 2

An actual example of the data structure:

 1         27   (Edinburgh 1)
 153       1         0      0       1     1        2     
 833       2         0      0       0     1        1
 856       3         0      0       1     1        1
 433       4         0      0       0     1        2
 9591      5         2      1       1     2        2
 9360      6         4      3       0     1        2
 9365      7         4      3       0     2        2
  :        :         :      :       :     :        :
  :        :         :      :       :     :        :

CRIMAP uses the "CrimapCode", not the individual ID ("PigID"), for analysis. The "PigID" column has to be removed once the genotypes are entered, so the actual crimap working data sheet should look like:

 1    27
 1    0    0    1    1    2                              
 2    0    0    0    1    1
 3    0    0    1    1    1
 4    0    0    0    1    2
 5    2    1    1    2    2
 6    4    3    0    1    2
 7    4    3    0    2    2
 :    :    :    :    :    :
 :    :    :    :    :    :
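The column removal described above can be sketched in a few lines of Python. This is only an illustration, not the Perl CGI program mentioned below: the function name `strip_pig_ids` and the `n_loci` parameter are hypothetical, and the sketch assumes that every individual's record consists only of whole numbers (PigID, CrimapCode, dam, sire, sex, then two alleles per locus).

```python
def strip_pig_ids(lines, n_loci=1):
    """Drop the leading PigID column from each individual's record.

    Assumption: an individual's record has 5 + 2 * n_loci integer fields
    (PigID, CrimapCode, dam, sire, sex, then two alleles per locus).
    Any other non-blank line is treated as a family header, of which only
    the first two fields (family ID and count) are kept.
    """
    record_len = 5 + 2 * n_loci
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == record_len and all(f.isdigit() for f in fields):
            out.append("  ".join(fields[1:]))   # individual: drop PigID
        else:
            out.append("  ".join(fields[:2]))   # header: drop free-text label
    return out


sheet = [
    "1  27  (Edinburgh 1)",
    "153  1  0  0  1  1  2",
    "9591  5  2  1  1  2  2",
]
for row in strip_pig_ids(sheet):
    print(row)
```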

The reformatting of the data sheet is handled by a web form and its related CGI program written in Perl (by Zhiliang Hu). To use the CGI program to reformat your data, fill in the information in the web form and be sure to enter your email address correctly in the corresponding field so that you receive the reformatted data by email.

The mail you receive containing the reformatted data may be in the following format:

  FamNumbers MarkerNumbers
  MarkerName(s)

  Fam1  ChildreNum
  PigID  CrimapCode  DameID  SireID  Sex  Allele 1  Allele 2

  Fam2  ChildreNum
  PigID  CrimapCode  DameID  SireID  Sex  Allele 1  Allele 2

As in:

From: webmaster@db.genome.iastate.edu
To: hu@db.genome.iastate.edu
Cc: kskim@iastate.edu
Subject: Genotype Data Enterred for CRIMAP Analysis

  <----X Cut here before doing "crimap .par merge" X---->

6  1
Fatty 

1  27
1  0  0  1   2   2
2  0  0  0   1   2
3  0  0  1   1   1
:  :  :  :   :   :
234  223  119  0   0   0
235  223  119  0   0   0
236  223  119  0   0   0

  <----X Cut here before doing "crimap .par merge" X---->

This data submission was made from timon.ansci.iastate.edu
(ip=129.186.111.165) on Mon Jan 11 22:09:05 CST 1999
 by Kwan Suk Kim.

You have to cut off the mail header and trailer, up to and including the lines that say

<----X Cut here ... ---->
and save the data into a file, say, "fatty.data", for further analysis.
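If you prefer not to trim the mail by hand, the cut can be scripted. The following is a minimal sketch, assuming the mail contains exactly two lines with the words "Cut here" as in the example above; the function name `extract_genotype_block` is hypothetical.

```python
def extract_genotype_block(mail_text, marker="cut here"):
    """Return the text between the first two 'Cut here' lines of a mail."""
    lines = mail_text.splitlines()
    cuts = [i for i, line in enumerate(lines) if marker in line.lower()]
    if len(cuts) < 2:
        raise ValueError("expected two 'Cut here' marker lines")
    body = lines[cuts[0] + 1:cuts[1]]
    # Trim blank lines at both ends of the data block.
    while body and not body[0].strip():
        body.pop(0)
    while body and not body[-1].strip():
        body.pop()
    return "\n".join(body) + "\n"


mail = "\n".join([
    "Subject: Genotype Data",
    "",
    '  <----X Cut here before doing "crimap .par merge" X---->',
    "",
    "6  1",
    "Fatty",
    "",
    "1  27",
    "1  0  0  1   2   2",
    "",
    '  <----X Cut here before doing "crimap .par merge" X---->',
    "",
    "This data submission was made from ...",
])
print(extract_genotype_block(mail), end="")
```

The returned text can then be written to a file such as "fatty.data".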



2. Getting your unix account environment ready:

Use Secure Shell (ssh) to log in to your "genome" account:

    > ssh db.genome.iastate.edu
Note: Obtain your login name and password from Zhiliang Hu.

Your unix account environment is customized to use "tcsh". If you find that this is not the case, run "chsh" to change your shell to "tcsh".

Create a subdirectory for yourself and "cd" to it as your working directory:

    > mkdir yourname
    > cd yourname

To get the existing PiGMaP family genotype data:

    > getdata
This will put a set of 12 "*.gen" files in your current directory, where "*" represents either a chromosome number or something that indicates the nature of the data (e.g. "all.gen" means all markers from 19 chromosomes). In the future, you can always use this command to get the most up-to-date PiGMaP family genotype data (which will overwrite the existing files). [NOTE: "getdata" is a UNIX utility developed by Zhiliang Hu and available on the Pig Genome Server only]



3. Merge your data with the existing PiGMaP family genotype data:

For this particular example, we already knew that the new marker should map to pig chromosome 16; therefore, we are going to analyze the new data only against the chromosome 16 data. The crimap executable should already be in your path, so just type:
    > crimap new.par merge
where "crimap" is the command, "new.par" is the parameter file you are going to use, and "merge" is the particular crimap option for merging the data. You will be asked for the first input file, the second input file, and the output file. Here is an actual run (the file names after the "first input file =", "second input file =" and "output file =" prompts are what you are supposed to type in):

    > crimap new.par merge

    first input file = chr16.gen

    input file = chr16.gen

    1008000 bytes allocated in morecore

    second input file = fatty.data

    input file = fatty.data

    merge

    output file = new.gen

    writing file ...

    > 

Check if the file "new.gen" is in your working directory.
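Beyond checking that the file exists, you can take a quick look at its header. The sketch below assumes the layout described in section 1 (number of families and number of markers first, followed by the marker names); the function name `read_gen_header` is hypothetical.

```python
def read_gen_header(text):
    """Return (n_families, n_loci, locus_names) from .gen-style text.

    Assumption: the file starts with two integers (family count and
    marker count) followed by that many whitespace-separated marker names.
    """
    tokens = text.split()
    n_families, n_loci = int(tokens[0]), int(tokens[1])
    names = tokens[2:2 + n_loci]
    if len(names) != n_loci:
        raise ValueError("file ends before all marker names were read")
    return n_families, n_loci, names


example = "6  1\nFatty \n\n1  27\n1  0  0  1   2   2\n"
print(read_gen_header(example))
```

To inspect the merged file, pass it the contents of "new.gen", for example `read_gen_header(open("new.gen").read())`, and confirm that your new marker appears among the names.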



4. Prepare your data set for crimap analysis:

You need to set up your parameter (.par), data (.dat), and order (.ord) files for your crimap analysis. Here is an actual sample:

    > crimap new.par prepare

    1008000 bytes allocated in morecore
    
    Creating .dat file new.dat from .gen file new.gen
    
    family id 1
    family id 2
    family id 3
    family id 4
    family id 8
    family id 5
    
    Writing file new.dat
    
    Finished writing new.dat
    
    Writing locus names to new.loc
    
    Current values for parameters:
    
    par_file = new.par
    dat_file = new.dat
    gen_file = new.gen
    ord_file = new.ord
    nb_our_alloc = 3000000    [# bytes reserved for our_alloc]  
    SEX_EQ = 1   [0 = sex specific analysis, 1 = sex equal]
    TOL = 0.010000
    PUK_NUM_ORDERS_TOL = 6
    PK_NUM_ORDERS_TOL = 8
    PUK_LIKE_TOL = 3.000
    PK_LIKE_TOL = 3.000
    use_ord_file = 0
    write_ord_file = 1
    use_haps = 1
    
    Do you wish to change any of these values? (y/n) n
    
    The loci and their indices are:
    
      0   S0390        1   S0391        2   S0371
      3   S0363        4   S0006        5   S0077
      6   GHR-1        7   UTAP2        8   C9
      9   FSA         10   CART        11   S0298
     12   S0026       13   S0061       14   S0105
     15   S0326       16   fatty
    
    Do you wish to enter any new haplotyped systems? (y/n) n
    Do you wish to hold any additional recombination fractions
    fixed (NB these will only be used with the options FIXED
    and CHROMPIC, and only when the loci in question
    are adjacent)? (y/n) n
    
    The crimap options are:
    
    [1] build  [2] instant  [3] quick  [4] fixed
    
    [5] flips  [6] all  [7] twopoint  [8] chrompic
    
    Enter the number of the option you will be running next: 7
    The loci and their indices are:
    
      0   S0390        1   S0391        2   S0371
      3   S0363        4   S0006        5   S0077
      6   GHR-1        7   UTAP2        8   C9
      9   FSA         10   CART        11   S0298
     12   S0026       13   S0061       14   S0105
     15   S0326       16   fatty
    
    Do you wish to compute LOD tables for ALL pairs of loci?
                                (y/n) y
    OK to set up new parameter file? (y/n) y

    new.par has been created; use text editor for further
    modifications, if needed

As a result, you should now have these 4 files in your working directory:

  -rw-r--r--   1 hu    adm     18351 Jan 8 10:45 new.dat
  -rw-r--r--   1 hu    adm     13296 Jan 8 10:35 new.gen
  -rw-r--r--   1 hu    adm       870 Jan 8 10:45 new.loc
  -rw-r--r--   1 hu    adm       315 Jan 8 10:49 new.par


5. Two point linkage analysis:

Two-point linkage analysis calculates LOD scores by comparing two markers at a time, the new marker and one of the existing markers, and outputs only the significant LOD scores (LOD > 3.0).
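To give a feel for what a LOD of 3.0 means, here is a small worked sketch for the simplest textbook case: fully informative, phase-known meioses, where the LOD is log10 of the likelihood at recombination fraction theta divided by the likelihood at theta = 0.5. This is not how CRIMAP itself computes its scores (it works on whole pedigrees); the function name `lod_phase_known` is hypothetical.

```python
import math


def lod_phase_known(n_meioses, n_recombinants, theta):
    """LOD score for n phase-known meioses with k recombinants.

    LOD(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    n, k = n_meioses, n_recombinants
    likelihood = (theta ** k) * ((1.0 - theta) ** (n - k))
    if likelihood == 0.0:
        return float("-inf")  # e.g. theta = 0 but recombinants were observed
    return math.log10(likelihood) - n * math.log10(0.5)


# Ten informative meioses with no recombinants, evaluated at theta = 0:
print(round(lod_phase_known(10, 0, 0.0), 2))
```

In this simplified setting, ten non-recombinant meioses are just enough to pass the LOD > 3.0 threshold, which corresponds to odds of about 1000:1 in favor of linkage.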

For the "twopoint" linkage analysis, you have to modify the ".par" file to include the marker you want to analyze as the inserted_loci (the "inserted_loci" line is the new line you type in):

    dat_file  new.dat *
    gen_file  new.gen *
    ord_file  new.ord *
    nb_our_alloc  3000000 *
    SEX_EQ  1 *
    TOL  0.010000 *
    PUK_NUM_ORDERS_TOL  6 *
    PK_NUM_ORDERS_TOL  8 *
    PUK_LIKE_TOL  3.000 *
    PK_LIKE_TOL  3.000 *
    use_ord_file  0 *
    write_ord_file  1 *
    use_haps  1 *
    ordered_loci 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16  *
    inserted_loci 16 *
    END
where the number "16" corresponds to the locus "fatty"; this information can be found in the "new.loc" file.
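Looking up the index by eye is error-prone when there are many markers. The sketch below parses a loci listing in the "index name" layout that crimap prints during "prepare" and writes the two ".par" lines for you. The function names are hypothetical, and the exact layout of "new.loc" is not shown in this tutorial, so paste the listing from the crimap output if the file differs.

```python
def parse_loci_table(text):
    """Map marker name -> crimap index from 'index name index name ...' text."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected alternating index/name tokens")
    return {tokens[i + 1]: int(tokens[i]) for i in range(0, len(tokens), 2)}


def loci_lines(index_by_name, new_marker, include_new_in_order=True):
    """Build the ordered_loci and inserted_loci lines of a .par file."""
    new_index = index_by_name[new_marker]
    ordered = sorted(
        i for i in index_by_name.values()
        if include_new_in_order or i != new_index
    )
    return [
        "ordered_loci " + " ".join(str(i) for i in ordered) + " *",
        "inserted_loci %d *" % new_index,
    ]


table = """
  0   S0390        1   S0391        2   S0371
  3   S0363        4   fatty
"""
for line in loci_lines(parse_loci_table(table), "fatty"):
    print(line)
```

Passing `include_new_in_order=False` leaves the new marker out of "ordered_loci", which is the form needed for the "all" option in section 6.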

Now you can do the twopoint linkage analysis:

    > crimap new.par twopoint

    100800 bytes allocated in orders_morecore
    
    3024000 bytes allocated in morecore
    
    Option chosen: twopoint
    
    Current values for parameters:
    
    par_file = new.par
    dat_file = new.dat
    gen_file = new.gen
    ord_file = new.ord
    nb_our_alloc = 3000000    [# bytes reserved for our_alloc]
    SEX_EQ = 1   [0 = sex specific analysis, 1 = sex equal]
    TOL = 0.010000
    PUK_NUM_ORDERS_TOL = 6
    PK_NUM_ORDERS_TOL = 8
    PUK_LIKE_TOL = 3.000
    PK_LIKE_TOL = 3.000
    use_ord_file = 0
    write_ord_file = 0
    use_haps = 1
    
    S0390    S0391    S0371    S0363      S0006
    S0077    GHR-1    UTAP2    C9         FSA
    CART     S0298    S0026    S0061      S0105
    S0326    fatty
    
    AGAINST:      fatty

    S0390   fatty  rec. fracs.=   0.12,   lods =   5.59
     -2.67  2.17   4.97  5.57  5.50  5.12  4.55  3.83  2.99
     2.02   0.96   0.00
    
    S0371   fatty  rec. fracs.=   0.02,   lods =  10.27
      9.32 10.15  10.09  9.39  8.51  7.53  6.46  5.31  4.06
      2.72  1.28   0.00
    
    S0363   fatty  rec. fracs.=   0.03,   lods =   7.39
      6.32  7.19   7.32  6.87  6.26  5.56  4.78  3.93  3.00
      2.01  0.98   0.00
    
    S0077   fatty  rec. fracs.=   0.00,   lods =   3.91
      3.91  3.85   3.58  3.24  2.88  2.51  2.13  1.74  1.33
      0.91  0.46   0.00
    
    CART   fatty  rec. fracs.=   0.00,   lods =  15.35
     15.33 15.12  14.15 12.89 11.57 10.17  8.70  7.13  5.47
      3.69  1.79   0.00
    
    S0298   fatty  rec. fracs.=   0.00,   lods =   3.01
      3.01  2.97   2.79  2.55  2.30  2.04  1.76  1.46  1.14
      0.79  0.41   0.00
    
    fatty   fatty  rec. fracs.=   0.00,   lods =  15.35
     15.33 15.12  14.15 12.89 11.57 10.17  8.70  7.13  5.47
      3.69  1.79   0.00

You can also save the output to a file, say "fatty.2pt", with the following syntax:

     > crimap new.par twopoint > fatty.2pt
Use the following command to extract the wanted information:
     > get2pt fatty.2pt
     S0390     lods  =  5.59
     S0371     lods  =  10.27
     S0363     lods  =  7.39
     S0077     lods  =  3.91
     CART      lods  =  15.35
     S0298     lods  =  3.01
     fatty     lods  =  15.35
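If "get2pt" is not available on your system, a similar summary can be pulled out of the saved output with a short script. This is a sketch only and is not the implementation of "get2pt"; it assumes the "rec. fracs.= ..., lods = ..." line layout shown above, and the function name `parse_twopoint` is hypothetical.

```python
import re

# Matches lines such as:
#   S0390   fatty  rec. fracs.=   0.12,   lods =   5.59
PAIR_LINE = re.compile(
    r"^\s*(\S+)\s+(\S+)\s+rec\. fracs\.=\s*([\d.]+),\s*lods\s*=\s*(-?[\d.]+)"
)


def parse_twopoint(text, min_lod=None):
    """Return (locus1, locus2, rec_frac, lod) tuples found in twopoint output."""
    results = []
    for line in text.splitlines():
        match = PAIR_LINE.match(line)
        if not match:
            continue  # parameter echo, LOD table rows, blank lines, ...
        locus1, locus2 = match.group(1), match.group(2)
        rec_frac, lod = float(match.group(3)), float(match.group(4))
        if min_lod is None or lod >= min_lod:
            results.append((locus1, locus2, rec_frac, lod))
    return results


sample = """
    S0390   fatty  rec. fracs.=   0.12,   lods =   5.59
     -2.67  2.17   4.97  5.57  5.50  5.12  4.55  3.83  2.99

    S0371   fatty  rec. fracs.=   0.02,   lods =  10.27
      9.32 10.15  10.09  9.39  8.51  7.53  6.46  5.31  4.06
"""
for locus1, locus2, rec_frac, lod in parse_twopoint(sample):
    print("%-8s lods = %.2f" % (locus1, lod))
```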



6. Multipoint linkage analysis:

Once you find significant linkage with markers on a chromosome, the next step is to determine the linear order of the linked markers on the chromosome. Multipoint linkage analysis calculates the likelihood of an order by weighing the closeness of linkage of the marker in question against all existing markers.

There are a few options involved in determining the correct marker order. We will introduce a simple approach. In practice, you have to choose the approach that best fits the situation.

Assuming the existing markers are in a "correct" order: use option all.

Edit the new.par file so that the marker to examine is NOT in the "ordered_loci":

    dat_file  new.dat *
    gen_file  new.gen *
    ord_file  new.ord *
    nb_our_alloc  3000000 *
    SEX_EQ  1 *
    TOL  0.010000 *
    PUK_NUM_ORDERS_TOL  6 *
    PK_NUM_ORDERS_TOL  8 *
    PUK_LIKE_TOL  3.000 *
    PK_LIKE_TOL  3.000 *
    use_ord_file  0 *
    write_ord_file  1 *
    use_haps  1 *
    ordered_loci 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 *
    inserted_loci 16 *
    END
Then run a TESTING multipoint linkage analysis:

    > crimap new.par all 

    100800 bytes allocated in orders_morecore
    
    3024000 bytes allocated in morecore
    
    Option chosen: all
    
    Current values for parameters:
    
    par_file = new.par
    dat_file = new.dat
    gen_file = new.gen
    ord_file = new.ord
    nb_our_alloc = 3000000    [# bytes reserved for our_alloc]
    SEX_EQ = 1   [0 = sex specific analysis, 1 = sex equal]
    TOL = 0.010000
    PUK_NUM_ORDERS_TOL = 6
    PK_NUM_ORDERS_TOL = 8
    PUK_LIKE_TOL = 3.000
    PK_LIKE_TOL = 3.000
    use_ord_file = 0
    write_ord_file = 0
    use_haps = 1
    
      0   S0390
      1   S0391
      2   S0371
      3   S0363
      4   S0006
      5   S0077
      6   GHR-1
      7   UTAP2
      8   C9
      9   FSA
     10   CART
     11   S0298
     12   S0026
     13   S0061
     14   S0105
     15   S0326
     16   fatty
    
    ordered loci:
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    
    inserted loci:
    16
    
     0  1  2  3  4  5  6  7  8  9 16 10 11 12 13 14 15  -164.425
     0  1  2  3  4  5  6  7  8  9 10 16 11 12 13 14 15  -164.425
     0  1  2  3  4  5  6  7 16  8  9 10 11 12 13 14 15  -164.448
     0  1  2  3  4  5  6  7  8  9 10 11 16 12 13 14 15  -164.448
     0  1  2  3  4  5  6  7  8 16  9 10 11 12 13 14 15  -164.457
     0  1  2  3  4  5  6 16  7  8  9 10 11 12 13 14 15  -164.659
     0  1  2  3  4  5 16  6  7  8  9 10 11 12 13 14 15  -165.161
     0  1  2 16  3  4  5  6  7  8  9 10 11 12 13 14 15  -166.001
     0  1  2  3  4 16  5  6  7  8  9 10 11 12 13 14 15  -166.049    

The last section of the multipoint linkage analysis output shows the best-fitting positions of the "new" marker within the existing ordered markers. The last field gives the likelihood score of each possible order, with the highest likelihood on top.
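When the list of candidate orders is long, it can help to reduce each order to the pair of markers flanking the new one and to see how far each placement falls behind the best. The following is a minimal sketch that assumes each order line is a row of locus indices followed by one likelihood score, as in the output above; the function name `rank_insertions` is hypothetical.

```python
def rank_insertions(order_lines, new_locus):
    """Summarize 'all' output as (left, right, score, drop_from_best) tuples.

    left/right are the indices flanking new_locus (None at a map end).
    """
    results = []
    for line in order_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            order = [int(f) for f in fields[:-1]]
            score = float(fields[-1])
        except ValueError:
            continue  # not an order line
        if new_locus not in order:
            continue
        pos = order.index(new_locus)
        left = order[pos - 1] if pos > 0 else None
        right = order[pos + 1] if pos + 1 < len(order) else None
        results.append((left, right, score))
    results.sort(key=lambda item: item[2], reverse=True)
    if not results:
        return []
    best = results[0][2]
    return [(l, r, s, round(best - s, 3)) for l, r, s in results]


lines = [
    " 0  1  2  3  4  5  6  7  8  9 16 10 11 12 13 14 15  -164.425",
    " 0  1  2  3  4  5  6  7 16  8  9 10 11 12 13 14 15  -164.448",
    " 0  1  2 16  3  4  5  6  7  8  9 10 11 12 13 14 15  -166.001",
]
for left, right, score, drop in rank_insertions(lines, 16):
    print(left, right, score, drop)
```

Placements whose scores are nearly tied, as several are in the example run, cannot be told apart by this analysis alone.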

If the existing markers are not in the right order, you have to run the "flips" option to get the best order before you run the "all" option. Of course, you may also prefer to run "flips" with the new marker inserted into the "ordered_loci" first. It is just a choice of approach for reducing the number of unnecessary runs before you reach the optimum marker order.

To find possibly misplaced markers in an existing marker order, use option flips.

     > crimap new.par flips2
where flips2 means to do flips with two markers at a time (the more markers you choose to flip at a time, the longer one crimap session takes to run, although it may help to reduce the number of flips runs needed before you find the best order).

     > crimap new.par flips2
     
     (The top portion of the output is similar to that of "all"
      option, therefore we omit it from here)

     number of loci to flip = 2
     
     Original order, & its log10_likelihood,  followed by
     flipped orders, with their relative log10_likelihoods
     (= log10_like[orig] - log10_like[curr])
     
       0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15   -164.43
     
       1  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -      0.00
       -  2  1  -  -  -  -  -  -  -  -  -  -  -  -  -     -0.00
       -  -  3  2  -  -  -  -  -  -  -  -  -  -  -  -     11.10
       -  -  -  4  3  -  -  -  -  -  -  -  -  -  -  -     -1.94
       -  -  -  -  5  4  -  -  -  -  -  -  -  -  -  -     -0.87
       -  -  -  -  -  6  5  -  -  -  -  -  -  -  -  -      0.01
       -  -  -  -  -  -  7  6  -  -  -  -  -  -  -  -      0.84
       -  -  -  -  -  -  -  8  7  -  -  -  -  -  -  -     -0.01
       -  -  -  -  -  -  -  -  9  8  -  -  -  -  -  -      0.00
       -  -  -  -  -  -  -  -  - 10  9  -  -  -  -  -      0.04
       -  -  -  -  -  -  -  -  -  - 11 10  -  -  -  -      0.02
       -  -  -  -  -  -  -  -  -  -  - 12 11  -  -  -     11.74
       -  -  -  -  -  -  -  -  -  -  -  - 13 12  -  -     14.08
       -  -  -  -  -  -  -  -  -  -  -  -  - 14 13  -     -0.00
       -  -  -  -  -  -  -  -  -  -  -  -  -  - 15 14      2.01     

The last field of the results shows the difference in log10 likelihood between the original and the flipped orders of two adjacent markers (original minus flipped); therefore, we want it to be positive. Any negative value indicates that the flipped order is better.

Here you go! By now you may have figured out that you need to edit the new.par file to change the ordered_loci and then run a second flips analysis. In this way, you repeat the flips runs until no likelihood difference is negative.

By repeating the flips and all options, you will get the best order.
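Picking out the flips worth acting on can be scripted. The sketch below assumes the flips output layout shown above, where unchanged positions are printed as "-" and the last field is the relative log10 likelihood; the function name `favourable_flips` is hypothetical. Values printed as "-0.00" are treated as zero, not as improvements.

```python
def favourable_flips(flip_lines, threshold=0.0):
    """Return [(flipped_loci, delta)] for flips with delta below threshold."""
    found = []
    for line in flip_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        loci = [f for f in fields[:-1] if f != "-"]
        if len(loci) == len(fields) - 1:
            continue  # no "-" placeholders: original order or plain text
        try:
            flipped = tuple(int(f) for f in loci)
            delta = float(fields[-1])
        except ValueError:
            continue
        if delta < threshold:
            found.append((flipped, delta))
    return sorted(found, key=lambda item: item[1])


lines = [
    "  0  1  2  3  4  5   -164.43",
    "  1  0  -  -  -  -      0.00",
    "  -  2  1  -  -  -     -0.00",
    "  -  -  -  4  3  -     -1.94",
    "  -  -  -  -  5  4     -0.87",
]
for flipped, delta in favourable_flips(lines):
    print(flipped, delta)
```

Each reported tuple gives the loci in their flipped (better) order, which is the order to write back into "ordered_loci" before the next run.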



7. Determine the marker distances on a linear map:

The crimap option for determining the marker distances is fixed. (Presumably you have gone through all the flips and all rounds and, with the new marker in the "ordered_loci", have determined the best map order.)

     > crimap new.par fixed
     
     (The top portion of the output is similar to that of "all"
      option, therefore we omit it from here)

    Sex_averaged map (recomb. frac., Kosambi cM):
    
      0   S0390              0.0
                  0.05    5.1
      2   S0371              5.1
                  0.04    4.1
      1   S0391              9.2
                  0.04    4.1
      3   S0363             13.3
                  0.03    3.0
      5   S0077             16.4
                  0.03    3.3
      7   UTAP2             19.7
                  0.07    7.5
      6   GHR-1             27.1
                  0.09    9.0
      4   S0006             36.1
                  0.16   17.0
      8   C9                53.1
                  0.00    0.0
     10   CART              53.1
                  0.00    0.0
      9   FSA               53.1
                  0.00    0.0
     11   S0298             53.1
                  0.00    0.0
     16   fatty             53.1
                  0.22   23.3
     12   S0026             76.4
                  0.34   40.8
     14   S0105            117.2
                  0.00    0.0
     13   S0061            117.2
                  0.03    2.5
     15   S0326            119.8
    
    * denotes recomb. frac. held fixed in this analysis
    
    log10_like = -160.925

The columns of data in the above output are (from left to right):

  1. Crimap number of the markers;
  2. Marker names;
  3. Recombination fractions between adjacent markers;
  4. Kosambi map distances (cM) converted from the recombination fractions;
  5. Cumulative Kosambi map distances (cM) along the length of the map.
An actual map may be drawn using the map distances obtained from the "fixed" results.
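For reference, the conversion behind columns 4 and 5 can be sketched with the standard Kosambi mapping function, d (cM) = 25 * ln((1 + 2r) / (1 - 2r)). The function names below are hypothetical. Because the output prints recombination fractions rounded to two decimals, distances recomputed from them will differ slightly from the ones crimap reports.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def cumulative_positions(rec_fracs):
    """Cumulative map positions (cM), starting from 0.0 at the first marker."""
    positions = [0.0]
    for r in rec_fracs:
        positions.append(positions[-1] + kosambi_cm(r))
    return positions


# First three intervals of the example map (rounded fractions from the output).
for position in cumulative_positions([0.05, 0.04, 0.04]):
    print("%.1f" % position)
```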

NOTE that there are more options in crimap analysis that we have not covered here. For example, option chrompic is extremely useful for checking genotype data for potential errors and conflicting results among markers. Option build is useful for gradually building up a map by adding markers one at a time, for instance when building a map de novo. Option instant works in conjunction with "build" to find a uniquely ordered set of loci quickly.


8. References and/or further readings:

(1) The Official CRIMAP Homepage
(2) The authors' manual
(3) EMBnet Crimapp Tutorial by David Featherston

9. Acknowledgement:

I would like to thank Dr. Gary Rohrer, who introduced CRIMAP to me, Dr. Lizhen Wang for in-depth discussions on exploring the use of some CRIMAP options, and Dr. Max Rothschild for his support in the preparation of this material.

10. Appendix: About the CRIMAP software

          Authors: Phil Green, Kathy Falls, and Steve Crooks
      Description: The software is written in C for constructing multilocus 
                   linkage maps.
Operating systems: UNIX, VMS
     Availability: Please contact Dr. Phil Green (Univ. of Washington)
                   to get both permission and the source code.
         Download: http://compgen.rutgers.edu/old/multimap/crimap/crimap.source.tar.Z


