A Scroll-through Tutorial:

Using CRIMAP to Perform Linkage Analysis
for Genetic Map Constructions

January 8, 1999
(Updated September 8, 2000)

Zhiliang Hu

Table of Contents

The data structure and data preprocessing
Getting your unix account environment ready
Merge your data with the existing PiGMaP family genotype data
Prepare your data set for crimap analysis
Two point linkage analysis
Multipoint linkage analysis
Determine the marker distances on a linear map
References and/or further readings
Acknowledgement
Appendix: About the CRIMAP software

This tutorial serves as complementary material to the CRIMAP Manual ("Documentation for CRI-MAP, version 2.4 (3/26/90)" by Phil Green, Kathy Falls, and Steve Crooks), walking beginners step by step through running the CRIMAP program. Users are assumed to already know the basics of linkage analysis and to understand the theory behind it. For concepts of linkage and LODs, please refer to this link.

This tutorial uses PiGMaP family genotype data as an example. By the end of the tutorial, users are expected to be able to run a linkage analysis of their own data set against the existing PiGMaP data for mapping purposes.

1. The data structure and data preprocessing:

The data structure is defined to have the format:

FamID  ChildreNum
PigID  CrimapCode  DameID  SireID  Sex  Allele 1  Allele 2

An actual example of the data structure:

    1 27 (Edinburgh 1)
    153   1  0  0  1  1  2
    833   2  0  0  0  1  1
    856   3  0  0  1  1  1
    433   4  0  0  0  1  2
    9591  5  2  1  1  2  2
    9360  6  4  3  0  1  2
    9365  7  4  3  0  2  2
    :     :  :  :  :  :  :
    :     :  :  :  :  :  :

CRIMAP uses the "CrimapCode", not the individual ID ("PigID"), for analysis. The "PigID" column has to be taken out once the genotypes are entered, and the actual CRIMAP working data sheet should look like:

    1 27
    1  0  0  1  1  2
    2  0  0  0  1  1
    3  0  0  1  1  1
    4  0  0  0  1  2
    5  2  1  1  2  2
    6  4  3  0  1  2
    7  4  3  0  2  2
    :  :  :  :  :  :
    :  :  :  :  :  :
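The column removal described above can be sketched in a few lines of Python. This is only an illustration of the transformation, not the CGI program mentioned in this tutorial. The function name `strip_pig_ids` is hypothetical, and the sketch assumes one marker per record (seven fields with the PigID, six without), as in the example.

```python
def strip_pig_ids(lines):
    """Drop the leading PigID field from every 7-field individual record.

    Assumption: one marker per record, so a record with a PigID has exactly
    7 whitespace-separated fields. Any other line (for example the family
    header) is passed through with its whitespace normalised.
    """
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) == 7:
            fields = fields[1:]  # remove PigID, keep CrimapCode onwards
        out.append(" ".join(fields))
    return out


raw = [
    "1 27",
    "153 1 0 0 1 1 2",
    "833 2 0 0 0 1 1",
    "9591 5 2 1 1 2 2",
]
for row in strip_pig_ids(raw):
    print(row)
```

With more than one marker per record the field count changes, so the test on the number of fields would need to be adapted.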

The reformatting of the data sheet is taken care of by a web form and its related CGI program written in Perl (by Zhiliang Hu). To use the CGI program to reformat your data, fill in the information in the web form, and be sure to enter your email address correctly in the corresponding field so that you receive the reformatted data by email.

The mail containing the reformatted data that you receive may be in the following format:

  FamNumbers MarkerNumbers
  MarkerName(s)

  Fam1  ChildreNum
  PigID  CrimapCode  DameID  SireID  Sex  Allele 1  Allele 2

  Fam2  ChildreNum
  PigID  CrimapCode  DameID  SireID  Sex  Allele 1  Allele 2

As in:

    From: webmaster@db.genome.iastate.edu
    To: hu@db.genome.iastate.edu
    Cc: kskim@iastate.edu
    Subject: Genotype Data Enterred for CRIMAP Analysis

    <----X Cut here before doing "crimap .par merge" X---->
    6 1
    Fatty
    1 27
    1 0 0 1 2 2
    2 0 0 0 1 2
    3 0 0 1 1 1
    : : : : : :
    234 223 119 0 0 0
    235 223 119 0 0 0
    236 223 119 0 0 0
    <----X Cut here before doing "crimap .par merge" X---->

    This data submission was made from timon.ansci.iastate.edu
    (ip=129.186.111.165) on Mon Jan 11 22:09:05 CST 1999 by Kwan Suk Kim.

You have to cut off the mail header and trailer, including the lines that say

<----X cut here ... ---->

and save the data into a file, say, "fatty.data", for further analysis.
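If you prefer not to trim the mail by hand, the cutting step can be sketched in Python as below. The function name `extract_crimap_data` is hypothetical, and the sketch assumes that the saved mail contains exactly the two "Cut here" marker lines shown in the example.

```python
def extract_crimap_data(mail_text):
    """Return the data lines found between the two 'Cut here' marker lines.

    Assumption: the mail contains the two marker lines shown in the tutorial.
    Everything before the first marker (the mail header) and after the
    second marker (the trailer) is discarded, as are blank lines.
    """
    kept = []
    inside = False
    for line in mail_text.splitlines():
        if "cut here" in line.lower():
            if inside:
                break          # second marker: stop reading
            inside = True      # first marker: start reading
            continue
        if inside and line.strip():
            kept.append(line.strip())
    return "\n".join(kept) + "\n"


sample_mail = """Subject: Genotype Data Enterred for CRIMAP Analysis

<----X Cut here before doing "crimap .par merge" X---->
6 1
Fatty
1 27
1 0 0 1 2 2
<----X Cut here before doing "crimap .par merge" X---->
This data submission was made from ...
"""
print(extract_crimap_data(sample_mail), end="")
```

The returned text can then be written to "fatty.data" with an ordinary `open("fatty.data", "w")` call.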

2. Getting your unix account environment ready:

Use Secure Shell (ssh) to log in to your "genome" account:

ssh db.genome.iastate.edu

Note: Obtain your login and passwd from Zhiliang Hu.

Your unix account environment is customized to use "tcsh". If you find that this is not the case, run "chsh" to change your shell to "tcsh".

Create a sub-directory for yourself and "cd" into it as your working directory:

    > mkdir yourname
    > cd yourname

To get the existing PiGMaP family genotype data:

    > getdata

This will put a set of 12 "*.gen" files in your current directory, where "*" represents either a chromosome number or something that tells the nature of the data (e.g. "all.gen" means all markers from 19 chromosomes). In the future, you can always use this command to get the most up-to-date PiGMaP family genotype data (which will overwrite the existing files). [NOTE: "getdata" is a UNIX utility developed by Zhiliang Hu and used on the Pig Genome Server only]

3. Merge your data with the existing PiGMaP family genotype data:

For this particular example, we knew that the new marker should map to pig chromosome 16; therefore, we are going to analyse the new data only against the chromosome 16 data. The crimap executable should already be in your path. So just type:

    > crimap new.par merge

where "crimap" is the command, "new.par" is the parameter file you are going to use, and "merge" is the crimap option for merging the data. You will be asked for the first input file, the second input file and the output file. Here is an actual run (what you type in is the command itself and the three file names that follow the "first input file", "second input file" and "merge output file" prompts):

    > crimap new.par merge
    first input file = chr16.gen
    input file = chr16.gen
    1008000 bytes allocated in morecore
    second input file = fatty.data
    input file = fatty.data
    merge output file = new.gen
    writing file ...
    >

Check if the file "new.gen" is in your working directory.

4. Prepare your data set for crimap analysis:

You need to set up your parameter (.par), data (.dat) and order (.ord) files for your crimap analysis. Here is an actual sample:

    > crimap new.par prepare
    1008000 bytes allocated in morecore
    Creating .dat file new.dat from .gen file new.gen
    family id 1
    family id 2
    family id 3
    family id 4
    family id 8
    family id 5
    Writing file new.dat
    Finished writing new.dat
    Writing locus names to new.loc

    Current values for parameters:
    par_file = new.par
    dat_file = new.dat
    gen_file = new.gen
    ord_file = new.ord
    nb_our_alloc = 3000000   [# bytes reserved for our_alloc]
    SEX_EQ = 1   [0 = sex specific analysis, 1 = sex equal]
    TOL = 0.010000
    PUK_NUM_ORDERS_TOL = 6
    PK_NUM_ORDERS_TOL = 8
    PUK_LIKE_TOL = 3.000
    PK_LIKE_TOL = 3.000
    use_ord_file = 0
    write_ord_file = 1
    use_haps = 1

    Do you wish to change any of these values? (y/n) n

    The loci and their indices are:
     0 S0390    1 S0391    2 S0371    3 S0363    4 S0006    5 S0077
     6 GHR-1    7 UTAP2    8 C9       9 FSA     10 CART    11 S0298
    12 S0026   13 S0061   14 S0105   15 S0326   16 fatty

    Do you wish to enter any new haplotyped systems? (y/n) n

    Do you wish to hold any additional recombination fractions fixed
    (NB these will only be used with the options FIXED and CHROMPIC, and
    only when the loci in question are adjacent)? (y/n) n

    The crimap options are:
    [1] build
    [2] instant
    [3] quick
    [4] fixed
    [5] flips
    [6] all
    [7] twopoint
    [8] chrompic
    Enter the number of the option you will be running next: 7

    The loci and their indices are:
     0 S0390    1 S0391    2 S0371    3 S0363    4 S0006    5 S0077
     6 GHR-1    7 UTAP2    8 C9       9 FSA     10 CART    11 S0298
    12 S0026   13 S0061   14 S0105   15 S0326   16 fatty

    Do you wish to compute LOD tables for ALL pairs of loci? (y/n) y

    OK to set up new parameter file? (y/n) y

    new.par has been created; use text editor for further modifications, if needed

As a result, now you should have 4 new files in your working directory:

  -rw-r--r--   1 hu    adm     18351 Jan 8 10:45 new.dat
  -rw-r--r--   1 hu    adm     13296 Jan 8 10:35 new.gen
  -rw-r--r--   1 hu    adm       870 Jan 8 10:45 new.loc
  -rw-r--r--   1 hu    adm       315 Jan 8 10:49 new.par

5. Two point linkage analysis:

Two point linkage analysis calculates LOD scores by comparing two markers at a time, the new marker and one of the existing markers, and outputs only the significant LOD scores (LOD > 3.0).
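As a reminder of what a LOD score measures, here is a minimal Python sketch for the simplest textbook case: phase-known, fully informative meioses, where the LOD at recombination fraction theta is log10 of L(theta) / L(0.5). It is not how CRIMAP computes likelihoods on real pedigrees. The function name `lod_score` and the counts in the example (2 recombinants in 20 meioses) are made up for illustration.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    where k is the number of recombinants and n the number of meioses.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Illustrative counts only: 2 recombinants observed in 20 meioses.
print(round(lod_score(2, 20, 0.1), 2))   # about 3.2, just above the 3.0 threshold
```

At theta = 0.5 the score is zero by construction, which is why the last value in each CRIMAP LOD table shown in this tutorial is 0.00.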

For "twopoint" linkage analysis, you have to modify the ".par" file to include the marker you want to analyse as the inserted_loci (the "inserted_loci" line is the new line you type in):

    dat_file  new.dat *
    gen_file  new.gen *
    ord_file  new.ord *
    nb_our_alloc  3000000 *
    SEX_EQ  1 *
    TOL  0.010000 *
    PUK_NUM_ORDERS_TOL  6 *
    PK_NUM_ORDERS_TOL  8 *
    PUK_LIKE_TOL  3.000 *
    PK_LIKE_TOL  3.000 *
    use_ord_file  0 *
    write_ord_file  1 *
    use_haps  1 *
    ordered_loci 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16  *
    inserted_loci 16 *
    END

where the number "16" corresponds to the locus "fatty"; this information can be found in the "new.loc" file.
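If you edit the parameter file often, generating it from a script can reduce typing mistakes. The Python sketch below reproduces the layout of the listing in this section; the function name `make_par_text` is hypothetical, the parameter values are simply copied from the example, and whether CRIMAP is sensitive to the exact spacing has not been checked here.

```python
def make_par_text(prefix, ordered_loci, inserted_loci):
    """Build the text of a CRIMAP .par file in the layout shown in the tutorial.

    Assumption: the fixed settings below are the example values from the
    tutorial; adjust them for your own analysis.
    """
    settings = [
        ("dat_file", prefix + ".dat"),
        ("gen_file", prefix + ".gen"),
        ("ord_file", prefix + ".ord"),
        ("nb_our_alloc", "3000000"),
        ("SEX_EQ", "1"),
        ("TOL", "0.010000"),
        ("PUK_NUM_ORDERS_TOL", "6"),
        ("PK_NUM_ORDERS_TOL", "8"),
        ("PUK_LIKE_TOL", "3.000"),
        ("PK_LIKE_TOL", "3.000"),
        ("use_ord_file", "0"),
        ("write_ord_file", "1"),
        ("use_haps", "1"),
    ]
    lines = [f"{name}  {value} *" for name, value in settings]
    lines.append("ordered_loci " + " ".join(str(i) for i in ordered_loci) + " *")
    lines.append("inserted_loci " + " ".join(str(i) for i in inserted_loci) + " *")
    lines.append("END")
    return "\n".join(lines) + "\n"


# Twopoint set-up from the tutorial: loci 0..16 ordered, locus 16 inserted.
par_text = make_par_text("new", range(17), [16])
print(par_text, end="")
```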

Now you can do the twopoint linkage analysis:

    > crimap new.par twopoint
    100800 bytes allocated in orders_morecore
    3024000 bytes allocated in morecore
    Option chosen: twopoint

    Current values for parameters:
    par_file = new.par
    dat_file = new.dat
    gen_file = new.gen
    ord_file = new.ord
    nb_our_alloc = 3000000   [# bytes reserved for our_alloc]
    SEX_EQ = 1   [0 = sex specific analysis, 1 = sex equal]
    TOL = 0.010000
    PUK_NUM_ORDERS_TOL = 6
    PK_NUM_ORDERS_TOL = 8
    PUK_LIKE_TOL = 3.000
    PK_LIKE_TOL = 3.000
    use_ord_file = 0
    write_ord_file = 0
    use_haps = 1

    S0390 S0391 S0371 S0363 S0006 S0077 GHR-1 UTAP2 C9 FSA CART S0298 S0026 S0061 S0105 S0326 fatty
    AGAINST:
    fatty

    S0390  fatty  rec. fracs.= 0.12, lods = 5.59
        -2.67  2.17  4.97  5.57  5.50  5.12  4.55  3.83  2.99  2.02  0.96  0.00
    S0371  fatty  rec. fracs.= 0.02, lods = 10.27
         9.32 10.15 10.09  9.39  8.51  7.53  6.46  5.31  4.06  2.72  1.28  0.00
    S0363  fatty  rec. fracs.= 0.03, lods = 7.39
         6.32  7.19  7.32  6.87  6.26  5.56  4.78  3.93  3.00  2.01  0.98  0.00
    S0077  fatty  rec. fracs.= 0.00, lods = 3.91
         3.91  3.85  3.58  3.24  2.88  2.51  2.13  1.74  1.33  0.91  0.46  0.00
    CART   fatty  rec. fracs.= 0.00, lods = 15.35
        15.33 15.12 14.15 12.89 11.57 10.17  8.70  7.13  5.47  3.69  1.79  0.00
    S0298  fatty  rec. fracs.= 0.00, lods = 3.01
         3.01  2.97  2.79  2.55  2.30  2.04  1.76  1.46  1.14  0.79  0.41  0.00
    fatty  fatty  rec. fracs.= 0.00, lods = 15.35
        15.33 15.12 14.15 12.89 11.57 10.17  8.70  7.13  5.47  3.69  1.79  0.00

You can also save the output into a file, say "fatty.2pt", with the following syntax:

     > crimap new.par twopoint > fatty.2pt

Use the following command to extract the wanted information:

     > get2pt fatty.2pt
     S0390     lods  =  5.59
     S0371     lods  =  10.27
     S0363     lods  =  7.39
     S0077     lods  =  3.91
     CART      lods  =  15.35
     S0298     lods  =  3.01
     fatty     lods  =  15.35
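If "get2pt" is not available on your system, a similar summary can be pulled out of the saved twopoint output with a short Python script. This is a sketch of the idea only, not the "get2pt" program itself. The names `PAIR_RE` and `significant_pairs` are hypothetical, and the regular expression assumes the "rec. fracs.= ..., lods = ..." wording shown in the output above.

```python
import re

# Matches e.g. "S0390 fatty rec. fracs.= 0.12, lods = 5.59"
PAIR_RE = re.compile(
    r"(\S+)\s+(\S+)\s+rec\. fracs\.=\s*([\d.]+),\s*lods\s*=\s*(-?[\d.]+)"
)


def significant_pairs(output_text, threshold=3.0):
    """Return (locus1, locus2, rec_frac, max_lod) tuples with max_lod > threshold.

    Pairs of a locus with itself are skipped. Results are sorted by LOD,
    highest first.
    """
    hits = []
    for a, b, theta, lod in PAIR_RE.findall(output_text):
        if a == b:
            continue
        if float(lod) > threshold:
            hits.append((a, b, float(theta), float(lod)))
    return sorted(hits, key=lambda hit: hit[3], reverse=True)


sample_output = """
S0390 fatty rec. fracs.= 0.12, lods = 5.59
 -2.67 2.17 4.97 5.57 5.50 5.12 4.55 3.83 2.99 2.02 0.96 0.00
CART fatty rec. fracs.= 0.00, lods = 15.35
 15.33 15.12 14.15 12.89 11.57 10.17 8.70 7.13 5.47 3.69 1.79 0.00
S0298 fatty rec. fracs.= 0.00, lods = 3.01
 3.01 2.97 2.79 2.55 2.30 2.04 1.76 1.46 1.14 0.79 0.41 0.00
fatty fatty rec. fracs.= 0.00, lods = 15.35
"""
for locus, _, theta, lod in significant_pairs(sample_output):
    print(f"{locus:8s} theta = {theta:.2f}  lods = {lod:.2f}")
```

Sorting by LOD puts the most tightly linked markers first, which is a convenient starting point for the multipoint step.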

6. Multipoint linkage analysis:

Once you find significant linkage with markers on a chromosome, the next step is to determine the linear order of the linked markers on the chromosome. Multipoint linkage analysis calculates the likelihood of an order by weighing the closeness of linkage of the marker in question against all existing markers.

There are a few options involved in determining the correct marker order. We will introduce a simple approach. In practice you have to choose the approach that best fits your situation.

Assuming the existing markers are in a "correct" order: use option all.

Edit the new.par file so that the marker to examine is NOT in the "ordered_loci":

    dat_file  new.dat *
    gen_file  new.gen *
    ord_file  new.ord *
    nb_our_alloc  3000000 *
    SEX_EQ  1 *
    TOL  0.010000 *
    PUK_NUM_ORDERS_TOL  6 *
    PK_NUM_ORDERS_TOL  8 *
    PUK_LIKE_TOL  3.000 *
    PK_LIKE_TOL  3.000 *
    use_ord_file  0 *
    write_ord_file  1 *
    use_haps  1 *
    ordered_loci 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 *
    inserted_loci 16 *
    END

Then run a TESTING multipoint linkage analysis:

    > crimap new.par all
    100800 bytes allocated in orders_morecore
    3024000 bytes allocated in morecore
    Option chosen: all

    Current values for parameters:
    par_file = new.par
    dat_file = new.dat
    gen_file = new.gen
    ord_file = new.ord
    nb_our_alloc = 3000000   [# bytes reserved for our_alloc]
    SEX_EQ = 1   [0 = sex specific analysis, 1 = sex equal]
    TOL = 0.010000
    PUK_NUM_ORDERS_TOL = 6
    PK_NUM_ORDERS_TOL = 8
    PUK_LIKE_TOL = 3.000
    PK_LIKE_TOL = 3.000
    use_ord_file = 0
    write_ord_file = 0
    use_haps = 1

     0 S0390    1 S0391    2 S0371    3 S0363    4 S0006    5 S0077
     6 GHR-1    7 UTAP2    8 C9       9 FSA     10 CART    11 S0298
    12 S0026   13 S0061   14 S0105   15 S0326   16 fatty

    ordered loci:  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    inserted loci: 16

    0 1 2 3 4 5 6 7 8 9 16 10 11 12 13 14 15   -164.425
    0 1 2 3 4 5 6 7 8 9 10 16 11 12 13 14 15   -164.425
    0 1 2 3 4 5 6 7 16 8 9 10 11 12 13 14 15   -164.448
    0 1 2 3 4 5 6 7 8 9 10 11 16 12 13 14 15   -164.448
    0 1 2 3 4 5 6 7 8 16 9 10 11 12 13 14 15   -164.457
    0 1 2 3 4 5 6 16 7 8 9 10 11 12 13 14 15   -164.659
    0 1 2 3 4 5 16 6 7 8 9 10 11 12 13 14 15   -165.161
    0 1 2 16 3 4 5 6 7 8 9 10 11 12 13 14 15   -166.001
    0 1 2 3 4 16 5 6 7 8 9 10 11 12 13 14 15   -166.049

The last section of the multipoint linkage analysis output shows the best-fitting positions of the "new" marker within the existing ordered markers. The last field gives the log10 likelihood of each possible order, with the highest likelihood at the top.

If the existing markers are not in the right order, you have to run the "flips" option to get the best order before you run the "all" option. Of course you may also like to run the "flips" option with the new marker inserted into the "ordered_loci" first. It is just a matter of preference, a choice of approach to reduce the number of unnecessary runs before you reach the optimum marker order.

To find possible misplaced markers in an existing marker order, use option flips.
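To read the list of candidate orders more easily, the Python sketch below reports, for each order, the two loci flanking the inserted marker and how far its log10 likelihood falls below the best order. The function name `rank_insertions` is hypothetical, and the three sample lines are copied from the output above.

```python
def rank_insertions(order_lines, inserted):
    """Summarise 'all' output lines of the form '<locus indices> <log10 likelihood>'.

    Returns a list of (left_neighbour, right_neighbour, log10_like, drop),
    best order first, where drop is the difference from the best log10
    likelihood. A neighbour is None when the inserted locus is at an end.
    """
    results = []
    for line in order_lines:
        *loci, loglike = line.split()
        loci = [int(x) for x in loci]
        pos = loci.index(inserted)
        left = loci[pos - 1] if pos > 0 else None
        right = loci[pos + 1] if pos + 1 < len(loci) else None
        results.append((float(loglike), left, right))
    results.sort(key=lambda r: r[0], reverse=True)
    best = results[0][0]
    return [(left, right, ll, round(best - ll, 3)) for ll, left, right in results]


sample_orders = [
    "0 1 2 3 4 5 6 7 16 8 9 10 11 12 13 14 15 -164.448",
    "0 1 2 3 4 5 6 7 8 9 16 10 11 12 13 14 15 -164.425",
    "0 1 2 3 4 16 5 6 7 8 9 10 11 12 13 14 15 -166.049",
]
for left, right, ll, drop in rank_insertions(sample_orders, inserted=16):
    print(f"16 between {left} and {right}: log10_like = {ll}, drop = {drop}")
```

In the example run, the top five placements differ by less than 0.04 log10 units, so the data do not strongly distinguish between positions in the region of loci 7 to 12.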

     > crimap new.par flips2

where flips2 means to flip two markers at a time (the more markers you choose to flip at once, the longer one crimap session takes to run, although it may reduce the number of flips runs needed before you find the best order).

    > crimap new.par flips2

    (The top portion of the output is similar to that of the "all" option, therefore we omit it here)

    number of loci to flip = 2

    Original order, & its log10_likelihood, followed by flipped orders,
    with their relative log10_likelihoods (= log10_like[orig] - log10_like[curr])

      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15  -164.43
      1  0  -  -  -  -  -  -  -  -  -  -  -  -  -  -     0.00
      -  2  1  -  -  -  -  -  -  -  -  -  -  -  -  -    -0.00
      -  -  3  2  -  -  -  -  -  -  -  -  -  -  -  -    11.10
      -  -  -  4  3  -  -  -  -  -  -  -  -  -  -  -    -1.94
      -  -  -  -  5  4  -  -  -  -  -  -  -  -  -  -    -0.87
      -  -  -  -  -  6  5  -  -  -  -  -  -  -  -  -     0.01
      -  -  -  -  -  -  7  6  -  -  -  -  -  -  -  -     0.84
      -  -  -  -  -  -  -  8  7  -  -  -  -  -  -  -    -0.01
      -  -  -  -  -  -  -  -  9  8  -  -  -  -  -  -     0.00
      -  -  -  -  -  -  -  -  - 10  9  -  -  -  -  -     0.04
      -  -  -  -  -  -  -  -  -  - 11 10  -  -  -  -     0.02
      -  -  -  -  -  -  -  -  -  -  - 12 11  -  -  -    11.74
      -  -  -  -  -  -  -  -  -  -  -  - 13 12  -  -    14.08
      -  -  -  -  -  -  -  -  -  -  -  -  - 14 13  -    -0.00
      -  -  -  -  -  -  -  -  -  -  -  -  -  - 15 14     2.01

The last field of the results shows the difference between the log10 likelihoods of the original order and of the order with two adjacent markers flipped; therefore we want it to be positive. Any negative value indicates that the flipped order is better.

Here you go! By now you may have figured out that you need to edit the new.par file to change the ordered_loci and then run a second flips analysis. Repeat the flips runs in this way until no likelihood difference is negative.

By repeating the flips and all options, you will get the best order.
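The reading of the flips table can be sketched in Python as follows: pick out the rows with a negative value and apply one flip to the ordered_loci list. The function names `improving_flips` and `apply_flip` are hypothetical, and the rows are copied from the output above.

```python
def improving_flips(flip_rows):
    """Return [(flipped_loci, delta)] for rows whose last field is negative.

    Each row is a line of flips output: '-' placeholders, the flipped loci
    in their new order, and finally log10_like[orig] - log10_like[curr].
    A value printed as -0.00 is not counted, since float('-0.00') < 0 is False.
    """
    better = []
    for row in flip_rows:
        *cells, delta = row.split()
        delta = float(delta)
        if delta < 0:
            flipped = tuple(int(c) for c in cells if c != "-")
            better.append((flipped, delta))
    return better


def apply_flip(order, flipped_pair):
    """Return a copy of order with the adjacent pair put in the flipped order."""
    first, second = flipped_pair          # new order: first, then second
    i = order.index(second)               # in the original, second comes first
    if i + 1 >= len(order) or order[i + 1] != first:
        raise ValueError("loci are not adjacent in the given order")
    new_order = list(order)
    new_order[i], new_order[i + 1] = first, second
    return new_order


rows = [
    "- 2 1 - - - - - - - - - - - - - -0.00",
    "- - 3 2 - - - - - - - - - - - - 11.10",
    "- - - 4 3 - - - - - - - - - - - -1.94",
    "- - - - 5 4 - - - - - - - - - - -0.87",
]
candidates = improving_flips(rows)
print(candidates)
print(apply_flip(list(range(16)), candidates[0][0]))
```

Because the two candidates here share locus 4, it is safer to apply one flip, re-run flips, and look again, which is the repeat-until-positive procedure this tutorial describes.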

7. Determine the marker distances on a linear map:

The crimap option for determining the marker distances is fixed. (This assumes that you have worked through all the flips and all runs and that, with the new marker in the "ordered_loci", you have determined the best map order.)

    > crimap new.par fixed

    (The top portion of the output is similar to that of the "all" option, therefore we omit it here)

    Sex_averaged map (recomb. frac., Kosambi cM):

     0 S0390      0.0
                            0.05     5.1
     2 S0371      5.1
                            0.04     4.1
     1 S0391      9.2
                            0.04     4.1
     3 S0363     13.3
                            0.03     3.0
     5 S0077     16.4
                            0.03     3.3
     7 UTAP2     19.7
                            0.07     7.5
     6 GHR-1     27.1
                            0.09     9.0
     4 S0006     36.1
                            0.16    17.0
     8 C9        53.1
                            0.00     0.0
    10 CART      53.1
                            0.00     0.0
     9 FSA       53.1
                            0.00     0.0
    11 S0298     53.1
                            0.00     0.0
    16 fatty     53.1
                            0.22    23.3
    12 S0026     76.4
                            0.34    40.8
    14 S0105    117.2
                            0.00     0.0
    13 S0061    117.2
                            0.03     2.5
    15 S0326    119.8

    * denotes recomb. frac. held fixed in this analysis

    log10_like = -160.925

The definitions for the columns of data in the above output are (from the left to the right):

Crimap number of the marker;
Marker name;
Cumulative Kosambi map position (cM) of the marker along the map;
Recombination fraction between this marker and the next (adjacent) marker;
Kosambi map distance (cM) to the next marker, converted from the recombination fraction.
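The conversion from recombination fraction to Kosambi distance uses the standard Kosambi map function, d = 25 ln((1 + 2r) / (1 - 2r)) cM. A minimal Python sketch follows; the function names `kosambi_cm` and `cumulative_positions` are hypothetical.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in centimorgans for a recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def cumulative_positions(rec_fracs):
    """Cumulative map positions (cM), starting at 0.0 for the first marker."""
    positions = [0.0]
    for r in rec_fracs:
        positions.append(positions[-1] + kosambi_cm(r))
    return positions


# First two intervals of the example map (S0390 - S0371 - S0391).
for pos in cumulative_positions([0.05, 0.04]):
    print(f"{pos:.1f}")
```

Values computed this way from the printed recombination fractions can differ slightly from the CRIMAP output (for example 5.0 rather than 5.1 for r = 0.05), presumably because the printed fractions are rounded to two decimals.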

An actual map may be drawn using the map distances obtained from the "fixed" results.

NOTE that there are more options in crimap analysis that we have not covered here. For example, option chrompic is extremely useful for checking genotype data for potential errors and conflicting results among markers. Option build is useful for gradually building up a map by adding markers one at a time, for instance when building a map de novo. Option instant works in conjunction with "build" and quickly finds a uniquely ordered set of loci.

10. Appendix: About the CRIMAP software:

          Authors: Phil Green, Kathy Falls, and Steve Crooks
     Descriptions: The software is written in C for constructing multilocus 
                   linkage maps.
Operating systems: UNIX, VMS
     Availability: Please contact Dr. Phil Green (Univ. of Washington)
                   to get both permission and the source code.
         Download: http://compgen.rutgers.edu/old/multimap/crimap/crimap.source.tar.Z
