Dealing with ``double markers''

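In one of the previous examples (section 1.1), we noted that several markers were identified as ``double markers'' by the mrkdouble procedure. Double markers are pairs of markers whose observed genotypes are compatible with the assumption that the two markers are the same. There are two cases where such a situation may occur: either the two markers really lie at (or very near) the same locus, so that no recombination was observed between them, or too few individuals are typed at both markers, so that the genotypes are compatible merely by chance.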

To distinguish between these two cases, mrkdouble reports the two-point LOD between each pair of potential double markers. If this LOD is small, the two markers should not be merged into one.
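
To see why a small LOD argues against merging, here is a minimal Python sketch of a two-point LOD for backcross data. It is not CarthaGene's implementation: the function name and the genotype encoding (strings of '0', '1' and '-' for missing) are assumptions made for illustration, and individuals with a missing genotype at either marker are simply skipped rather than handled by EM.

```python
import math

def two_point_lod_backcross(m1, m2):
    """Two-point LOD between two backcross markers (hypothetical helper).

    m1, m2: genotype strings, one character per individual,
            '0' or '1' for a typed individual, '-' for missing data.
    """
    # Keep only individuals typed at both markers (simplifying assumption).
    pairs = [(a, b) for a, b in zip(m1, m2) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    r = sum(1 for a, b in pairs if a != b)  # observed recombinants
    theta = min(r / n, 0.5)                 # ML estimate, capped at 0.5

    def log10_lik(t):
        # log10 L(t) = r*log10(t) + (n - r)*log10(1 - t); empty terms count as 0
        rec = r * math.log10(t) if r else 0.0
        non = (n - r) * math.log10(1.0 - t) if n - r else 0.0
        return rec + non

    # LOD compares the best-fitting theta with free recombination (theta = 0.5).
    return log10_lik(theta) - log10_lik(0.5)

# Two pairs of fully compatible markers over 60 individuals.
dense_a = "01" * 30
dense_b = "01" * 30             # every individual typed at both markers
sparse_a = "01" + "-" * 58      # only 2 individuals typed
sparse_b = "01" * 30

lod_dense = two_point_lod_backcross(dense_a, dense_b)
lod_sparse = two_point_lod_backcross(sparse_a, sparse_b)
```

Both pairs are ``double markers'' in the sense above, but only the densely typed pair gets a strong LOD. With no recombinants, each jointly typed individual contributes log10(2), about 0.3 LOD units, so under these assumptions an LOD near 21.7 would correspond to roughly 72 jointly typed individuals.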

In the mouse.raw dataset, all pairs of double markers have strong LODs. Let us reload that file, perform grouping and select group 10 as we did previously:

CG> dsload Data/mouse.raw
...
CG> group  0.3 3.0
...
CG> mrkselset [groupget 10]
...
If you issue the mrkdouble command, you should see the following:
CG> mrkdouble

Possible double markers:

               L029 = L010            [18.1]
               A079 = M030            [21.7]
               A036 = M034            [21.4]
               M237 = M076            [19.9]
               T035 = L078            [21.4]
Let's merge A079 and M030 together. This can be achieved using the mrkmerge command. The command merges the observations of both markers into one and leaves the other unchanged. It takes the numerical ids of the two markers as arguments. If you don't know these numerical ids, the mrkid command can be used as in the example below.
CG> mrkselget
41 305 298 276 241 260 207 153 120 115 96 106 83 63 59
CG> mrkmerge [mrkid A079] [mrkid M030]
Markers 305 and 153 merged in 305.
The marker 153 has automatically been removed from the list of selected markers:
CG> mrkselget
41 305 298 276 241 260 207 120 115 96 106 83 63 59
The same can be done with each pair of markers detected as double markers. Note that after merging two markers, some previously detected double markers may no longer be compatible. For example, on BC data, suppose that for one individual marker M1 is typed ``1'', marker M3 is typed ``0'' and marker M2 is typed ``-''. Then M1-M2 and M2-M3 are two pairs of ``double markers'', but once M1 and M2 are merged, M3 can no longer be merged with the (M1-M2) pair. Although this situation is unlikely, CarthaGene checks compatibility before merging. In the example, all pairs can be merged without problem.
CG> mrkmerge [mrkid L029] [mrkid L010]
Markers 41 and 59 merged in 41.

CG> mrkmerge [mrkid A036] [mrkid M034]
Markers 276 and 115 merged in 276.

CG> mrkmerge [mrkid M237] [mrkid M076]
Markers 207 and 120 merged in 207.

CG> mrkmerge [mrkid T035] [mrkid L078]
Markers 106 and 83 merged in 106.
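
The compatibility caveat mentioned before these merges (M1, M2 and M3) can be illustrated with a short Python sketch. The helper names and the string encoding of genotypes are assumptions made for this illustration. They are not part of CarthaGene, whose actual check may differ.

```python
MISSING = "-"

def compatible(m1, m2):
    """True if no individual is typed differently at the two markers."""
    return all(a == b or MISSING in (a, b) for a, b in zip(m1, m2))

def merge(m1, m2):
    """Merge two compatible markers, filling missing data of m1 from m2."""
    if not compatible(m1, m2):
        raise ValueError("markers are not compatible")
    return "".join(b if a == MISSING else a for a, b in zip(m1, m2))

# Three individuals; the last one reproduces the situation in the text:
# M1 typed '1', M2 missing, M3 typed '0'.
M1 = "011"
M2 = "01-"
M3 = "010"

pair_12 = compatible(M1, M2)              # True: M2 is missing where M1 is '1'
pair_23 = compatible(M2, M3)              # True: M2 is missing where M3 is '0'
merged_12 = merge(M1, M2)                 # "011"
still_ok = compatible(merged_12, M3)      # False: '1' now conflicts with '0'
```

Compatibility is therefore not transitive, which is why it has to be checked again before each merge rather than once for the whole list of pairs.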

CG> mrkselget
41 305 298 276 241 260 207 96 106 63
CG> mrklod2p

             41   305   298   276   241   260   207    96   106    63
           L029  A079  A059  A036  M232  D022  M237  T018  T035  L001
          ------------------------------------------------------------
    L029 |------  4.4   5.5   1.1   4.4   6.3   2.0   3.6  13.6   5.9
    A079 |  4.4 ------ 18.4   7.4  16.5   5.7  10.5  14.0   7.2  16.8
    A059 |  5.5  18.4 ------  6.2  14.2   6.4   9.0  11.9   8.8  19.9
    A036 |  1.1   7.4   6.2 ------  9.0   2.6  13.0   8.6   2.6   6.6
    M232 |  4.4  16.5  14.2   9.0 ------  4.8  13.0  17.8   7.4  15.1
    D022 |  6.3   5.7   6.4   2.6   4.8 ------  3.2   4.5   9.6   6.4
    M237 |  2.0  10.5   9.0  13.0  13.0   3.2 ------ 12.8   4.0   9.6
    T018 |  3.6  14.0  11.9   8.6  17.8   4.5  12.8 ------  6.5  12.8
    T035 | 13.6   7.2   8.8   2.6   7.4   9.6   4.0   6.5 ------  9.3
    L001 |  5.9  16.8  19.9   6.6  15.1   6.4   9.6  12.8   9.3 ------

We can now try to build a comprehensive map. In this simple case, we simply run the nicemapd command followed by flips 5 0 0.

CG> nicemapd

Map -1 : log10-likelihood =   -68.30
-------:
 Set : Marker List ...
   4 : L029 T035 D022 L001 A059 A079 M232 T018 M237 A036

CG> flips 5 0 0

Single Flip(window size : 5, threshold : 0.00).


Map -1 : log10-likelihood =   -68.30
-------:
 Set : Marker List ...
   4 : L029 T035 D022 L001 A059 A079 M232 T018 M237 A036

   1 2   2 3 2   2 2
 4 0 6 6 9 0 4 9 0 7  log10
 1 6 0 3 8 5 1 6 7 6    -68.30


CG> heaprintd
...
The heaprintd command prints the best map found, which here has number 7. We can then display it in detail using the maprintd command:
CG> maprintd 7

Map  7 : log10-likelihood =   -68.30, log-e-likelihood =  -157.26
-------:

Data Set Number  4 :

      Markers        Distance    Cumulative  Distance   Theta       2pt
Pos  Id name         Haldane     Haldane     Kosambi    (%age)       LOD

--- L010
  1  41 L029           4.0 cM      4.0 cM      3.8 cM     3.8 %     13.6
--- L078
  2 106 T035           2.5 cM      6.5 cM      2.5 cM     2.5 %      9.6
  3 260 D022          11.5 cM     18.0 cM     10.4 cM    10.2 %      6.4
  4  63 L001           1.1 cM     19.1 cM      1.1 cM     1.1 %     19.9
  5 298 A059           2.2 cM     21.3 cM      2.2 cM     2.2 %     18.4
--- M030
  6 305 A079           3.4 cM     24.8 cM      3.3 cM     3.3 %     16.5
  7 241 M232           1.1 cM     25.9 cM      1.1 cM     1.1 %     17.8
  8  96 T018           4.7 cM     30.5 cM      4.5 cM     4.5 %     12.8
--- M076
  9 207 M237           5.9 cM     36.5 cM      5.6 cM     5.6 %     13.0
--- M034
 10 276 A036        ----------              ----------
                      36.5 cM                 34.5 cM


       10 markers, log10-likelihood =   -68.30
                   log-e-likelihood =  -157.26
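
The Haldane and Kosambi columns of the map above can be related to the Theta column through the standard mapping functions. The following Python sketch uses the textbook formulas. The function names are chosen for illustration, and since the printed Theta values are rounded, the results should only be expected to match the table to within about 0.1 cM.

```python
import math

def haldane_cm(theta):
    """Haldane map distance in cM for a recombination fraction theta < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * theta)

def kosambi_cm(theta):
    """Kosambi map distance in cM for a recombination fraction theta < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

# First interval of the map (L029 to T035): Theta = 3.8 %
theta_first = 0.038
haldane_first = haldane_cm(theta_first)   # close to the 4.0 cM printed
kosambi_first = kosambi_cm(theta_first)   # close to the 3.8 cM printed
```

For small recombination fractions, both distances stay close to 100 times theta, which is why the two distance columns differ little on this dense map.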
You can see that the merged markers are indicated on the map. Looking at the heap for alternative orders, the situation has much improved compared to the previous one (where several orders with identical log-likelihood existed). We can have a look at it using the heaprinto command:
CG> heaprinto 3 0 0
Loci Id  ..........   1 2   2 3 2   2 2
                  : 4 0 6 6 9 0 4 9 0 7
                  : 1 6 0 3 8 5 1 6 7 6
Loci Pos .......... | | | | | | | | | |
Map Id : log10    : | | | | | | | | | |  (Delta lod per set)
     7 :   -68.30 : 1 2 3 4 5 6 7 8 9 10  (        4 )
    13 :     2.03 =   2 1                (     2.03 )
     8 :     2.09 =       4 3            (     2.09 )
    10 :     2.37 =             7 6      (     2.37 )
     9 :     2.54 = 2 0 1                (     2.53 )
    12 :     2.88 = 1 0                  (     2.87 )
     4 :     2.95 = 2   0                (     2.95 )
    14 :     3.49 =       5   3          (     3.49 )
    11 :     4.08 = 1 2 0                (     4.07 )
     5 :     4.11 =   2 1 4 3            (     4.11 )
     6 :     4.37 =                 9 8  (     4.37 )
     2 :     4.47 =       4 3   7 6      (     4.46 )
     3 :     4.62 = 2 0 1 4 3            (     4.61 )
     1 :     4.86 = 1 0   4 3            (     4.86 )
     0 :     4.92 = 2   0 4 3            (     4.92 )

CG>

and see that the best alternative map found is about 2 LOD units (2.03) below the best map and is obtained by swapping two adjacent markers.
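
As a rough guide to reading these figures, a difference in log10-likelihood can be converted into a likelihood ratio, and a log10-likelihood into a natural-log likelihood. The Python sketch below shows only that arithmetic, with function names chosen for illustration.

```python
import math

def likelihood_ratio(delta_log10):
    """Likelihood ratio corresponding to a difference in log10-likelihood."""
    return 10.0 ** delta_log10

def log10_to_ln(log10_value):
    """Convert a log10-likelihood into a natural-log (log-e) likelihood."""
    return log10_value * math.log(10.0)

# Best alternative order in the heap: 2.03 LOD units below the best map.
ratio_best_alternative = likelihood_ratio(2.03)

# Best map: log10-likelihood of -68.30, printed with log-e-likelihood -157.26.
ln_best_map = log10_to_ln(-68.30)
```

A difference of 2.03 thus means the best map is a little over 100 times more likely than the closest alternative order found. The small gap between the converted value and the printed -157.26 comes from the rounding of -68.30.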

Thomas Schiex 2009-10-27