A CGIAR Generation Challenge Programme project.....Cultivating Genetic Diversity for the Resource Poor
 

 
 

Mapmaker Example, T. Fulton, Cornell University
Note: This is an example using the Macintosh v2, but the basic principles are the same for any version. Please read the manual for the appropriate version for your computer, available here: http://linkage.rockefeller.edu/soft/mapmaker/, http://www.broad.mit.edu/genome_software/, or http://www.broad.mit.edu/genome_software/other/qtl.html.

Lander, E. S., Green, P., Abrahamson, J., Barlow, A., Daly, M., Lincoln, S., and L. Newburg. 1987. MAPMAKER: An interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1, 174 - 181.

This program compares combinations of markers and gives the likelihoods of possible sequence orders. It does NOT tell you the "right" sequence; it tells you the "most likely" order. You must decide what LODs and cM distances you will accept, so the process can be highly subjective. For this example I am using real data from an F2 developed from Solanum lycopersicum cv. E6203 (= TA209) (a cultivated variety of tomato) and S. pennellii (a wild relative of tomato found in Peru). Our goal in this example is to find possible linkage groups in the marker data generated for this population. You can either just read through the tutorial (I have included pictures of the output you would see) or, if you have Mapmaker, you can download my mapping file F2.2000.txt and try it yourself (for more info and maps of this population, see The Sol Genomics Network).

The file F2.2000.map contains the mapping information for this population in the format Mapmaker requires. Here's an example of the file, showing data for the first 3 markers out of a total of 500. There are 83 plants in the population. Each plant is given a score of 1, 2, 3, 4, 5, or 0 for each marker, where:

1 = homozygous for the cultivated (esculentum) allele
2 = heterozygous (has one allele from each parent)
3 = homozygous for the wild allele
4 = either 2 or 3 (cannot tell which), but definitely not 1
5 = either 1 or 2 (cannot tell which), but definitely not 3
0 = missing data

(Note: other conventions are possible; please see your manual)

DATA TYPE F2

83 500 8 1 134520

*TG230     23320221253323232232233253232322232222213242242225301233322223222232223123322222223

*CT276     21120133211322021332231112121332221235240122223213212211313222122321322013321113221

*TG23     22320212121312233111222212222231222122323323132322223322201321312222122223232122222
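If you want to check a marker line before loading it, the scoring codes can be tallied with a few lines of Python. This is only a rough sketch and not part of Mapmaker: the function names are my own, and it assumes each marker line is a `*NAME` followed by whitespace and one digit per plant, as in the excerpt above.

```python
from collections import Counter

# Meaning of each score, as listed in this tutorial.
CODE_MEANING = {
    "1": "homozygous, cultivated (esculentum) allele",
    "2": "heterozygous",
    "3": "homozygous, wild allele",
    "4": "2 or 3, not 1",
    "5": "1 or 2, not 3",
    "0": "missing data",
}

def parse_marker_line(line):
    """Split '*NAME scores' into (name, scores); reject unknown codes."""
    name, scores = line.strip().lstrip("*").split()
    unknown = set(scores) - set(CODE_MEANING)
    if unknown:
        raise ValueError(f"{name}: unexpected codes {sorted(unknown)}")
    return name, scores

def summarize(scores):
    """Count how many plants received each code."""
    counts = Counter(scores)
    return {code: counts.get(code, 0) for code in CODE_MEANING}

# Third marker line from the excerpt above.
line = "*TG23     22320212121312233111222212222231222122323323132322223322201321312222122223232122222"
name, scores = parse_marker_line(line)
print(name, "plants scored:", len(scores))
for code, n in summarize(scores).items():
    print(f"  {code} ({CODE_MEANING[code]}): {n}")
```

A line whose length differs from the number of plants in the header, or one with many 0, 4, or 5 scores, is worth a second look before mapping.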

-->>When scoring data, DO NOT GUESS. When you make a mistake in scoring, it will look like a recombination has taken place.  Missing data is better than wrong data.

The commands you will need to use to work through this example are in bold; the Mapmaker output is in italics.

Open (double click on) Mapmaker.

Under the "File" menu select Open.  Open the file F2.2000.txt. In the mapm window you should read:

loaded file 'F2.2000.map' as F2 intercross data

(83 individuals, 500 markers)

Using F2 INTERCROSS

->

The most used command is "sequence" or "seq". This tells the program which markers you are interested in at the moment (they do not need to be in an actual sequence). Mapmaker gives each marker in the data file its own number; it does not work with names such as "TG230". If at any point you want to see the real names of markers, give the "sequence" of those markers, e.g. seq 1 2 3, then translate (or tra).

-> seq 1 2 3

The current sequence is :

     1 2 3

-> tra

     1  TG230            Unnamed    

     2  CT276            Unnamed    

     3  TG23             Unnamed

There are 500 markers in this data file, which is too many to work with at once (doing all possible orders of all these markers at once would take a long time!). We can break these down into smaller, possibly linked groups by using the group command. We can limit the number of markers in the groups by choosing what LOD score we will accept, and the maximum recombination frequency we will allow.

seq *all (to first choose all the markers to include in the grouping)

If we try group with the defaults of LOD 3 and a maximum recombination frequency of 0.4 (nominally 40 cM):

Linkage criteria: maximum theta: 0.40, minimum LOD score: 3.00

There are still too many markers in each group, so we need to be more stringent. Try:

group 4 0.1   (which asks for a minimum LOD of 4 and a maximum recombination frequency of 0.1, about 10 cM)
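To get a feel for what the two thresholds do, here is a small Python sketch of grouping by linkage criteria. It is not Mapmaker's code: it assumes linkage is treated as transitive (if A links to B and B links to C, all three share a group), and the pairwise LOD and recombination values are made up for illustration.

```python
from itertools import combinations

def group_markers(markers, pairwise, min_lod, max_theta):
    """Put markers in the same group when they are linked directly or
    through a chain of linked markers (single linkage)."""
    parent = {m: m for m in markers}

    def find(m):
        # Follow parent links to the representative of m's group.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(markers, 2):
        # Pairs with no entry are treated as unlinked (LOD 0, theta 0.5).
        lod, theta = pairwise.get((a, b), pairwise.get((b, a), (0.0, 0.5)))
        if lod >= min_lod and theta <= max_theta:
            parent[find(a)] = find(b)

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())

# Hypothetical two-point results: (LOD, recombination fraction).
pairwise = {
    (1, 2): (6.2, 0.05),
    (2, 3): (5.1, 0.08),
    (1, 3): (3.0, 0.12),
    (3, 4): (3.5, 0.25),
}
markers = [1, 2, 3, 4]

print(group_markers(markers, pairwise, min_lod=3.0, max_theta=0.4))  # loose criteria
print(group_markers(markers, pairwise, min_lod=4.0, max_theta=0.1))  # stricter criteria
```

With the loose criteria all four made-up markers land in one group; with the stricter criteria marker 4 drops out on its own, which is the same effect as moving from the defaults to group 4 0.1.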

You should see the markers now fall into 55 groups, many containing single markers that may not link well to any other markers, and other groups of various sizes. This command does NOT infer any particular order of these markers; it only indicates which markers are likely to be linked.

To find the order, we can next use compare (or comp). This command takes all the markers you give in a sequence and compares ALL possible orders. Since the number of possible orders (possible maps) increases factorially with each additional marker, it's best not to compare more than 4 or 5 markers at once, in the interest of time. Since Group 13 contains 5 markers, we can easily get an order using compare. We first need to give the sequence we're interested in, and in this case we want to "tell the program" that we don't know the order, so we put the markers within { }. Please note the bracket type, as other brackets have different meanings: [ ] means the markers within are at the same locus (so order doesn't matter), and < > means the order within is known but not the orientation of the group itself (it could be the inverse order).
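To see how quickly the number of orders grows, here is a quick Python calculation. It assumes that an order and its mirror image count as the same map, so n markers give n!/2 distinct orders; the function name is just for illustration.

```python
from math import factorial

def distinct_orders(n_markers):
    """Number of marker orders when an order and its reverse are one map."""
    if n_markers < 2:
        return 1
    return factorial(n_markers) // 2

for n in (3, 4, 5, 9):
    print(f"{n} markers: {distinct_orders(n)} possible orders")
```

Five markers give 60 orders, which is quick to compare, but the nine markers of a larger group would already give 181440.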

So, to take as an example, Group 13 = 16 100 201 318 421

seq  { 16 100 201 318 421}

comp

You should see a list of the "best 20 orders." Look at the log likelihoods: the numbers themselves are not important, but the DIFFERENCES between them are. The possible orders are sorted from best to worst. Here are the first 4:

order 1:        16 421 201 100 318      log-likelihood:  -93.71
order 2:        16 201 421 100 318      log-likelihood:  -93.71
order 3:        16 421 201 318 100      log-likelihood:  -95.78
order 4:        16 201 421 318 100      log-likelihood:  -95.78

Recall that an LOD of 2 means one event is 100 times more likely than another, an LOD of 3 means 1000 times more likely, etc. As a general guideline, an LOD difference of 2 or 3 is conventionally acceptable. But our first 2 orders have exactly the same likelihood, meaning that the two orders are equally likely. However, if we look at the sequences, we can see that the only difference between the first 2 orders is the position of 421 and 201, whose relative order can't be resolved. The order of the other markers seems clearly to be 16 (either 421 or 201) 100 318. An educated guess would be that 421 and 201 are either at the same locus or tightly linked (with not enough recombinations to create a statistically significant order). We can check this by asking for the recombination distance between the 2 markers, using the map command.
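The arithmetic behind reading the compare output can be sketched in Python as follows. This assumes, as this tutorial does, that the reported log-likelihoods are base-10 so that their differences can be read as LOD differences; the function names are my own.

```python
# Log-likelihoods copied from the first four orders of the compare output.
orders = [
    ("16 421 201 100 318", -93.71),
    ("16 201 421 100 318", -93.71),
    ("16 421 201 318 100", -95.78),
    ("16 201 421 318 100", -95.78),
]

def lod_gaps(orders):
    """How far each order falls behind the best one, in LOD units."""
    best = max(ll for _, ll in orders)
    return [(order, round(best - ll, 2)) for order, ll in orders]

def odds_against(gap):
    """Turn a LOD gap into an odds ratio (LOD 2 -> 100:1)."""
    return 10 ** gap

for order, gap in lod_gaps(orders):
    print(f"{order}: {gap:.2f} LOD behind the best, "
          f"about {odds_against(gap):.0f}:1 less likely")
```

Orders 1 and 2 tie with a gap of 0, while orders 3 and 4 are 2.07 LOD behind, a little more than 100 times less likely.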

There are 2 mapping functions available in Mapmaker: Haldane and Kosambi. The Kosambi mapping function takes into account the effects of interference, which means that after one recombination event has occurred it is less likely that a second one will occur in adjacent regions in the same meiosis. It is therefore more widely used. We can change the mapping function to Kosambi under the Mapmaker menu, under Options. Now:
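For reference, the two mapping functions can be written out in Python using their standard textbook forms, where r is the recombination fraction (0 <= r < 0.5) and the result is in centiMorgans. This is a sketch for illustration, not Mapmaker's own code.

```python
import math

def haldane_cm(r):
    """Haldane map distance in cM (assumes no interference)."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Kosambi map distance in cM (allows for interference)."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

for r in (0.05, 0.1, 0.2, 0.4):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):5.1f} cM, "
          f"Kosambi {kosambi_cm(r):5.1f} cM")
```

For small r both functions give a distance close to 100 x r, so a recombination fraction of 0.1 is roughly 10 to 11 cM. The shortcut becomes less accurate as r grows: at r = 0.4 both functions give distances well above 40 cM.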

seq 421 201

map (or just m)

And we can see that, indeed, 421 and 201 are at the same locus according to the data that we have. So we can try our sequence omitting one of them (since they give no new information anyway) and see if we now have a good sequence:

-> seq {16 421 100 318}

The current sequence is :

     { 16 421 100 318 }

-> comp

Sorting orders by likelihood...

Sequence = { 16 421 100 318 }

order 1:        16 421 100 318  log-likelihood:  -79.69
order 2:        16 421 318 100  log-likelihood:  -81.76

Now, order 1 is more than LOD 2 more likely than order 2.

We can double-check our order by using ripple. This command assumes the general order is known but checks the other possible orders within each group of 3 adjacent markers, moving down the given sequence. (Note that you would not want to use ripple for a completely unknown order, as it only looks at 3 markers at a time.)

seq 16 421 100 318    (omit { } or it will check all triplets of all possible combinations)

rip

The map for the current sequence is:

16    421   100   318  

   7.9   3.1   5.2

log likelihood =  -79.69

Now computing likelihood differences as all

adjacent triples of loci are permuted...

-- 16 100 421-- --421  16 100-- --421 100  16-- --100  16 421-- --100 421  16--

      -2.75          -11.43          -12.59           -8.16           -8.11

--421 318 100-- --100 421 318-- --100 318 421-- --318 421 100-- --318 100 421--

      -2.06           -2.75           -6.53           -3.38           -5.12

Note that the other possible orders of each three markers are being compared to the order that we suggested. Here the next most likely order is LOD 2.06 LESS likely, so our order is indeed the best order thus far. An LOD difference of less than 2 for some other possible order would mean that the order we chose is not the only good possible order.
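The set of orders that ripple examines can be reproduced with a short Python sketch. It only generates the candidate orders; it does not compute any likelihoods, and the function name is my own.

```python
from itertools import permutations

def ripple_orders(sequence, window=3):
    """Distinct alternative orders made by permuting each run of
    `window` adjacent markers while the rest of the sequence stays fixed."""
    sequence = list(sequence)
    seen = {tuple(sequence)}          # the starting order is not an alternative
    alternatives = []
    for start in range(len(sequence) - window + 1):
        triple = sequence[start:start + window]
        for perm in permutations(triple):
            candidate = (tuple(sequence[:start]) + perm
                         + tuple(sequence[start + window:]))
            if candidate not in seen:  # skip orders already produced
                seen.add(candidate)
                alternatives.append(candidate)
    return alternatives

for order in ripple_orders([16, 421, 100, 318]):
    print(" ".join(str(m) for m in order))
```

Each triple has 5 rearrangements, but neighbouring triples share one (swapping their two common markers), which is why 16 100 421 318 appears with the same score of -2.75 under both triples in the output above. For a five-marker sequence this sketch gives 13 alternatives plus the starting order, consistent with the "Computing 14 maps" message Mapmaker prints in that case.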

Let's see how our "map" looks so far.

map  (or m)

A map with 20 cM or more between markers might be questionable (remember, we don't know a "sure order" just the most likely) but there are no such gaps here.

That was easy.  But what can we do with bigger groups?

For example, Group 31 has too many markers to easily order:

Group 31 =  53 164 193 223 229 305 334 361 456

But we can find a "framework" sequence of a few markers that we can later add more to.  To find a framework sequence, we can try any of the markers in the group, but we need to find a good order before we can add on other markers.  For example let's compare {53 164 229 305}

seq {53 164 229 305}

comp

The current sequence is :

     { 53 164 229 305 }

-> comp

Sorting orders by likelihood...

Sequence = { 53 164 229 305 }

order 1:        53 305 229 164  log-likelihood:  -74.72
order 2:        53 305 164 229  log-likelihood:  -80.26
order 3:        305 53 229 164  log-likelihood:  -81.93

We can see that the best order this time is 53 305 229 164, which is about LOD 5.5 more likely than the next best order (-74.72 versus -80.26), so we'll accept that as our new order. We can ripple and map to double-check. (If these 4 didn't give us a good order, we could try another set of 4 or 5 markers, or drop a marker from this group, etc.; there are many ways to go about it.)

Now that we have this good framework, we can look back at our original Group 31 to see if any more markers can be added on.  The try command will take a given sequence and show the likelihood that a new marker fits into any of the intervals (or at either end) of that sequence.

seq  53 305 229 164

try 193 223 334 361 456

return  (to choose the default option of testing ALL intervals)

You should see a table like this:

        RELATIVE LIKELIHOODS:

         193     223     334     361     456

      ----------------------------------------

      |   0.00  -11.85  -18.61    0.00    0.00

53  |

      |  -0.93    0.00  -24.81   -2.69   -0.00

305 |

      | -13.71   -0.07  -18.72  -18.57  -22.78

229 |

      | -21.40  -12.41   -5.24  -24.80  -30.72

164 |

      | -13.54  -14.38    0.00  -15.60  -23.35

inf  |

      | -25.14  -35.70  -29.38  -26.81  -36.79

     -------------------------------------------

BEST  -95.34  -88.39  -94.71  -96.68  -86.10

Our framework sequence is given down the left column. For each marker we asked to try, a relative likelihood is given for each interval (including off the ends) of our sequence. A likelihood of 0.00 marks the interval where the test marker is most likely to fit, and it would be best if all the other possible intervals for that marker were more than LOD 2 less likely. "Inf" is the likelihood that a marker is somewhere ELSE, not on this sequence. Looking at this table, none of these markers places better "in infinity", which is good! Marker 193 places best off the 'top' end of our sequence, but is a little ambiguous because it places almost as well in the next interval down, between 53 and 305. Marker 334 appears to fit unambiguously off the end past 164, with very strong scores. This implies that a new sequence of 53 305 229 164 334 might be a good one. Let's try a ripple:
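Reading the try table can be sketched in Python like this. The numbers are copied from the table above, one list per test marker running from the top interval down to "inf"; the interval labels, the function name, and the LOD 2 cut-off are my own choices for illustration.

```python
INTERVALS = ["before 53", "53-305", "305-229", "229-164",
             "after 164", "inf (elsewhere)"]

# Relative likelihoods per test marker, in the row order of the table.
TRY_TABLE = {
    193: [0.00, -0.93, -13.71, -21.40, -13.54, -25.14],
    223: [-11.85, 0.00, -0.07, -12.41, -14.38, -35.70],
    334: [-18.61, -24.81, -18.72, -5.24, 0.00, -29.38],
    361: [0.00, -2.69, -18.57, -24.80, -15.60, -26.81],
    456: [0.00, -0.00, -22.78, -30.72, -23.35, -36.79],
}

def best_placement(scores, min_gap=2.0):
    """Return (best interval, gap to runner-up, is the gap >= min_gap?)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    gap = scores[best] - scores[runner_up]
    return INTERVALS[best], gap, gap >= min_gap

for marker, scores in TRY_TABLE.items():
    interval, gap, clear = best_placement(scores)
    verdict = "clear" if clear else "ambiguous"
    print(f"{marker}: best {interval}, runner-up {gap:.2f} LOD worse ({verdict})")
```

With this cut-off, 334 is the most clear-cut placement (its runner-up is 5.24 LOD worse), 361 only just clears LOD 2, and 193, 223, and 456 are ambiguous between two intervals.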

-> seq 53 305 229 164 334

The current sequence is :

     53 305 229 164 334

-> rip

Computing 14 maps ...

The map for the current sequence is:

53    305   229   164   334  

   7.1   5.7   5.6   3.1

log likelihood =  -94.71

Now computing likelihood differences as all

adjacent triples of loci are permuted...

-- 53 229 305-- --305  53 229-- --305 229  53-- --229  53 305-- --229 305  53--

      -9.15           -7.20          -15.25          -12.64          -11.57

--305 164 229-- --229 305 164-- --229 164 305-- --164 305 229-- --164 229 305--

     -14.41           -9.15          -22.65          -22.39          -21.48

--229 334 164-- --164 229 334-- --164 334 229-- --334 229 164-- --334 164 229--

      -5.24          -14.41          -10.70          -18.72           -9.83

Indeed, these all look good, and a map looks good too:

So we have a new, longer good sequence, and we can start trying to add in more markers.

And so on... To make a complete map, you would need to keep going with this process until you had a full set of good linkage groups. To add new markers, you do not need to start all over again, but can narrow down what linkage group the new markers are on with commands such as "near" and then just work with one linkage group at a time. There are many other commands you can try too, depending on your preferences.

Special notes:  You can probably see that there is no "right way" to use Mapmaker.  Instead of choosing some markers of Group 31 to compare, we could also have grouped again with more stringent LOD and cM levels.  Or, we could have worked backwards by using the "first order" command to get an order, then pulled off markers that didn't fit well. Etc.! It's a very iterative and somewhat subjective process.

 

 
               
 