A CGIAR Generation Challenge Programme project: Cultivating Genetic Diversity for the Resource Poor
Mapmaker Example

T. Fulton, Cornell University

Lander, E. S., Green, P., Abrahamson, J., Barlow, A., Daly, M., Lincoln, S., and L. Newburg. 1987. MAPMAKER: An interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1, 174-181.

This program compares combinations of markers and gives the likelihoods of the possible marker orders. It does NOT tell you the "right" order; it tells you the "most likely" order. You must decide what LODs and cM distances you will accept, so the process can be highly subjective.

For this example I am using real data from an F2 population developed from Solanum lycopersicum cv. E6203 (= TA209), a cultivated variety of tomato, and S. pennellii, a wild relative of tomato found in Peru.
Our goal in this example is to find possible linkage groups in the marker data generated for this population. You can either just read through the tutorial (I have included pictures of the output you would see) or, if you have Mapmaker, you can download my mapping file.

The file F2.2000.map contains the mapping information for this population in the format Mapmaker requires. Here is an excerpt of the file, showing the data for the first 3 markers out of a total of 500. There are 83 plants in the population. Each plant is given a score of 1, 2, 3, 4, 5, or 0 for each marker, where:

1 = homozygous for the esculentum allele

(Note: other conventions are possible; please see your manual)

    DATA TYPE F2 83 500 8 1 134520
    *TG230 23320221253323232232233253232322232222213242242225301233322223222232223123322222223
    *CT276 21120133211322021332231112121332221235240122223213212211313222122321322013321113221
    *TG23 22320212121312233111222212222231222122323323132322223322201321312222122223232122222

When scoring data, DO NOT GUESS. When you make a mistake in scoring, it will look like a recombination has taken place.
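Because scoring errors look like recombinations, it can help to sanity-check the score strings before loading them. The following is a minimal sketch in Python (not part of Mapmaker) using the three markers excerpted above. The helpers `parse_markers` and `check_scores` are hypothetical names, the parser only handles the one-marker-per-line shape of the excerpt, and treating 0 as "missing" is an assumption, since conventions vary.

```python
N_PLANTS = 83                 # population size stated in the tutorial
ALLOWED = set("012345")       # score symbols listed in the tutorial

# The three markers shown in the excerpt, one marker per line.
EXCERPT = """\
*TG230 23320221253323232232233253232322232222213242242225301233322223222232223123322222223
*CT276 21120133211322021332231112121332221235240122223213212211313222122321322013321113221
*TG23 22320212121312233111222212222231222122323323132322223322201321312222122223232122222
"""


def parse_markers(text):
    """Return {marker_name: score_string} for lines shaped like '*NAME scores'."""
    markers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("*"):
            continue  # skip header or blank lines
        name, _, scores = line[1:].partition(" ")
        markers[name] = scores.replace(" ", "")
    return markers


def check_scores(markers, n_plants=N_PLANTS):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name, scores in markers.items():
        if len(scores) != n_plants:
            problems.append(f"{name}: {len(scores)} scores, expected {n_plants}")
        unexpected = sorted(set(scores) - ALLOWED)
        if unexpected:
            problems.append(f"{name}: unexpected symbols {unexpected}")
    return problems


markers = parse_markers(EXCERPT)
problems = check_scores(markers)
for name, scores in markers.items():
    # Assumption: a score of 0 marks missing data.
    print(f"{name}: {len(scores)} scores, {scores.count('0')} scored 0")
print("problems:", problems if problems else "none")
```

A check like this only catches formatting slips (a dropped or extra character, a stray symbol); it cannot tell a wrong score from a right one.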
Missing data is better than wrong data.

The commands you will need to use to work through this example are in bold; the Mapmaker output is in italics.

Open (double-click on) Mapmaker. Under the "File" menu select Open and open the file F2.2000.txt. In the mapm window you should read:

    loaded file 'F2.2000.map' as F2 intercross data (83 individuals, 500 markers)
    Using F2 INTERCROSS
    ->

The most used command is "sequence" or "seq". It tells the program which markers you are interested in at the moment (they do not need to form an actual ordered sequence).
Mapmaker gives each marker in the data file its own number; it does not work with names such as "TG230".
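The relationship between marker numbers and marker names can be pictured as a simple lookup table. This is only an illustrative sketch in Python: it assumes numbers are assigned in the order the markers appear in the data file, starting at 1, and the `translate` function here is a hypothetical stand-in, not Mapmaker's own code.

```python
# First three markers of the data file, in file order.
marker_names = ["TG230", "CT276", "TG23"]

# Assumption: marker numbers follow file order, starting at 1.
number_to_name = {number: name for number, name in enumerate(marker_names, start=1)}
name_to_number = {name: number for number, name in number_to_name.items()}


def translate(sequence):
    """Pair each marker number in a sequence with its name."""
    return [(number, number_to_name[number]) for number in sequence]


for number, name in translate([1, 2, 3]):
    print(number, name)
```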
If at any point you want to see the real name of a marker, give the "sequence" of those markers, e.g. seq 1 2 3, then translate (or tra).

    -> seq 1 2 3
    The current sequence is : 1 2 3
    -> tra
    1 TG230 Unnamed
    2 CT276 Unnamed
    3 TG23 Unnamed

There are 500 markers in this data file, which is too many to work with at once (evaluating all possible orders of all these markers at once would take a long time!). We can break these down into smaller, possibly linked groups by using the group command. We can limit the number of markers in the groups by choosing what LOD score we will accept and the maximum recombination frequency we will allow.

    seq *all

(to first choose all the markers to include in the grouping)

If we try the default of LOD 3 and recombination frequency 0.4 (nominally 40 cM, counting 1% recombination as 1 cM):

    Linkage criteria: maximum theta: 0.40, minimum LOD score: 3.00

There are still too many markers in each group, so we need to be more stringent. Try:

    group 4 0.1

which asks for a minimum LOD of 4 and a maximum recombination frequency of 0.1 (about 10 cM). You should see the markers now fall into 55 groups, many containing single markers that may not link well to any other markers, and other groups of various sizes. This command does NOT infer any particular order of these markers, just the probability that all the markers in a group are linked.

To find the order, we can next use compare (or comp). This command takes all the markers you give in a sequence and compares ALL possible orders. Since the number of possible orders (possible maps) increases factorially with each additional marker, it's best not to compare more than 4 or 5 markers at once, in the interest of time. Since Group 13 contains 5 markers, we can easily get an order using compare.
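To see why 4 or 5 markers is a sensible limit, it helps to count the orders involved. The sketch below assumes that an order and its mirror image describe the same map, so n markers give n!/2 distinct orders; the function name `distinct_orders` is made up for this illustration.

```python
from math import factorial


def distinct_orders(n_markers):
    """Number of distinct marker orders, counting an order and its reverse once."""
    if n_markers < 2:
        return 1
    return factorial(n_markers) // 2


for n in range(3, 11):
    print(f"{n:2d} markers: {distinct_orders(n):>9,d} possible orders")
```

Under this assumption, the 5 markers of Group 13 give 60 orders to evaluate, while 10 markers would already give more than 1.8 million.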
We first need to give the sequence we're interested in, and in this case we want to "tell the program" that we don't know the order, so we'll put the markers within { }. Please note the bracket type, as other brackets have different meanings: [ ] means the markers within are at the same locus (so order doesn't matter), and < > means the order within is known but not the orientation of the group itself (it could be the inverse order).

So, taking Group 13 as an example:

    seq { 16 100 201 318 421}
    comp

You should see a list of the "best 20 orders."
Look at the log likelihoods: the numbers themselves are not important, but the DIFFERENCES between them (the LOD differences) are. The possible orders are sorted from best to worst. Here's the first 4:

    order 1: 16 421 201 100 318 log-likelihood: -93.71

Recall that an LOD of 2 means one event is 100 times more likely, LOD 3 is 1000 times more likely, etc. A general guideline is that an LOD difference of 2 or 3 is conventionally acceptable. But our first 2 orders have exactly the same likelihood, meaning that either order is equally likely. However, if we look at the sequences, we can see that the only difference between the first 2 orders is that the order of 421 and 201 can't be differentiated. The order of the other markers seems clearly to be 16 (either 421 or 201) 100 318. An educated guess would be that 421 and 201 are either at the same locus or tightly linked (with not enough recombinations to create a statistically significant order). We can check this by asking for the recombination distance between the 2 markers, using the map command.

There are 2 mapping functions available in Mapmaker: Haldane and Kosambi. The Kosambi mapping function takes into account the effects of interference, which means that after one recombination event has occurred it is less likely that a second one will occur in adjacent regions in the same generation.
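For reference, the two mapping functions can be written out directly. The sketch below uses the standard textbook formulas for converting a recombination fraction r into a map distance in cM; it is an illustration in Python, not code taken from Mapmaker, and the function names are chosen here.

```python
import math


def _check(r):
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")


def haldane_cm(r):
    """Haldane map distance in cM; assumes crossovers occur independently (no interference)."""
    _check(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM; allows for interference between nearby crossovers."""
    _check(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


for r in (0.05, 0.10, 0.20, 0.40):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):5.1f} cM, Kosambi {kosambi_cm(r):5.1f} cM")
```

For small r both functions stay close to 100 × r, so 1% recombination is roughly 1 cM; they diverge as r approaches 0.5, with Kosambi giving the shorter distance.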
Because it accounts for interference, the Kosambi function is the more widely used of the two. We can change the mapping function to Kosambi under the Mapmaker menu, under Options. Now:

    seq 421 201
    map (or just m)

And we can see that, indeed, 421 and 201 are at the same locus according to the data that we have. So we can try our sequence omitting one of them (since together they give no new information anyway) and see if we now have a good sequence:

    -> seq {16 421 100 318}
    The current sequence is : { 16 421 100 318 }
    -> comp
    Sorting orders by likelihood...
    Sequence = { 16 421 100 318 }
    order 1: 16 421 100 318 log-likelihood: -79.69

Now, order 1 is more than LOD 2 more likely than order 2. We can double-check our order by using ripple. This command assumes the general order is known but checks other possible orders within each group of 3 markers, moving down the given sequence. (Note that you would not want to use ripple for a completely unknown order, as it only looks at 3 markers at a time.)

    seq 16 421 100 318

(omit { } or it will check all triplets of all possible combinations)

    rip

    The map for the current sequence is:
    16 421 100 318
    7.9 3.1 5.2
    log likelihood = -79.69
    Now computing likelihood differences as all adjacent triples of loci are permuted...

    --16 100 421--  --421 16 100--  --421 100 16--  --100 16 421--  --100 421 16--
         -2.75          -11.43          -12.59           -8.16           -8.11

    --421 318 100-- --100 421 318-- --100 318 421-- --318 421 100-- --318 100 421--
         -2.06           -2.75           -6.53           -3.38           -5.12

See that the other possible orders of each three markers are being compared to the order that we suggested. Here the next most likely order is LOD 2.06 LESS likely, so our order is indeed the best order thus far. An LOD difference of less than 2 for some other possible order would mean that the order we chose is not the only good possible order.

Let's see how our "map" looks so far.

    map (or m)
A map with 20 cM or more between markers might be questionable (remember, we don't know a "sure order," just the most likely one), but there are no such gaps here.

That was easy. But what can we do with bigger groups? For example, Group 31 has too many markers to order easily. But we can find a "framework" sequence of a few markers that we can later add more to. To find a framework sequence, we can try any of the markers in the group, but we need to find a good order before we can add on other markers.
For example, let's compare {53 164 229 305}:

    Seq {53 164 229 305}
    Comp

    The current sequence is : { 53 164 229 305 }
    -> comp
    Sorting orders by likelihood...
    Sequence = { 53 164 229 305 }
    order 1: 53 305 229 164 log-likelihood: -74.72

We can see that the best order this time is actually 53 305 229 164, which is nearly LOD 6 more likely than the next best order, so we'll accept that as our new order. We can ripple and map to double-check. (If these 4 didn't give us a good order, we could try another set of 4 or 5 markers, or drop a marker from this group, etc.; there are many ways to go about it.)

Now that we have this good framework, we can look back at our original Group 31 to see if any more markers can be added on.
The try command will take a given sequence and show the likelihood that a new marker fits into any of the intervals (or at either end) of that sequence.

    seq 53 305 229 164
    try 193 223 334 361 456
    return

(to choose the default option of testing ALL intervals)

You should see a table like this:

    RELATIVE LIKELIHOODS:
               193     223     334     361     456
            ----------------------------------------
            |   0.00  -11.85  -18.61    0.00    0.00
     53     |
            |  -0.93    0.00  -24.81   -2.69   -0.00
     305    |
            | -13.71   -0.07  -18.72  -18.57  -22.78
     229    |
            | -21.40  -12.41   -5.24  -24.80  -30.72
     164    |
            | -13.54  -14.38    0.00  -15.60  -23.35
     inf    |
            | -25.14  -35.70  -29.38  -26.81  -36.79
            ----------------------------------------
     BEST     -95.34  -88.39  -94.71  -96.68  -86.10

Our framework sequence is given down the left column. For each marker we asked to try, a relative likelihood score is given for each interval (including off the ends) of our sequence. A score of 0.00 marks the interval where the test marker is most likely to fit, and it would be best if all the other possible intervals for that marker were LOD >2 less likely. "Inf" is the likelihood that a marker is anywhere ELSE but not on this sequence. Looking at this table, none of these markers places better "in infinity," which is good! Marker 193 places best off the 'top' end of our sequence, but is a little ambiguous because it places almost as well in the next interval down, between 53 and 305. If we look at marker 334, it appears to fit unambiguously off the end of 164 with very strong scores. This implies that a new sequence of 53 305 229 164 334 might be a good one.
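The way of reading the try table described above (find the 0.00 interval, then see how far behind the runner-up is) can be sketched in a few lines of Python. The values are copied from the table; the interval labels, the helper `best_placement`, and the use of a fixed gap of 2 as the cutoff are choices made for this illustration, not Mapmaker output.

```python
# Interval labels, top to bottom, for the framework 53 305 229 164.
INTERVALS = ["before 53", "53-305", "305-229", "229-164", "after 164", "inf"]

# Relative log-likelihoods from the try table, one column per tested marker.
TRY_TABLE = {
    193: [0.00, -0.93, -13.71, -21.40, -13.54, -25.14],
    223: [-11.85, 0.00, -0.07, -12.41, -14.38, -35.70],
    334: [-18.61, -24.81, -18.72, -5.24, 0.00, -29.38],
    361: [0.00, -2.69, -18.57, -24.80, -15.60, -26.81],
    456: [0.00, -0.00, -22.78, -30.72, -23.35, -36.79],
}

MIN_GAP = 2.0  # LOD gap suggested in the text for an unambiguous placement


def best_placement(scores):
    """Return (best interval label, LOD gap between best and second-best interval)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return INTERVALS[best], scores[best] - scores[runner_up]


results = {marker: best_placement(scores) for marker, scores in TRY_TABLE.items()}
for marker, (interval, gap) in results.items():
    verdict = "unambiguous" if gap >= MIN_GAP else "ambiguous"
    print(f"marker {marker}: best {interval:<10s} gap {gap:5.2f}  {verdict}")

unambiguous = [marker for marker, (_, gap) in results.items() if gap >= MIN_GAP]
```

With this cutoff, marker 334 clears the threshold comfortably (gap 5.24) and marker 361 does so more narrowly (gap 2.69), while 193, 223 and 456 each have a second interval that is nearly as likely.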
Let's try a ripple:

    -> seq 53 305 229 164 334
    The current sequence is : 53 305 229 164 334
    -> rip
    Computing 14 maps ...
    The map for the current sequence is:
    53 305 229 164 334
    7.1 5.7 5.6 3.1
    log likelihood = -94.71
    Now computing likelihood differences as all adjacent triples of loci are permuted...

    --53 229 305--  --305 53 229--  --305 229 53--  --229 53 305--  --229 305 53--
         -9.15           -7.20          -15.25          -12.64          -11.57

    --305 164 229-- --229 305 164-- --229 164 305-- --164 305 229-- --164 229 305--
        -14.41           -9.15          -22.65          -22.39          -21.48

    --229 334 164-- --164 229 334-- --164 334 229-- --334 229 164-- --334 164 229--
         -5.24          -14.41          -10.70          -18.72           -9.83

Indeed, these all look good, and a map looks good too.

So we have a new, longer good sequence, and we can start trying to add in more markers.
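The set of orders that ripple examines can be reproduced with a short sketch: slide a window of 3 markers along the sequence and permute the markers inside it, leaving the rest in place. This is an illustration in Python of the idea described in the text, with a hypothetical helper `ripple_orders`; it computes no likelihoods and is not Mapmaker's implementation.

```python
from itertools import permutations


def ripple_orders(sequence, window=3):
    """Yield (window_start, full_order) for each rearrangement of each adjacent window."""
    seq = tuple(sequence)
    for start in range(len(seq) - window + 1):
        original = seq[start:start + window]
        for perm in permutations(original):
            if perm != original:  # skip the order we already have
                yield start, seq[:start] + perm + seq[start + window:]


framework = (53, 305, 229, 164, 334)
alternatives = list(ripple_orders(framework))

# Some rearrangements are reached from two overlapping windows, so count unique maps.
unique_maps = {framework} | {order for _, order in alternatives}
print(len(alternatives), "rearrangements,", len(unique_maps), "unique maps including the original")

# Alternatives tested for the first triple of the earlier sequence 16 421 100 318.
first_window = [order[:3] for start, order in ripple_orders((16, 421, 100, 318)) if start == 0]
print(first_window)
```

For the 5-marker sequence this gives 14 unique maps, which agrees with the "Computing 14 maps" line in the output above. The overlap between windows is also why some values appear twice in the ripple output (for example -9.15 and -14.41).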
And so on...

Special notes: You can probably see that there is no "right way" to use Mapmaker. Instead of choosing some markers of Group 31 to compare, we could also have grouped again with more stringent LOD and cM levels. Or we could have worked backwards by using the "first order" command to get an order, then pulled off markers that didn't fit well. Etc.! It's a very iterative and somewhat subjective process.