A CGIAR Generation Challenge Programme project.....Cultivating Genetic Diversity for the Resource Poor
 

 

QGene Example, T. Fulton, Cornell University

Nelson JC (1997) QGENE: software for marker-based genomic analysis and breeding. Molecular Breeding 3: 239-245.

See also the user notes and downloadable manual at http://www.qgene.org/

You can either just read through the example (screen shots are shown) or download the data files and the program and follow along.

QGene is a program designed by Clare Nelson. It is a very user-friendly program for QTL analysis and NIL selection, and it has many other functions. It is especially geared toward Advanced Backcross populations, for which most current software is either not applicable or not user-friendly for plant breeders. QGene requires a data file containing all the marker data and scored trait data, and a map file listing all the markers scored. In this exercise we will go through an example of QTL detection and NIL selection. This population was derived from a cross between Solanum lycopersicum (formerly Lycopersicon esculentum) TA209, a commercial variety of tomato, and S. pennellii LA1657, a wild relative. The marker analysis was done on the BC2 population, and BC3 plots of 30 plants each were used in field trials in Israel and at 2 locations in California. The results of this study were published in:

Frary A, Fulton TM, Tanksley SD (2004) Advanced backcross QTL analysis of a Lycopersicon esculentum x L. pennellii cross and possible orthologs in the Solanaceae. Theor Appl Genet 108(3):485-96

QGene requires 2 data files for QTL analysis. One is a file containing all the marker and trait data for this population. Download here (as an Excel file; save as text to open in QGene) or see a smaller example here. Down the left column are all the plants in this population; across the top are the marker and trait names. The marker data is the mapping data for this population, with the same scoring conventions as in our mapping file last week. The trait abbreviations contain the trait measured (such as yield, fruit weight, or fruit color) and the location where it was measured. These plants were scored at several locations: Israel, and both Heinz and Sunseeds in California. Therefore "isfw" is fruit weight scored in Israel. An important thing to note about the marker scores vs. the trait scores is the meaning of "0". In mapping scoring, a 0 means missing data. However, in phenotypic scoring, there can be such a thing as an actual 0 score, as opposed to missing data. For example, a yield of 0 is different from a yield score that is missing (it means that plant didn't yield anything, which is bad!). Thus in the trait data you will see that missing data is scored as "-" to differentiate it from scores of 0.
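The two missing-data conventions are easy to mix up when preparing a file, so here is a minimal Python sketch of how they could be handled. It is not QGene's own reader. The tab-delimited layout, the "plant" column name, the plant names, the marker codes "1" and "2", and the trait name "isyld" are all assumptions made for illustration; only "isfw", CT79, and CT99 come from this exercise.

```python
import csv
import io

def load_qgene_table(text, marker_names, trait_names):
    """Parse tab-delimited text into per-plant marker and trait dicts.

    Missing values become None: "0" in marker columns, "-" in trait columns.
    """
    plants = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        markers = {}
        for m in marker_names:
            score = row[m].strip()
            markers[m] = None if score == "0" else score
        traits = {}
        for t in trait_names:
            score = row[t].strip()
            traits[t] = None if score == "-" else float(score)
        plants[row["plant"]] = {"markers": markers, "traits": traits}
    return plants

# Hypothetical two-plant sample: plant names and scores are made up.
sample = (
    "plant\tCT79\tCT99\tisfw\tisyld\n"
    "p1\t1\t2\t55.2\t0\n"
    "p2\t0\t1\t-\t3.4\n"
)
plants = load_qgene_table(sample, ["CT79", "CT99"], ["isfw", "isyld"])
print(plants["p1"]["traits"]["isyld"])   # a real zero yield, kept as a number
print(plants["p2"]["markers"]["CT79"])   # a missing marker score
print(plants["p2"]["traits"]["isfw"])    # a missing trait score
```

Keeping a real 0 as a number and turning only "-" into a missing value is the point of the sketch: a plant with no yield still counts in the analysis, while a plant with no score does not.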

The second file QGene requires is a file with mapping information. Download the Word file here or see an example here. This file tells QGene which markers were mapped, on which chromosome they are located, and how far apart they are. This is mainly for getting a more accurate picture of where your putative QTL are located.

Open QGene by double-clicking on it.  Under the "File" menu, choose Open Population, Open LA1657.bc2.qdata.txt, then qpenn.map.txt.

On the right side of your screen is a list of all the traits scored.

Under the View menu is "help" which you can access at any time for descriptions of functions.  Also under the View menu is "Map" which will show the map of all the markers scored (close this by clicking in the small box at the top left corner).

You can select any trait (or many at once) and look at their scoring patterns by going to the View menu and selecting Trait Histogram. Note that some traits are scored as continuous data and some are scored as ordinal data (for example, color is scored as 1-5 where 5 is the most red). This is a good way to check your data for outliers that might be mis-scored data.
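As a rough illustration of what to look for in a histogram of ordinal data, the Python sketch below counts each score and lists any value outside the expected scale. The scores are made up, and the 1-5 range is taken from the color example above; this is not how QGene draws its histogram.

```python
from collections import Counter

def ordinal_histogram(scores, low=1, high=5):
    """Count each ordinal score and list values outside the expected scale."""
    present = [s for s in scores if s is not None]
    counts = Counter(present)
    out_of_range = sorted(s for s in counts if not low <= s <= high)
    return counts, out_of_range

# Hypothetical color scores; 44 stands in for a typing error, None for missing.
color_scores = [3, 4, 4, 5, 2, 3, 44, None, 1]
counts, suspects = ordinal_histogram(color_scores)
for score in sorted(counts):
    print(f"{score:>3} {'*' * counts[score]}")
print("possible mis-scored values:", suspects)
```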

We can also check for skewed segregation by selecting Chi-square segregation under the Analyses menu. Here we can see that at the end of chromosome 12 is a region highly skewed toward the wild parent allele, something important to keep in mind for future work such as breaking linkage drag and developing Near-Isogenic Lines.
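To make the idea of a segregation test concrete, here is a minimal Python sketch of a chi-square goodness-of-fit test for one marker with two genotype classes. It is an illustration, not QGene's implementation. The expected heterozygote frequency of 0.25 assumes an unselected BC2 (each backcross halves the share of plants carrying the donor allele), and the plant counts are hypothetical.

```python
def chi_square_segregation(n_recurrent, n_het, expected_het_freq):
    """Goodness-of-fit chi-square (1 df) for two genotype classes at one marker."""
    total = n_recurrent + n_het
    exp_het = total * expected_het_freq
    exp_rec = total - exp_het
    return ((n_het - exp_het) ** 2 / exp_het
            + (n_recurrent - exp_rec) ** 2 / exp_rec)

CRITICAL_1DF_P05 = 3.841  # chi-square critical value for 1 df at alpha = 0.05

# Hypothetical counts: 50 homozygous recurrent and 50 heterozygous plants,
# tested against the assumed 0.25 heterozygote frequency.
stat = chi_square_segregation(50, 50, 0.25)
print(round(stat, 2), "skewed" if stat > CRITICAL_1DF_P05 else "as expected")
```

A marker with many more heterozygotes than expected gives a large statistic, which is the kind of skew toward the wild parent allele described for the end of chromosome 12.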

Let's try some of the other functions:

In the trait list at the right, select is.brix (soluble solids scored in Israel) by double clicking on it: it should now have a • in front of it (or be red, in the newer version). (Note: soluble solids are a very important characteristic to the tomato processing industry; more solids means a thicker tomato paste or sauce. Brix is the unit used to measure the solids). 

Under the "Analyses" menu, choose Single Point regression. The graph depicts the chromosomes as if laid end to end, showing the effects of the 2 parental alleles on the selected trait, brix in this case. (Recall that TA209 is the esculentum recurrent parent, while LA1657 is the wild species donor parent.)

Here's the graph you should see:

 

This depicts the chromosomes as lying end to end. Anytime the line goes up it is showing a positive effect from the wild parent allele; down is a negative effect from the wild allele. So the spike on chromosome 12 is where the allele from LA1657, the wild parent, is associated with an increase in brix. If we click on the bump it highlights CT79 as being the marker associated with this QTL.
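The idea behind a single-point regression can be sketched in a few lines of Python: the trait is regressed on the genotype at one marker, and the sign of the slope gives the direction of the wild allele's effect. This is a simplified illustration under stated assumptions, not QGene's code. Genotypes are coded 0 for plants homozygous for the esculentum allele and 1 for heterozygotes, and the brix values are invented.

```python
def single_point_regression(genotypes, trait_values):
    """Regress a trait on marker genotype (0 = AA, 1 = Aa), skipping missing pairs.

    Returns (slope, r_squared). A positive slope means the wild allele is
    associated with a higher trait value.
    """
    pairs = [(g, y) for g, y in zip(genotypes, trait_values)
             if g is not None and y is not None]
    n = len(pairs)
    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared

# Hypothetical data for one marker; the last plant has a missing genotype.
genotypes = [0, 0, 0, 1, 1, 1, None]
brix = [4.0, 4.2, 4.1, 5.0, 5.2, 5.1, 4.5]
slope, r2 = single_point_regression(genotypes, brix)
print(f"effect of wild allele: {slope:+.2f}, R-squared: {r2:.3f}")
```

Repeating this for every marker and plotting the slopes in map order would give a curve of the same kind as the graph: up where the wild allele increases the trait, down where it decreases it.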

We can also choose Save stats, which saves the data for us in a table format file. Choosing the option 'sort by statistic' is a good idea because the table will contain the regression data for this trait using every single marker in your file, but you really only want to look at the ones that are the most significant. Now you can open the table with Excel.

An example of this table is at the end of your handout, showing the marker, chromosome, source of the increase (an increase in soluble solids in this case), R-squared (the proportion of the variation in the trait explained by the marker), P value, AA = the average value for plants containing the homozygous esculentum alleles, Aa = the value for plants heterozygous for the esc and wild alleles, etc.

You can see that there are quite a few QTLs where the wild allele is associated with an increase in brix, with CT79 being the most significant, but they are all located together on chromosome 12. In this case we wouldn't know whether there are actually several QTLs that are linked or whether it's one big QTL unless we did some further work, such as fine mapping.

In the Frary et al paper, you can see that we created a table and a map of all significant QTLs. This was done simply by doing a regression for every trait, just as we did for brix, and compiling all the significant QTL into a table and onto their map location. In this case we also compared the QTLs found in this population with those found in other tomato populations from other wild species so that we could see what was in common, and which might be new (previously undiscovered) QTL. 

Now we might be interested in creating a Near-Isogenic Line (NIL): a plant completely identical to TA209 except for this small piece of chromosome 12, which would contain DNA introgressed from the wild parent. First we would want to make sure that this same region of chromosome 12 was not also associated with any negative effects. For example, an increase in brix is often associated with a decrease in fruit size or yield, which could be very damaging commercially.

Looking back at the trait list, we can select several traits at once by holding down the Command key and clicking on is.brix, is.firm, is.totyld, and HW.brix. All the selected traits should be highlighted (not •).

Under the View menu, select Multiplot. (The thresholds etc. can be changed if desired, but it's not necessary today.) Select chromosome 12, then OK. You should see a picture of chromosome 12 as shown below, with the effects of each of the traits you selected. Red shows the most significant effects and gray shows no significant effects. The yellow line next to the chromosome shows where the LA1657 allele is associated with an increased effect (increased yield, increased firmness, etc.). Therefore an area of the chromosome showing significant effects but WITHOUT a yellow line next to it signifies DECREASED effects.

 

Here we can see that while CT79 has good effects on brix in both locations, it is also associated with a decrease in total yield. There was no effect on firmness associated with this region. We would have to decide whether we still wanted to go ahead and create a NIL with the region of CT79, and hope to get rid of the associated decrease in yield by creating additional recombinations or by combining it with a QTL for increased yield that might overcome this negative effect, or whether to select a different region for making a new NIL. For example, it looks like a NIL selected for CT99 might still have good brix, with a less significant effect than CT79 but no other negative effects.

In the most recent version of QGene, 3.07, you can get a quick overview of combined trait effects while in the single point regression window by selecting any other traits you are interested in. QGene overlays the effects of each trait as a different colored line in the graph.

Suppose we decide to create a NIL that has only the CT99 region and not the rest of the wild parent chromosome. We can use the "NIL extraction" function to help us choose the best plant in the population from which to make a NIL. Close the Multiplot window. Under the View menu, choose Map. Click on CT99 on chromosome 12 to find the correct place that will highlight that marker. Hold down the SHIFT key and click on CT99 again. A colored scale will pop up with LA1657 at one end (red) and TA209 at the other. Point the cursor at the LA1657 end of the scale and let go; CT99 should now be red on the map. We have just selected this as the region we want to be introgressed from LA1657.

Under the Analyses menu, select NIL extraction. Select the "backcross then self" option, as we're getting our seed directly from the selfed fruit from the BC plots in the field. Click OK, and a table of NIL candidates should appear, like this:

 

Selected: CT99     Chromosome: 12

Line         P          0.90      0.95      Missing markers
96T844-5     0.00403    571       742       4
96T822-6     0.00078    2965      3858      6
96T844-11    0.00056    4105      5341      2
96T81-13     0.00018    12607     16402     4
96T821-9     0.00016    14100     18345     0
96T79-10     0.00014    16399     21335     0
96T844-24    0.00003    79186     103024    3
96T847-4     0.00001    175328    228107    2
96T847-10    0.00001    177231    230582    0
96T844-7     0.00001    181078    235587    1
96T822-20    0.00001    223813    291187    3
96T81-12     0.00001    224840    292523    28
96T822-1     0.00001    232984    303119    3
96T844-4     0.00001    238882    310792    4

This table shows the # of progeny needed to grow from each line to have a 95 or 90% chance of being able to select a "clean" NIL (containing the region we want, and no other regions).  So from plant 96T844-5, the best candidate, we should grow 742 plants to have a 95% chance of getting a clean NIL. 
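The relationship between P and the progeny counts can be sketched as follows. This assumes P is the chance that any one progeny plant is a clean NIL, so that the chance of at least one clean NIL among n plants is 1 - (1 - P)^n. That formula is an inference, not something taken from the QGene documentation, but it reproduces the counts for 96T844-5. The function name is made up for this sketch.

```python
import math

def progeny_needed(p, confidence):
    """Smallest n for which 1 - (1 - p)**n is at least the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

# P for line 96T844-5, as printed in the table.
print(progeny_needed(0.00403, 0.90))  # 571
print(progeny_needed(0.00403, 0.95))  # 742
```

For the lower-ranked lines the printed P is rounded to five decimal places, so recomputing from the table may not reproduce the listed counts exactly.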

We can also look at the graphical genotype of this plant to see its exact genetic makeup. Under the View menu, select Marker Genotypes. Under Choose line, select Show which line. Click in the box at the right corner of this window and type 96T844-5, then OK.

 

(Under the File menu, you can choose Deselect all to get rid of the red color if you want.) The shaded parts of the chromosomes are the regions homozygous for esculentum alleles, the hatched parts are heterozygous regions, and the white parts mean missing data. You can see that this line contains CT99 as we wanted, but also a big region that we don't want; we will have to select against it using marker-assisted selection.

As a plant breeder, you would need to make decisions based not only on this type of information but on the needs and limitations of your particular research program. For example, are your higher costs in growing out large numbers of plants or running large numbers of markers? How much linkage drag are you willing to accept? Like any program, QGene is simply a tool to give you additional information to use in making the best decisions for your research.


Last update July 21, 2006

 

 
               
 