
Oryza sativa

The rice genome, a «Rosetta stone» for other cereals

Why sequence a cereal?

Rice, wheat, barley, corn, sorghum, millet, sugar cane: ever since the Neolithic revolution, cereals have been staples of human nutrition. For thousands of years, humans have shuffled genes through breeding and selection to create the familiar domestic varieties of these grasses. Considerable progress has been made in taste, nutritional value and productivity, notably during the “Green Revolution” which took place between 1960 and 1970. However, the Green Revolution also had its failures, and we can no longer count on the universal distribution of a few “high yield” varieties. The world population explosion, the loss of arable land and climate change call for new advances in agronomy, and these require a new revolution: sequencing the genome of a cereal.

With the sequence of the genome in hand, we will be able to:

  • Create a near-exhaustive inventory of the cereal’s genes, and try to assign a function to each of them.
  • Identify “candidate genes” for agronomic characteristics which have been mapped in a genetic interval.
  • Access a large quantity of new chromosomal markers to assist in selection and help in the creation of new varieties.
  • Create “DNA chips” for the global analysis of gene expression in the cereal under different conditions.
  • Take advantage of many other applications.

Rice, a model cereal

rice plant (USDA credit)

Rice (Oryza sativa) is the cereal that was chosen as the priority for sequencing. It has the smallest genome of all the cereals: 430 million nucleotides. The corn genome is five times larger, and that of wheat 40 times larger! Nevertheless, preliminary comparisons between cereal genomes revealed large blocks of homologous genes whose order is relatively conserved. This phenomenon, known as synteny, makes rice a good entry point for characterizing the genes of other cereals and associating them with various agronomic traits. Furthermore, rice can serve as the model genome for one of the two main groups of flowering plants, the monocotyledons, just as Arabidopsis thaliana is the model for the other group, the dicotyledons. Rice’s status as a “model plant” is also supported by the numerous resources available for a genomic approach, such as excellent genetic maps and efficient genetic transformation techniques, which make rice the easiest cereal to transform.

Finally, rice is already a model in several fields, having been the subject of studies on yield, hybrid vigor, genetic resistance to disease and adaptive responses. Scientists have taken advantage of a multitude of varieties adapted to a very wide range of environmental conditions, from dry soils in temperate regions to flooded paddies in tropical regions.

The rice genome is also interesting in itself, because this cereal is the daily staple food of more than half of humanity. Rice accounts for 30% of world cereal production today. Production has doubled in the last 30 years, partly thanks to the introduction of new varieties, but its present growth barely keeps pace with consumption: in 2025, 4.6 billion people will depend on rice for their daily nourishment, compared with three billion today. A new leap in production is therefore needed. At the same time, small producers will have to farm less favorable land, such as brackish or saline soils, and the availability of water will become increasingly problematic.

Knowledge of the rice genome will be a boon for plant breeders who are trying to increase yield and create new varieties resistant to disease, pests, drought or salinity. It will allow them to identify genes of agronomic interest and to search for advantageous allelic variants in the collection of 90,000 traditional varieties and wild rice species managed by the International Rice Research Institute (IRRI). This renewed interest in the genetic diversity of rice should raise public awareness of the genetic erosion caused by abandoning traditional varieties in favor of high-yield ones, which is one of the unfortunate consequences of the Green Revolution. The development of rice genomics may be a turning point, as it will make it easier to transfer advantageous traits to locally adapted varieties.

The origin of the Rice Genome Project

The exploration of the rice genome began in the 1980s, with the involvement of Chinese, Japanese and American groups. The project gained momentum in the early 1990s, when people began to realize that rice was the “Rosetta stone” of the cereals, to use the words of Chris Somerville of Stanford University. Growing interest in rice in both the academic and private sectors spurred Japan, for which rice is a national cause, into action. Japan then reorganized a rice genome mapping project into the Rice Genome Research Program (RGP). The well-funded RGP dominated the rice genome scene and became the advocate of an international sequencing effort. The idea was informally approved in 1997 at a meeting in Singapore sponsored by the Rockefeller Foundation. The participating countries (Japan, China, the European Union, the United States and South Korea) agreed on several aspects of the project:

  • The choice of common material for sequencing: seeds from a single plant of the Nipponbare or GA3 variety. This choice was dictated by the resources the RGP had already built up for this cultivar: STS (Sequence Tagged Site) maps, large cDNA and EST (Expressed Sequence Tag) collections, and a YAC (Yeast Artificial Chromosome) physical map covering more than 50% of the genome. The long grain Nipponbare variety belongs to the japonica sub-species (traditional upland rice, or “plateau rice”), which is grown mainly in Japan and in regions with a temperate climate such as Europe.
  • The choice of a “clone by clone” sequencing strategy, the one used to sequence the human and Arabidopsis genomes. This strategy requires a mapping effort to order large genomic clones, which are then sequenced individually (see Sequencing Strategy); its advantage is that it guarantees complete sequencing. The goal, scheduled for 2008 at the time of the meeting, was indeed a “finished” sequence, with a minimum number of gaps and 99.99% accuracy (no more than one error in 10,000 nucleotides), as for the human and Arabidopsis genomes. The reasoning was that, if rice is to serve as a model for the genomes of other cereals, it is very important to have a sequence determined as precisely as possible, especially since no other cereal may ever be completely sequenced.
  • Finally, an agreement on international collaboration, involving the allocation of the 12 rice chromosomes among the partners and the immediate exchange of clones and of mapping and sequencing information. It was also agreed that sequence data must be deposited in public databases once a given level of quality is reached, so that they are freely available to scientists all over the world, in both the private and academic sectors.
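The 99.99% target can be turned into an order of magnitude for residual errors. The back-of-the-envelope sketch below assumes, as a simplification, that errors are independent and uniformly spread, and uses the approximate sizes quoted in this article; the function and constant names are illustrative only.

```python
# Expected number of residual base errors at a given accuracy level.
# Sizes are the approximate figures quoted in the text; assumption:
# each base is independently correct with probability `accuracy`.

GENOME_BP = 430_000_000      # whole rice genome (approximate)
CHR12_BP = 30_000_000        # rice chromosome 12 (approximate)
FINISHED_ACCURACY = 0.9999   # 99.99%: one error per 10,000 bases

def expected_errors(length_bp: int, accuracy: float) -> float:
    """Expected count of erroneous bases in a sequence of `length_bp` bases."""
    return length_bp * (1.0 - accuracy)

for name, size in [("whole genome", GENOME_BP), ("chromosome 12", CHR12_BP)]:
    print(f"{name}: ~{expected_errors(size, FINISHED_ACCURACY):,.0f} expected errors")
```

Even a “finished” sequence therefore contains on the order of tens of thousands of scattered errors over the whole genome, about 3,000 on a chromosome the size of chromosome 12, which remains a very low density for gene prediction.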

This collaboration was confirmed in February 1998 by the creation of an international consortium, the IRGSP, initially including Japan, China, the United States, South Korea and the United Kingdom. The United Kingdom later withdrew from the consortium, as did Thailand and Canada, which had joined later. On the other hand, France, Taiwan, India and Brazil joined the consortium permanently. As early as 1998, France had signaled its intention to join the consortium by taking charge of rice chromosome 12, which is 30 million nucleotides long. This choice was partly due to previous mapping studies on this chromosome at IRD in Montpellier (mapping of loci for resistance to various pathogens, including rice yellow mottle virus). It should also be noted that rice chromosome 12 corresponds to a region of wheat chromosome 5 that carries a genetic locus involved in the hardness of wheat grains. This trait determines the baking quality of the flour, which is of interest to French seed companies. French groups from IRD and CIRAD took part in the initial phase. In collaboration with these groups, Genoscope defined several “seed points” along chromosome 12, from which sequencing began at Genoscope in 2000.

Alignment of the chromosomes of six cereals (for a “haploid” chromosome number) with the 12 rice chromosomes, after corresponding chromosomal regions were identified from the genetic maps. Each sector of the wheel corresponds to a linkage group in the rice genome. The very high level of synteny between these cereals, despite their great differences in genome size, is apparent. With its “small” size, the rice genome is thus an ideal model for finding homologous genes in syntenic regions of the “large” wheat genome. (Adapted from G. Moore et al., Curr. Biol. 5:737-739, 1995) (USDA credit).

The Monsanto draft

The situation became somewhat complicated in April 2000, when the agro-biotechnology company Monsanto announced that it had its own “draft” of the rice genome. The company had delegated the work to Leroy Hood’s laboratory at the University of Washington in Seattle. This group had followed the same “clone by clone” strategy as the IRGSP: physical mapping led to the selection of about 3,500 large genomic clones covering the genome of the Nipponbare cultivar. Most of these clones were sequenced at 5X coverage (each base read about five times on average) and were also end-sequenced.

The result of this work is a rough draft of the rice genome, which may be sufficient for Monsanto to characterize genes of economic interest in other cereals, using rice as a basis. The industrial stakes are certainly lower for the rice genome than for the corn and wheat genomes, which may explain Monsanto’s offer to the IRGSP: in its announcement, Monsanto offered consortium scientists access to its raw physical mapping and sequence data, on condition that they not disclose the data before merging them with “public” data, and not seek patents on data not yet merged unless Monsanto was given the opportunity to obtain a license. Monsanto also allows any interested scientist to carry out a limited amount of research on a dedicated site, after registering as a user.

After Monsanto’s offer, some IRGSP scientists feared that the institutions funding the project would withdraw. Consortium scientists had to explain that Monsanto’s incomplete draft fell short of the consortium’s goals. On the other hand, integrating these data would accelerate progress toward the finished-quality sequence and reduce costs. In the IRGSP draft sequence as of early 2003, about 30% of the BAC clones come from the Monsanto map, and the company’s sequence data underlie about 25% of the sequence deposited by the consortium in public databases.

The Syngenta draft

In January 2001, it was Syngenta’s turn to announce the completion of a draft of the genome of the japonica Nipponbare variety, produced in collaboration with Myriad Genetics. Unlike Monsanto, Syngenta chose the “full genome shotgun” strategy: the whole rice genome is cut into small fragments, which are sequenced, and the complete sequence is then reassembled by detecting overlaps between fragments. This allowed Syngenta to progress rapidly by saving on mapping work, and to obtain an overall picture of the rice genome without waiting until 2008, the date initially set by the IRGSP. At 6X coverage, the draft assembly at the beginning of 2000 consisted of more than 40,000 fragments, only some of which were “anchored” on the chromosomes, and it covered only 93% of the genome. Here too, the draft, although more complete than Monsanto’s, falls short of the “finished” version that is the consortium’s goal. Like the Monsanto draft, it has not been deposited in public databases, and its characteristics were only described in April 2002, in an article in Science. On the other hand, privileged access has been granted to the IRGSP and other academic labs.
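The principle of the shotgun approach, rebuilding a sequence from overlapping random fragments, can be illustrated with a toy greedy assembler. This is a minimal Python sketch for illustration only: it is not the software used by Syngenta, Myriad Genetics or the BGI, it ignores sequencing errors and reverse-complement reads, and the sequence and reads are invented.

```python
# Toy greedy shotgun assembly: repeatedly merge the two contigs with the
# longest suffix/prefix overlap until no overlap remains.

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`
    (0 if no such overlap of at least `min_len` bases exists)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Return the contigs left when no further overlaps can be merged."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Discard reads entirely contained in another read.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while len(contigs) > 1:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # nothing overlaps: the assembly stays fragmented
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)
    return contigs

genome = "ATGGCGTACGTTAGCCTAGGCATCGATCG"   # invented toy "genome"
reads = [genome[k:k + 10] for k in range(0, len(genome) - 9, 4)]
print(greedy_assemble(reads))
```

Here the five 10-base reads reassemble into the first 26 bases of the toy sequence; its last three bases are never sampled by any read, a small-scale version of the gaps left when coverage is incomplete. Repeats longer than the reads would make such greedy merging ambiguous, which is one reason shotgun drafts end up as many unanchored fragments.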

The “full genome shotgun” strategy was also adopted by a Chinese academic group, the Beijing Genomics Institute (BGI), which set out to sequence a cultivar belonging to the indica sub-species of rice, as well as a second cultivar derived mainly from indica. Indica is the traditional irrigated rice and the most widely cultivated in the world, especially in China. Moreover, the two cultivars chosen by the BGI are the parents of the highly productive “super-hybrid” rice widely grown in China. The Chinese scientists produced a 4X assembly of the first cultivar; the result, announced in October 2001 and described in April 2002 in Science, is a set of more than 100,000 fragments that are not anchored on the chromosomes. Although the BGI sequences are available in public databases, these indica data are less useful to IRGSP scientists than the Syngenta and Monsanto data, which were obtained from the japonica Nipponbare variety.

The response of the International Consortium

In February 2001, a few weeks after the Syngenta announcement, the IRGSP announced a change in strategy to respond to this new private-sector initiative and to take Monsanto’s contribution into account:

(USDA credit)

From then on, the consortium aimed to obtain a high-quality preliminary draft of the rice genome. The objective was to sequence the genome to a depth of 10X to obtain so-called “phase 2 sequences”, in which each large clone is covered by a small number of ordered and oriented sequence blocks. In December 2001, progress on the project and funding prospects made it possible to plan the completion of this high-quality draft for the end of 2002. The motivation was to let plant biologists and breeders exploit genomic information rapidly, thanks to the “hierarchical shotgun” strategy followed by the IRGSP, which produces sequences that are perfectly anchored along the genome. Gene identification can begin with relative confidence on the phase 2 draft, but definitive work will have to await the finished sequence.
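The coverage depths quoted in this article (4X, 5X, 6X, 10X) can be related to completeness with the classic Lander-Waterman (Poisson) approximation, which assumes that reads land uniformly at random. The Python sketch below is illustrative only: the idealized model ignores repeats, cloning biases and assembly difficulties, which is presumably why real drafts, such as the Syngenta assembly described above, cover less of the genome than it predicts.

```python
import math

GENOME_BP = 430_000_000  # approximate rice genome size quoted in the text

def uncovered_fraction(coverage: float) -> float:
    """Poisson approximation: probability that a given base is hit by no
    read when the mean depth is `coverage` and reads fall uniformly."""
    return math.exp(-coverage)

for c in (4, 5, 6, 10):
    missing = uncovered_fraction(c) * GENOME_BP
    print(f"{c:>2}X: {1 - uncovered_fraction(c):.4%} sampled, "
          f"~{missing:,.0f} bases never read")
```

Under this model, 6X leaves about 0.25% of bases unsampled and 10X about 0.005%, which illustrates why the consortium chose a deeper 10X coverage before moving on to finishing.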

The completion of the high-quality draft was announced on December 18, 2002, at an official ceremony in Tokyo. The work of the IRGSP was praised in messages from the American and French presidents and the Japanese Prime Minister. With 10X coverage, the accuracy was already 99.99%, but around 3,500 gaps between the large blocks making up the draft remained to be filled. The project then entered its finishing phase, which should be completed before December 2004. The finishing work has benefited from the Syngenta draft sequences, which were made available in May 2002.

Why “finish” the rice genome?

The value of a finished sequence can already be assessed. On November 21, 2002, even before the Tokyo announcement, Nature published the finished sequences of japonica chromosomes 1 and 4, produced respectively by the Japanese and Chinese groups of the consortium. Direct comparison with the Syngenta draft is impossible because that assembly is not freely accessible, but the annotation results can be compared. The Japanese RGP group delineated 6,756 genes on chromosome 1, whereas Syngenta reported only 4,467, half of which lacked a complete coding sequence. Furthermore, a direct comparison of part of the finished sequence of japonica chromosome 1 with the 4X indica draft revealed that a third of the genes annotated by the RGP were only partially predicted in the indica annotation, and that 10% were missing altogether.

These results show that gene prediction depends strictly on sequence quality: there is no guarantee that all genes can be identified from a draft. Exhaustiveness matters, because only a complete view of the gene content of rice will reveal whether a given metabolic pathway is truly absent, which may be important for “metabolic engineering” in this cereal (see the example of “golden rice”). Sequence quality also affects the analysis of duplications and repeated sequences. Duplications must be taken into account when the goal is to create new agronomic traits by modifying the expression of one or more genes, because these genes may have copies elsewhere in the genome. Repeated sequences, for their part, are a dynamic element of the genome, responsible for generating a large share of the allelic variation exploited by breeders. Finishing the entire rice genome therefore appears indispensable if we are to derive maximum benefit from the sequence for the study and improvement of this plant and other cereals.

Exploiting the sequence

In addition to the benefit of 99.99% accuracy, this sequence has the advantage of being “anchored” on physical and genetic maps: for an agronomic trait mapped to a given genetic interval, it will be possible to retrieve the corresponding genomic sequences and all the genes annotated in the interval. The available information on the function of these genes will make it possible to select one or more “candidate genes” for the trait under study. A candidate gene can be validated by studying its expression and its allelic variability for the trait in several different varieties, as well as by studying mutant or genetically modified plants in which the expression of the gene has been switched off or increased.
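Going from a mapped genetic interval to the genes annotated inside it amounts to an interval query on the anchored sequence. A minimal Python sketch, with invented gene names, coordinates and annotations (not real rice data), and a simple keyword filter standing in for the expert judgment actually used to pick candidates:

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str          # hypothetical identifier, not a real rice gene ID
    chromosome: int
    start: int         # coordinates (bp) on the anchored chromosome sequence
    end: int
    annotation: str    # putative function from the annotation

def genes_in_interval(genes: list[Gene], chromosome: int, start: int, end: int) -> list[Gene]:
    """Genes on `chromosome` that overlap the closed interval [start, end]."""
    return [g for g in genes
            if g.chromosome == chromosome and g.start <= end and g.end >= start]

def candidates(genes: list[Gene], keyword: str) -> list[Gene]:
    """Genes whose annotation mentions `keyword` (case-insensitive)."""
    return [g for g in genes if keyword.lower() in g.annotation.lower()]

# Invented annotation records for illustration only
genes = [
    Gene("geneA", 12, 1_200_000, 1_203_500, "putative disease resistance protein"),
    Gene("geneB", 12, 1_450_000, 1_452_000, "hypothetical protein"),
    Gene("geneC", 12, 2_900_000, 2_905_000, "disease resistance protein"),
    Gene("geneD", 4, 1_300_000, 1_301_000, "disease resistance protein"),
]

# Hypothetical trait mapped between markers flanking 1.0-2.0 Mb on chromosome 12
interval = genes_in_interval(genes, 12, 1_000_000, 2_000_000)
print([g.name for g in interval])                            # ['geneA', 'geneB']
print([g.name for g in candidates(interval, "resistance")])  # ['geneA']
```

geneC carries a matching annotation but lies outside the mapped interval, and geneD sits on another chromosome, so neither is retained: anchoring is what restricts the search to the right region.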

In 2002, using the IRGSP draft genome, a Japanese team identified a gene involved in the earliness of flowering in rice, which could be very important for adapting varieties to different latitudes. According to the authors of this study, the Rice Genome Project gave them a head start of one to three years in their gene hunt.

Candidate genes can also be sought by analyzing the expression profiles of rice genes sampled and arrayed on DNA chips. These expression profiles could be compared across varieties, across growth conditions, and between plants grown with or without specific pathogens.
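Comparing expression profiles typically starts from a per-gene ratio between two conditions. The Python sketch below computes a log2 fold change on invented chip signals; the pseudocount and the two-fold (|log2FC| ≥ 1) threshold are common conventions assumed here, not a method described in this article, and real analyses also require normalization and replicate-based statistics.

```python
import math

def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of two expression signals; the pseudocount avoids log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

# Invented normalized chip signals: (with pathogen, without pathogen)
signals = {
    "gene1": (850.0, 100.0),
    "gene2": (120.0, 115.0),
    "gene3": (15.0, 255.0),
}

for gene, (infected, healthy) in signals.items():
    lfc = log2_fold_change(infected, healthy)
    # Assumed two-fold threshold for calling a change
    status = "up" if lfc >= 1 else "down" if lfc <= -1 else "unchanged"
    print(f"{gene}: log2FC = {lfc:+.2f} ({status})")
```

With these values, gene1 is called up, gene2 unchanged and gene3 down (a 16-fold drop once the pseudocount is added); genes induced by the pathogen would be the ones to examine as candidates for a resistance trait.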

Rice plant (USDA credit)

Syngenta researchers have already used a chip representing 21,000 predicted rice genes to identify several hundred genes expressed during rice grain filling.

Moreover, sequencing the rice genome will reveal a large number of new microsatellite-type molecular markers, which will make it possible to refine genetic analysis compared with the older RFLP markers. Thanks to the quality of the finished sequence, it will also be possible to identify SNP-type polymorphisms without the risk of confusing them with sequencing errors. Besides facilitating the search for candidate genes, these new markers will speed up the creation of new varieties by marker-assisted selection: with markers closely linked to a locus responsible for a given trait, it becomes easy to follow its transmission in experimental crosses and to reduce the amount of undesirable parental genetic material transmitted along with the desired locus. Better still, when the trait is associated with one or more known alleles, the sequence of these alleles will provide an absolute marker.
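How “closely linked” a marker must be can be quantified with a map function, which converts a genetic distance into the probability that a recombination separates marker and locus. The Python sketch below uses Haldane’s map function, which assumes no crossover interference; it is an illustration, not a breeding protocol.

```python
import math

def haldane_recombination_fraction(distance_cM: float) -> float:
    """Haldane map function r = 0.5 * (1 - exp(-2d)), d in Morgans.
    Assumes crossovers occur independently (no interference)."""
    d = distance_cM / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))

for cM in (1, 5, 20):
    r = haldane_recombination_fraction(cM)
    print(f"marker {cM:>2} cM from the locus: r = {r:.3f}; "
          f"marker and locus co-inherited in {1 - r:.1%} of gametes")
```

A marker 1 cM away stays with the locus in about 99% of gametes, whereas one 20 cM away is separated from it in roughly one gamete in six, which is why the dense markers provided by the sequence make selection more reliable.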

These new opportunities to explore the enormous collection of rice allelic variants and to introduce them into elite varieties by crossing should provide alternatives to transgenesis: when homologues of the “foreign” gene(s) that scientists wish to introduce are present but inactive in the cultivar of interest, breeders can instead try to obtain the desired phenotype by selection, crossing the cultivar with local varieties or wild rice species carrying “active” alleles, or even with plants that are mutant for these genes (see the concept of TILLING for screening large mutant collections). Genotype building, which consists of creating optimal allele combinations, will also be facilitated.

Annotation of the rice genome

To make these applications possible, a catalog of rice genes is needed. A first level of annotation has been undertaken by two groups, TIGR in the US and the RGP in Tsukuba, Japan, using the January 8, 2003 freeze of the high-quality IRGSP draft. Their results should become available after summer 2003. This will be an ab initio annotation, using gene-finding programs calibrated during the annotation of the Arabidopsis genome and adapted to the characteristics of grass genes. These programs only produce predictions, which must be validated experimentally. Various functional genomics projects should contribute to this effort. For example, IRRI has created an international consortium for rice functional genomics. Mutagenesis by random insertion of transposons into rice genes is a powerful tool: collections of tens of thousands of such mutant lines have been created at the RGP in Japan and in Korea. In France, a collection of rice insertional mutant lines has been created within the framework of the Genoplante program. An alternative to random mutagenesis has been opened up by Japanese scientists, who have succeeded in the targeted inactivation of a rice gene by homologous recombination. Finally, the TILLING method mentioned earlier makes it possible to identify plants mutated in a given gene by screening lines produced by random mutagenesis.

The availability of the complete sequence of another plant, Arabidopsis thaliana, offers a way to complement the annotations produced by TIGR and the RGP. To this end, Genoscope has a comparative genomics tool, Exofish, which searches for genomic regions conserved between two species during evolution. When the species are sufficiently distant in evolution, these regions, known as Ecores, fall within exons with high specificity. Exofish has already been used with the Arabidopsis genomic sequence to explore the 3,200 BAC clones of the high-quality rice draft (results will soon be presented on the Comparative Genoscope page). The resulting Ecores will help the ongoing annotation by pointing to potential genes that were not previously annotated and by suggesting extensions of annotated genes.
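The two uses of Ecores described above, pointing to unannotated genes and suggesting gene extensions, can be pictured as a comparison between Ecore positions and annotated gene spans. The Python sketch below does not reproduce Exofish; it assumes Ecores have already been located on a rice sequence, and its bins, coordinates and the `max_gap` distance are hypothetical choices for illustration.

```python
# Hypothetical post-processing of Ecore positions; Exofish itself is not reproduced.
Interval = tuple[int, int]  # half-open (start, end) in bp on a rice BAC sequence

def overlaps(a: Interval, b: Interval) -> bool:
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def classify_ecores(ecores: list[Interval], genes: list[Interval],
                    max_gap: int = 2_000) -> dict[str, list[Interval]]:
    """Sort Ecores into three bins. `max_gap` is an arbitrary illustrative
    distance for calling an Ecore a possible extension of a nearby gene."""
    result = {"inside_gene": [], "possible_extension": [], "possible_new_gene": []}
    for ecore in ecores:
        if any(overlaps(ecore, g) for g in genes):
            result["inside_gene"].append(ecore)          # supports an existing annotation
        elif any(overlaps(ecore, (g[0] - max_gap, g[1] + max_gap)) for g in genes):
            result["possible_extension"].append(ecore)   # just outside a gene's bounds
        else:
            result["possible_new_gene"].append(ecore)    # far from any annotated gene
    return result

genes = [(10_000, 14_000), (30_000, 36_000)]                     # invented gene spans
ecores = [(11_200, 11_350), (15_100, 15_220), (50_000, 50_150)]  # invented Ecores
print(classify_ecores(ecores, genes))
```

With these invented coordinates, each Ecore lands in a different bin: the first falls inside the first gene, the second lies just beyond that gene’s end and suggests an extension, and the third is far from any annotated gene and flags a region worth re-examining.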

With the annotation under way and the rice genome sequence due to be finished before the end of 2004, the scientific community can expect a rich harvest of results, including the French research institutes (CNRS, INRA, CIRAD, IRD) that supported French participation. The private sector also expects numerous benefits from the sequencing and annotation of this genome.

At this time, the international rice genome sequencing project (IRGSP) includes eight countries: Brazil, China, France, India, Japan, South Korea, Taiwan and the United States. Thailand and the United Kingdom participated in the project initially.
The main institutions involved in the various countries are the Federal University of Pelotas (UFPel) for the Brazilian Rice Genome Initiative (BRIGI) in Brazil, the National Center for Gene Research (NCGR) in China, Genoscope in France, the Indian Initiative for Rice Genome Sequencing (IIRGS) in India, the Rice Genome Research Program (RGP) in Japan (part of the genomics program of the Japanese Ministry of Agriculture, Forestry and Fisheries [MAFF]), the Korea Rice Genome Research Program (KRGRP) in South Korea, the Academia Sinica Plant Genome Center (ASPGC) in Taiwan, the National Center for Genetic Engineering and Biotechnology (BIOTEC) in Thailand, the John Innes Centre (JIC) in the United Kingdom, and The Institute for Genomic Research (TIGR), the Plant Genome Initiative at Rutgers (PGIR), the Arizona Genomics Institute (AGI), the Clemson University Genomics Institute (CUGI), the Cold Spring Harbor Laboratory (CSHL), the Washington University Genome Sequencing Center (WUGSC) and the Wisconsin Rice Genome Project (GCOW) in the US.

Message from President Jacques Chirac of France on the occasion of the announcement, December 18, 2002, of the completion of a high quality draft sequence of the rice genome:
«An important milestone has just been achieved with the sequencing of the rice genome. It is a remarkable success for humanity and for science, because rice is the most commonly cultivated cereal in the world. I address all my congratulations and gratitude to the “International Rice Genome Sequencing Project” Consortium, which combined their competencies and co-ordinated the distribution of tasks. This international co-operative project has provided completely free access to results which are of general interest for all the countries of the planet. France is proud to have participated in this great achievement.
This work will lead to new developments, notably in the fields of agriculture and nutrition. The work performed by the Consortium constitutes an example of international co-operation that we should follow in other domains.
The sharing of our knowledge and our skills is becoming more essential every day for progress for everyone, and especially for developing countries.»

Last update on 15 January 2008

© Genoscope - Centre National de Séquençage
2 rue Gaston Crémieux CP5706 91057 Evry cedex
Tel: (+33) 0 1 60 87 25 00
Fax: (+33) 0 1 60 87 25 14
