Biochemistry and Molecular Biology of Insects Unit at the Pasteur Institute
Malaria (fact sheet from the Pasteur Institute, in French)
The Anopheline mosquitoes (fact sheet from the Pasteur Institute, in French)
The PAL+ programme of the French Ministry for Research (in French)
Molecular biology of the Anopheles/Plasmodium interaction: Fotis Kafatos’ group at the EMBL
Genetics of Anopheles immunity to Plasmodium: Frank H. Collins’ group at the University of Notre Dame
FTP access to the scaffolds via NCBI
Access to sequences produced at Genoscope
The article that describes the draft in Science
AnoDB, an Anopheles database kept at the IMBB in Greece
The work on Anopheles began at Genoscope in 1998, in collaboration with the Biochemistry and Molecular Biology of Insects Unit at the Pasteur Institute. It involved sequencing the ends of large fragments of the mosquito genome, averaging 110 thousand base pairs (110 kb) in length. These fragments had been cloned in bacterial artificial chromosomes (BACs) by Frank Collins’ group at the University of Notre Dame (USA). The library constructed by F. Collins, which was duplicated in the USA and in Europe, contained 12,000 BAC clones and represented about 5 times the genome of Anopheles gambiae, which is 280 million base pairs (280 Mb) in length. Genoscope produced more than 22,000 BAC end reads, representing more than 15 Mb of sequence. At the beginning of this project, only 250 kb of Anopheles genomic sequence were known. This was therefore the first large-scale survey of the mosquito genome in which samples were drawn from the whole genome.
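The figures in the paragraph above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `fold_coverage` is made up for this example, and the read counts and sequence totals are the lower bounds quoted in the text ("more than"), so the derived values are approximate.

```python
def fold_coverage(n_clones: int, mean_insert_bp: int, genome_bp: int) -> float:
    """Fold coverage of a clone library: total cloned bases / genome size."""
    return n_clones * mean_insert_bp / genome_bp


# Figures quoted in the text
N_CLONES = 12_000            # BAC clones in the library
MEAN_INSERT_BP = 110_000     # average insert size (110 kb)
GENOME_BP = 280_000_000      # Anopheles gambiae genome (280 Mb)
N_END_READS = 22_000         # BAC end reads (lower bound)
END_SEQUENCE_BP = 15_000_000 # total BAC end sequence (lower bound)

coverage = fold_coverage(N_CLONES, MEAN_INSERT_BP, GENOME_BP)
mean_read_bp = END_SEQUENCE_BP / N_END_READS
sampled_fraction = END_SEQUENCE_BP / GENOME_BP

print(f"library coverage: {coverage:.1f}x")
print(f"mean end-read length: {mean_read_bp:.0f} bp")
print(f"genome sampled by end reads: {sampled_fraction:.1%}")
```

The library works out to roughly 4.7-fold coverage, consistent with the "about 5 times" stated above, and the end reads sample a little over 5% of the genome.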
This project has already provided various types of genomic information. Annotation of randomly chosen sequences from the genome has revealed partial sequences from more than 1,000 new genes. Numerous transposable elements have also been discovered, and new families of these elements have been defined. Finally, more than 1,000 polymorphic regions (repeated sequences known as microsatellites) have been identified. These sequences constitute excellent genetic markers, either for locating the genes responsible for various phenotypes on a genetic map or for studying diversity in natural populations.
Moreover, these BAC end sequences were also an important resource for sequencing the complete genome. Designated in this context as Sequence Tag Connectors (STCs), they are used to establish large-scale connections: the relative orientation of the two end sequences of each BAC is known, as is the distance that separates them (the size of the insert in the BAC). A first use of STCs is to select BACs with minimal overlap, with a view to a “walk along the chromosome” starting from nucleation points. This strategy has the advantage of not requiring physical mapping. Although the international consortium adopted a different strategy for sequencing the Anopheles genome, Genoscope’s STCs nevertheless turned out to be useful.
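The selection of a minimally overlapping BAC described above can be sketched as follows. This is a simplified illustration, not Genoscope's actual procedure: the `EndHit` record, the `pick_next_bac` function and the `min_overlap` threshold are all hypothetical, and the sketch assumes that the end-sequence hits have already been filtered to those pointing outward past the right-hand edge of an already sequenced seed BAC.

```python
from typing import Iterable, NamedTuple, Optional


class EndHit(NamedTuple):
    """An STC (BAC end sequence) that aligns inside the sequenced seed clone."""
    bac_id: str
    hit_pos: int      # position on the seed where the end sequence starts (bp)
    insert_size: int  # estimated insert size of the candidate BAC (bp)


def pick_next_bac(hits: Iterable[EndHit], seed_length: int,
                  min_overlap: int = 1_000) -> Optional[EndHit]:
    """Return the candidate that overlaps the seed the least.

    The overlap is the stretch between the hit and the end of the seed.
    Candidates overlapping by less than `min_overlap` are skipped (too little
    shared sequence to confirm the join), as are candidates whose insert would
    not extend beyond the seed at all.
    """
    best: Optional[EndHit] = None
    best_overlap: Optional[int] = None
    for hit in hits:
        overlap = seed_length - hit.hit_pos
        if overlap < min_overlap or hit.insert_size <= overlap:
            continue
        if best_overlap is None or overlap < best_overlap:
            best, best_overlap = hit, overlap
    return best


hits = [
    EndHit("bacA", 60_000, 110_000),   # overlaps the seed by 50 kb
    EndHit("bacB", 105_000, 120_000),  # overlaps the seed by 5 kb
    EndHit("bacC", 109_800, 100_000),  # overlaps by 200 bp: below threshold
]
chosen = pick_next_bac(hits, seed_length=110_000)
print(chosen.bac_id if chosen else "no suitable clone")
```

Repeating this step from each newly sequenced clone is what makes the walk progress without a physical map: the end sequences alone identify which clone to sequence next.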
The American company Celera Genomics, the main player in the consortium formed in March 2001, applied its whole-genome shotgun sequencing strategy to the Anopheles genome. The principle is to sequence small fragments chosen at random from the entire genome, and then to assemble these reads into “contigs” based on their overlaps. In the context of this new collaboration, Genoscope produced 10% of the reads assembled by Celera. Reading these sequences in pairs, from both ends of inserts of different sizes, makes it possible to link, order and orient the contigs relative to one another within scaffolds, and also to check the reliability of the assembly by analyzing errors in the orientation and spacing of the paired sequences. The STCs from Genoscope and TIGR were used here to build large-scale bridges between contigs. In this way, it was possible to cluster the 10,000 contigs into 303 large scaffolds of more than 30 kb, representing 91% of the reconstituted sequence, and into more than 8,000 small scaffolds. Although it is relatively easy to estimate the size of the numerous gaps between contigs within a scaffold, it is more difficult to evaluate the gaps between the scaffolds themselves. The majority of scaffolds could be mapped onto the three chromosomes of Anopheles using a physical map, which was created by hybridizing BACs to the giant “polytene” chromosomes from the salivary glands of the mosquitoes. The assembly, which was carried out entirely by Celera, was nevertheless complicated by the degree of polymorphism of the sequenced strain, and some portions of the genome draft may be improperly assembled.
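The two uses of paired reads described above (checking the assembly, and sizing the gaps between linked contigs) can be illustrated with a minimal sketch. It does not reproduce Celera's assembler: the `Placement` record, the function names and the 20% tolerance are assumptions made for this example, and real assemblers combine many pairs statistically rather than trusting a single one.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Where one read of a pair was placed in the assembly."""
    contig: str
    start: int     # 0-based leftmost coordinate of the read on the contig
    length: int    # aligned read length (bp)
    forward: bool  # True if the read lies on the contig's forward strand


def pair_is_consistent(a: Placement, b: Placement, insert_size: int,
                       tolerance: float = 0.2) -> bool:
    """Check a pair placed on one contig: the reads must face each other
    and span a distance close to the known insert size."""
    if a.contig != b.contig:
        raise ValueError("both reads must lie on the same contig")
    left, right = (a, b) if a.start <= b.start else (b, a)
    if not (left.forward and not right.forward):
        return False  # orientation error
    span = right.start + right.length - left.start
    return abs(span - insert_size) <= tolerance * insert_size


def estimate_gap(left_read: Placement, right_read: Placement,
                 left_contig_length: int, insert_size: int) -> int:
    """Estimate the gap between two contigs bridged by one pair.

    Assumes `left_read` is forward on the left contig and `right_read` is
    reverse on the right contig. The gap is what remains of the insert after
    subtracting the bases it covers on each contig; a negative value suggests
    the contigs overlap or the pair is misplaced.
    """
    covered_left = left_contig_length - left_read.start
    covered_right = right_read.start + right_read.length
    return insert_size - covered_left - covered_right


# A well-behaved pair from a 10 kb insert, both reads on contig c1
ok = pair_is_consistent(Placement("c1", 1_000, 500, True),
                        Placement("c1", 10_600, 500, False),
                        insert_size=10_000)

# A pair bridging contigs c1 (50 kb long) and c2
gap = estimate_gap(Placement("c1", 45_000, 500, True),
                   Placement("c2", 2_000, 500, False),
                   left_contig_length=50_000, insert_size=10_000)
print(ok, gap)
```

The same arithmetic shows why gaps between scaffolds are harder to size: by construction, no read pair spans them, so there is no insert of known length to subtract from.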
Genoscope and the Pasteur Institute are working to improve the quality of the draft. We have undertaken the complete sequencing of complementary DNAs (cDNAs) from Anopheles. These cDNAs, which are copies of the messenger RNAs transcribed from Anopheles genes, provide a means of validating and correcting the annotation. They also make it possible to correct local assembly errors.