The team's main focus is the processing of data from next-generation sequencers. The group works with the sequencing laboratory, the technology development team (development of new protocols and deployment of new sequencers), and the research teams. Its missions are varied: formatting the data produced by the sequencers, quality control of the data, assembly of genomes and transcriptomes, and annotation of eukaryotic genomes.
Our lab is part of Genoscope and attached to the Institut de Biologie François Jacob of the CEA. The French Alternative Energies and Atomic Energy Commission (CEA) is a key player in research, development and innovation in four main areas: defense and security, low-carbon energies (nuclear and renewable), technological research for industry, and fundamental research in the physical and life sciences. The CEA currently has more than 20,000 employees and is established in nine centers spread throughout France.
Genoscope was founded in 1996 to contribute to the Human Genome Project and to develop genomic programs in France, and has subsequently turned toward environmental genomics. It has been part of the CEA Fundamental Research Division (DRF, The Knowledge Factory) since 2006 (Biology field). Genoscope develops methods and projects for the exploitation of biodiversity, in particular with respect to massive DNA sequencing and bioinformatics. It has been open to the national scientific community through calls for coordinated projects in the context of France Génomique since 2012. The projects cover all biodiversity, particularly the genomics of plants and fungi and the metagenomics of complex ecosystems. Genoscope is affiliated with Paris-Saclay University, one of the leading French and European universities, ranked 13th in the 2021 Shanghai ranking and recognised for the quality of both its educational programmes and its teaching staff. The university also enjoys high international visibility thanks to the reputation of its 275 research laboratories and their teams, and supports the integration and development of 65,000 multicultural students.
Genoscope is composed of several research laboratories with both sequencing and IT equipment. The sequencing laboratory operates short- and long-read technologies (Illumina, MGI and Oxford Nanopore) and has the capacity to run large-scale genomic projects with a high number of samples. Since 2012, Genoscope has managed and sequenced the samples from the Tara Oceans expeditions. Genoscope operates a 1700-core computing cluster with several large-memory nodes (2-6 TB) and globally distributed storage of 1.5 PB. In addition, Genoscope has access to the CEA computing infrastructure (CCRT, with dedicated large-scale computing infrastructure and 5 PB of storage).
The main activity of the LBGB is to develop and evaluate new bioinformatics technologies and software for use in original, large-scale genomic projects. In particular, we aim to generate chromosome-scale assemblies of complex genomes by combining long-read sequencing with long-range information; to provide a gene annotation platform for eukaryotic genomes; and to perform comparative genomics analyses that establish links between the specificities of a given genome and its life traits. Here are several topics we are currently working on:
In connection with the sequencing lab, we continuously evaluate the sequencing technologies and their associated protocols. This technological survey allows us to propose sequencing strategies adapted to Genoscope projects. We are also developing bioinformatic tools to check the quality of sequencing data produced at Genoscope.
Standards are evolving rapidly, and chromosome-level assemblies, as well as annotations integrating state-of-the-art methods are needed. The first sequencing of the Tara Oceans project generated a high proportion of unknown sequences, showing the strong need to generate a more complete database of marine organisms. We are developing new methodologies with the final goal of obtaining near-complete genomes of unknown organisms, presently an unreached goal for eukaryotes.
Moreover, as sequencing costs drop, we can expect that a large variety of genomes will be resequenced, with the goal of generating several reference assemblies for a given species. One bottleneck will be gene prediction. To address it, we are developing a gene predictor, called Gmove, that can be used to perform de novo gene prediction as well as to transfer annotations from one genome to another.
We are pursuing specific developments in transcriptomics to integrate the new possibilities offered by nanopore sequencing of RNA molecules. We plan to create tools for building complete transcript maps and their associated expression profiles across experiments, adapted to the environmental datasets produced by the different consortia with whom we collaborate.
Long-lived sessile organisms must persist in the face of a wide range of abiotic and biotic threats over their lifespans. We investigated the genomic features associated with such a long lifespan by sequencing, assembling and annotating genomes of several species. We then used the growing number of whole-genome sequences to investigate the parallel evolution of genomic characteristics potentially underpinning longevity.
Comparative genomic analysis requires visualization tools. For that purpose, assemblies and genomic features are available through a dedicated interface based on the Generic Genome Browser (GGB). We also develop specific tools that allow us to investigate synteny between genomes and the evolutionary history of the genomes we study.
Tool development is guided by the scientific applications of Genoscope. Generally, we use existing software developed by other bioinformatics groups, but we have to manage the bioinformatics issues raised by our own scientific projects. Available tools are not necessarily adapted to our needs, so we evaluate them, modify them and, where necessary, develop new ones.
Sequencing data production
Team leader
Genome assembly
Genome assembly
Transcriptomic analysis
Comparative genomics
Internship on genome assembly
Genome assembly
Gene prediction for the BGE project
Thesis on metatranscriptomics
Sequencing data production
Genome assembly
Genome assembly
Sequencing data production
Genome assembly
Gene prediction
Internship on genome assembly
Gene prediction
Gene prediction
Each year, we welcome several students for internships in our laboratory. If you are interested in doing an internship in one of our scientific fields, do not hesitate to apply by sending your CV to: stage_lbgb@genoscope.cns.fr
jmaury@genoscope.cns.fr
01 60 87 25 00