The computational processing applied to the data produced by Genoscope belongs to the so-called “data-intensive” category of applications. This family of applications is characterized by the processing of large quantities of data, which are read, written and modified by programs that filter them, evaluate their quality, compare them with data already known, or analyze their content using statistical methods. Some of this processing also belongs to the category of so-called “compute-intensive” applications.
In all cases, these applications take a long time to execute, because of either the quantity of data or the complexity of the algorithm. Fortunately, it is usually not necessary to process large quantities of data with an expensive algorithm.
The production of sequences by the automatic 3730 DNA sequencers generates about 6 GB of raw data (chromatograms) per day. For some projects, the raw data are deposited in a publicly accessible repository, the “trace repository”.
The steps in the preparation of the DNA before sequencing are recorded, for each sample produced, in a laboratory data management system, the LIMS (Laboratory Information Management System). At the beginning of 2007, the LIMS database contained information on the processing of 300,000 DNA plates, which led to the production of 44 million sequences. This database evolves constantly to take into account the continuous changes in the production process brought about by optimizations and new technologies.
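To give a sense of scale, the figures quoted above can be turned into two rough orders of magnitude. The following is only a back-of-the-envelope sketch: the function names are hypothetical, and the yearly volume assumes that the sequencers run every day of the year and that 1 TB = 1000 GB, neither of which is stated in the text.

```python
# Figures quoted in the text
RAW_DATA_GB_PER_DAY = 6          # raw chromatogram data produced per day
PLATES_TRACKED = 300_000         # plates recorded in the LIMS (early 2007)
SEQUENCES_PRODUCED = 44_000_000  # sequences produced from those plates

# Assumption (not from the text): production runs 365 days a year
DAYS_PER_YEAR = 365


def yearly_raw_data_tb(gb_per_day: float, days: int = DAYS_PER_YEAR) -> float:
    """Estimate the yearly raw-data volume in terabytes (1 TB = 1000 GB)."""
    return gb_per_day * days / 1000


def mean_sequences_per_plate(sequences: int, plates: int) -> float:
    """Average number of sequences obtained per plate tracked in the LIMS."""
    return sequences / plates


if __name__ == "__main__":
    tb = yearly_raw_data_tb(RAW_DATA_GB_PER_DAY)
    per_plate = mean_sequences_per_plate(SEQUENCES_PRODUCED, PLATES_TRACKED)
    print(f"Raw data: about {tb:.2f} TB per year")
    print(f"Average: about {per_plate:.0f} sequences per plate")
```

Under these assumptions the raw data amount to roughly 2 TB per year, and each plate tracked in the LIMS corresponds on average to about 147 sequences.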
| Director: Claude Scarpelli |
| System | Laurent Sainte Marthe, Sylvain Bonneval, Denis Debaussart, Fabien Dupont, Simon Vallet, Claude Verdier |
| Data flow and processing | Véronique Anthouard, Arnaud Couloux, Carole Dossat, Frédérick Gavory, Julien Gass, Maud Haquelle |
| Development | Ludovic Fleury, Franck Anière, Simone Duprat, Shahinaz Gas, E’Krame Jacoby, Sumitta Samair |
| Technological Developments | Julien Patrouix |