How it works


Eukaryotic genome assemblies are downloaded from NCBI. Each night, the compute_AssemblyStats_from_NCBI bash script downloads the eukaryotes.txt file from the NCBI FTP site and selects only the new assemblies.

For example, to process all mammal assemblies, run the command below. On its first invocation, it downloads ALL the mammal assemblies from NCBI. The compute_AssemblyStats_from_NCBI script generates a list of available genome assemblies.
compute_AssemblyStats_from_NCBI Mammals
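
The "select only new assemblies" step can be pictured with a short Python sketch. It is not the code of compute_AssemblyStats_from_NCBI. It assumes that eukaryotes.txt is tab-separated, that its header line starts with "#", and that it has columns named "SubGroup" and "Assembly Accession". The function name select_new_assemblies and the sample records are hypothetical.

```python
import csv
import io


def select_new_assemblies(table_text, subgroup, known_accessions):
    """Return the rows of `subgroup` whose assembly accession is not yet known."""
    lines = table_text.splitlines()
    if not lines:
        return []
    # Assumption: the header line starts with '#'; strip it to get clean column names.
    lines[0] = lines[0].lstrip("#")
    reader = csv.DictReader(
        io.StringIO("\n".join(lines)), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [
        row
        for row in reader
        if row["SubGroup"] == subgroup
        and row["Assembly Accession"] not in known_accessions
    ]


# Made-up records for illustration only (not real NCBI entries).
example = (
    "#Organism/Name\tSubGroup\tAssembly Accession\n"
    "Species A\tMammals\tGCA_000000001.1\n"
    "Species B\tMammals\tGCA_000000002.1\n"
    "Species C\tBirds\tGCA_000000003.1\n"
)
known = {"GCA_000000001.1"}  # accessions already processed on a previous night
new_rows = select_new_assemblies(example, "Mammals", known)
print([row["Assembly Accession"] for row in new_rows])  # ['GCA_000000002.1']
```

On a first run the set of known accessions would be empty, so every assembly of the requested group is selected, which matches the behaviour described above.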


Then, the assembly_stats.pl Perl script is launched with the list of new assemblies. For each assembly, the FASTA file of scaffolds is downloaded and cut at each N to generate contigs (using the scaf2contigs.pl Perl script).
assembly_stats.pl -list /tmp/eukaryotes.Mammals.txt -prev /tmp/Mammals.known -outdir /tmp/Mammals_genomes -force
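
To illustrate what cutting a scaffold at each N means, here is a minimal Python sketch. It is not the scaf2contigs.pl implementation. It assumes that any run of one or more N (upper or lower case) is a gap, and scaffold_to_contigs is a hypothetical name. The real script may apply other rules, such as a minimum gap length.

```python
import re


def scaffold_to_contigs(sequence):
    """Split a scaffold sequence into contigs at every run of N or n."""
    # re.split yields empty strings for leading/trailing gaps; drop them.
    return [piece for piece in re.split(r"[Nn]+", sequence) if piece]


scaffold = "ACGTNNNNGGCCNTTAA"
contigs = scaffold_to_contigs(scaffold)
print(contigs)  # ['ACGT', 'GGCC', 'TTAA']
```

A scaffold without any N yields a single contig identical to the scaffold itself.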


Finally, metrics of scaffolds and contigs are generated using the assemblyMetrics script.
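
The text does not list which metrics assemblyMetrics reports. As an illustration only, the Python sketch below computes a few metrics commonly used for assemblies (sequence count, total length, longest sequence, N50) from a list of sequence lengths. The function names n50 and basic_metrics are hypothetical and are not taken from the script.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def basic_metrics(lengths):
    """Summarise a set of scaffold or contig lengths."""
    return {
        "count": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


contig_lengths = [80, 70, 50, 40, 30, 20, 10]  # made-up lengths, total 300
print(basic_metrics(contig_lengths))
# {'count': 7, 'total_length': 300, 'longest': 80, 'n50': 70}
```

The same function can be applied once to the scaffold lengths and once to the contig lengths to compare the two levels of the assembly.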