How it works
Eukaryotic genome assemblies are downloaded from the NCBI website. Each night, the compute_AssemblyStats_from_NCBI bash script downloads the eukaryotes.txt file from the NCBI FTP site and selects only the new assemblies.
For example, consider the mammal assemblies. On its first invocation, the script downloads ALL the mammal assemblies from NCBI. The compute_AssemblyStats_from_NCBI script generates a list of available genome assemblies:
compute_AssemblyStats_from_NCBI Mammals
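The selection of new assemblies can be pictured as a filter over the report file. The sketch below is not the code of compute_AssemblyStats_from_NCBI. It assumes a simplified tab-separated layout (name, group, accession) rather than the real columns of eukaryotes.txt, and a known file holding one accession per line. The function name select_new_assemblies and both file layouts are hypothetical.

```shell
# Hypothetical helper, not part of the toolkit.
# Usage: select_new_assemblies REPORT KNOWN GROUP
#   REPORT: tab-separated file, assumed columns: name, group, accession
#   KNOWN:  assumed to hold one already-processed accession per line
select_new_assemblies() {
    awk -F'\t' -v known="$2" -v group="$3" '
        BEGIN {
            # Load known accessions. A missing or empty file leaves the set
            # empty, so a first run keeps every assembly of the group.
            while ((getline line < known) > 0) seen[line] = 1
            close(known)
        }
        /^#/ { next }                    # skip header lines of the report
        $2 == group && !($3 in seen)     # print rows that were not seen before
    ' "$1"
}
```

With these assumptions, a call such as select_new_assemblies report.tsv Mammals.known Mammals prints only the mammal rows whose accession is absent from Mammals.known. On a first run, when that file does not exist yet, every mammal row is printed.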
Then, the assembly_stats.pl Perl script is launched with the list of new assemblies. For each assembly, the FASTA file of scaffolds is downloaded and cut at each N to generate contigs (using the scaf2contigs.pl Perl script):
assembly_stats.pl -list /tmp/eukaryotes.Mammals.txt -prev /tmp/Mammals.known -outdir /tmp/Mammals_genomes -force
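To illustrate the scaffold-to-contig step, here is a minimal sketch that splits each scaffold wherever N characters occur. It is not the code of scaf2contigs.pl. It assumes that a run of consecutive N (upper or lower case) counts as a single cut point and that empty fragments are dropped. The function name scaffolds_to_contigs and the _contigK naming scheme are hypothetical.

```shell
# Hypothetical helper, not part of the toolkit.
# Reads FASTA from the given files (or stdin) and writes one record per contig.
scaffolds_to_contigs() {
    awk '
        # Split the buffered scaffold on runs of N and print non-empty pieces.
        function emit_contigs(   n, i, k, parts) {
            if (name == "") return
            n = split(seq, parts, /[Nn]+/)
            k = 0
            for (i = 1; i <= n; i++) {
                if (parts[i] != "") {
                    k++
                    printf ">%s_contig%d\n%s\n", name, k, parts[i]
                }
            }
            seq = ""
        }
        /^>/ { emit_contigs(); name = substr($1, 2); next }  # new scaffold header
        { seq = seq $0 }                                     # join wrapped sequence lines
        END { emit_contigs() }
    ' "$@"
}
```

For instance, a scaffold s1 with the sequence ACGTNNNNGGCC would yield two records, s1_contig1 (ACGT) and s1_contig2 (GGCC). Buffering each scaffold before splitting means that a gap wrapped across two FASTA lines is still treated as one cut.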
Finally, metrics of scaffolds and contigs are generated using the assemblyMetrics script.
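The exact metrics reported by assemblyMetrics are not listed here. As an illustration only, the sketch below computes three commonly used ones from a FASTA file: the number of sequences, the total length, and the N50 (the length L such that sequences of length L or more cover at least half of the total). The function name fasta_metrics and the output format are hypothetical.

```shell
# Hypothetical helper, not part of the toolkit.
# Reads FASTA from the given files (or stdin) and prints one summary line.
fasta_metrics() {
    awk '
        # Print the length of each non-empty sequence, one per line.
        /^>/ { if (len > 0) print len; len = 0; next }
        { len += length($0) }
        END { if (len > 0) print len }
    ' "$@" | sort -rn | awk '
        { lengths[NR] = $1; total += $1 }
        END {
            if (NR == 0) { print "sequences=0 total=0 N50=0"; exit }
            half = total / 2
            # Walk from longest to shortest until half the total is covered.
            for (i = 1; i <= NR; i++) {
                cum += lengths[i]
                if (cum >= half) { n50 = lengths[i]; break }
            }
            # %.0f avoids integer overflow in some awk implementations
            # when totals reach genome scale.
            printf "sequences=%d total=%.0f N50=%.0f\n", NR, total, n50
        }
    '
}
```

Running the function once on the scaffold file and once on the contig file gives two lines that can be compared directly. The gap between scaffold N50 and contig N50 indicates how much of the contiguity depends on gaps.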