BIRDS aims to automate the execution of bioinformatics treatments. Because it can manage a large number of bioinformatics treatments in parallel and keeps detailed records of each one, it is particularly suited to a production environment. BIRDS is a flexible API built on the JBoss Drools rules engine, which makes it an expert system driven by business rules. Users extend BIRDS by adding new rules. Because the business logic is kept separate from the code and expressed as business rules that all users can read, long-term maintenance is easier. The BIRDS API essentially contains the logic for creating and configuring treatments and for executing jobs.

fig 1 : BIRDS environment

In BIRDS, it is the availability of new resources that triggers the execution of bioinformatics programs. The BIRDS design is centered on treatment configuration (XML files). Once a treatment is configured (i.e. connected to a resource referential) and declared by the administrator, the BIRDS process generates new jobs whenever the resources required by each input of the treatment are available. These jobs are submitted to a job scheduler or executed locally, depending on the execution context defined in the configuration files. New resources generated by job execution are stored in the resource referentials attached to the outputs, according to the configuration. Resources that have been used are recorded in the database as consumed resources, so that treatments already processed are not built again and so that the way the results were obtained can be certified.
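The role of consumed resources can be illustrated with a minimal sketch. It assumes an in-memory set instead of the BIRDS database, and the class and method names (`ConsumedResources`, `markConsumed`) are hypothetical, not part of the BIRDS API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the BIRDS API: remembers which resource
// sets a treatment has already consumed so that no job is built twice.
class ConsumedResources {
    private final Set<String> consumed = new HashSet<>();

    // Returns true the first time a (treatment, resource set) pair is seen,
    // false if that pair was already consumed.
    boolean markConsumed(String treatment, List<String> resourceIds) {
        return consumed.add(treatment + "|" + String.join(",", resourceIds));
    }

    public static void main(String[] args) {
        ConsumedResources registry = new ConsumedResources();
        List<String> resourceSet = List.of("seq1", "bankA");
        // First scan: the resource set is new, so a job would be built.
        System.out.println(registry.markConsumed("Blast", resourceSet));
        // Later scan: the same resource set is skipped.
        System.out.println(registry.markConsumed("Blast", resourceSet));
    }
}
```

Keeping the record per treatment lets the same resource feed several treatments while each treatment still processes it only once.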

BIRDS reduces the complexity of workflow design: a workflow follows from the configuration of dataflow interdependencies, and data availability triggers the automatic building and execution of jobs.

Business rules are written by the user with Drools. They are written outside BIRDS and interpreted by BIRDS at runtime, so each user can define their own rules for a specific context. Business rules are thus defined separately, and all automation is delegated to BIRDS. The rules act at several stages: resource selection strategy, command line building, pre- and post-job execution, and error handling (for example alerting or relaunching a job).

Treatment: A treatment defines what the user wants to run in BIRDS. It defines once, in a configuration file, the settings of the program to automate: a description of the inputs, outputs and parameters of an executable program. Parameters can also be defined at runtime in rules, depending on a specific context (e.g. the input resources).
Resource: A resource is the data needed to run the program. Once a resource is available and treatments are defined, BIRDS automatically creates the jobs to run.

A BIRDS resource is characterized by a type and a set of key-value pairs. The minimal declaration of a resource type is its type name. Defining a new resource type should be considered carefully because it affects the creation and execution of new jobs.

A BIRDS resource must be connected to a resource referential. A resource referential is a data store (database, remote server, flat file, ...) that hosts resources. For each resource in a referential, a BIRDS resource is created; it holds, as key/value pairs defined by its type, the information required by the jobs that use this resource. A resource referential can host resources of several types and can be internal or external, depending on whether the resources are stored in the internal BIRDS database or in an external referential. Each referential must be known and therefore declared once in a configuration file.
Job: A BIRDS job is created with all the information necessary for the execution of a treatment, usually represented as a command line. Jobs are built by combining the resources available on each input (Cartesian product).

For example, consider the treatment specification of a Blast program that compares a sequence against a public sequence bank (cf fig 2). This treatment specification defines two inputs and one output, which holds the results of the alignment between the sequences. The first input accepts resources of type 'SEQ' (sequences) and the second input accepts resources of type 'Bank' (public sequences). Suppose the first input is connected to a resource referential that provides two sequences of type 'SEQ' at a given moment, and the second input is connected to a resource referential that provides three public sequences of type 'Bank'. In this example there are therefore 2 x 3 = 6 possible resource sets, a resource set being a BIRDS resource obtained by crossing the resources retrieved on each input. The treatment specification thus generates six jobs to run.

fig 2 : Resource combination
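The Blast example can be reproduced with a short sketch of the Cartesian product over the inputs. This is an illustration under simple assumptions (resources reduced to plain strings, a hypothetical `ResourceCombiner` class), not the BIRDS implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the BIRDS implementation: crosses the resources
// available on each input to obtain the resource sets (one job per set).
class ResourceCombiner {
    static List<List<String>> combine(List<List<String>> inputs) {
        List<List<String>> sets = new ArrayList<>();
        sets.add(new ArrayList<>()); // start from a single empty resource set
        for (List<String> input : inputs) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : sets) {
                for (String resource : input) {
                    List<String> extended = new ArrayList<>(partial);
                    extended.add(resource);
                    next.add(extended);
                }
            }
            sets = next; // an input without resources yields no resource set
        }
        return sets;
    }

    public static void main(String[] args) {
        List<String> seq = List.of("seq1", "seq2");              // type 'SEQ'
        List<String> bank = List.of("bankA", "bankB", "bankC");  // type 'Bank'
        List<List<String>> sets = combine(List.of(seq, bank));
        System.out.println(sets.size() + " jobs to build");      // 2 x 3 = 6
        sets.forEach(System.out::println);
    }
}
```

Note that an input with no available resource produces no resource set at all, which matches the idea that jobs are only generated when every input has resources.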

BIRDS jobs are built from a set of parameters defined by users in XML configuration files. The BIRDS API offers services to add configurations to the BIRDS database so that the BIRDS process takes them into account. Two types of configuration are required:

Administrative: defines all information about projects and resources.
Treatment specification: describes the input(s), output(s) and parameter(s) of an executable in order to generate its command line. Each input and output is connected to one or more resource referentials and supports only one resource type, which is defined in the administrative configuration.
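To make the link between a treatment specification, a resource set and the generated command line more concrete, here is a minimal sketch. The `${key}` placeholder syntax, the key names and the `CommandLineBuilder` class are assumptions made for illustration; they are not the BIRDS configuration format.

```java
import java.util.Map;

// Hypothetical sketch: the template syntax and key names are assumptions,
// not the BIRDS configuration format.
class CommandLineBuilder {
    // Replaces each ${key} placeholder with the matching resource property.
    static String build(String template, Map<String, String> properties) {
        String command = template;
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            command = command.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        if (command.contains("${")) {
            // A parameter of the treatment has no value in this resource set.
            throw new IllegalArgumentException("Unresolved placeholder in: " + command);
        }
        return command;
    }

    public static void main(String[] args) {
        String template = "blastn -query ${query} -db ${bank} -out ${result}";
        Map<String, String> properties = Map.of(
                "query", "seq1.fasta",
                "bank", "bankA",
                "result", "seq1_bankA.out");
        System.out.println(build(template, properties));
    }
}
```

Failing on an unresolved placeholder, rather than running an incomplete command, is a design choice of this sketch.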

Administrative configuration
Treatment configuration

The BIRDS client is a Java application, a process started by the administrator.

This process automates the generation and execution of jobs based on the treatment configuration.

fig 3 : BIRDS client

The BIRDS client consists of two parts (cf fig 3):

Process of resource management

This part consists of a workflow that runs every 15 min. The process queries all the treatment specifications declared by the administrator and stored in the BIRDS database. For each treatment, it retrieves the available resources (as resource combinations) and passes them to the job management process. The process stops when all specifications have been processed, and the cycle restarts after 15 min.

Process of job management

This part consists of two independent, continuously running processes implemented as Java threads:

Process for job creation: for each resource set made available by the resource management process, this process creates the job to execute and passes it to the job execution process.
Process for job execution: for each job made available by the job creation process, this process executes the job.
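The hand-off between the two job management threads can be sketched with blocking queues. This is a simplified illustration: the `JobPipeline` class, its fields and the string-based jobs are hypothetical, and executing a job is replaced by recording it in a list.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the two job management threads; class and method
// names are illustrative, not the BIRDS API.
class JobPipeline {
    final BlockingQueue<String> resourceSets = new LinkedBlockingQueue<>();
    final BlockingQueue<String> jobs = new LinkedBlockingQueue<>();
    final List<String> executed = new CopyOnWriteArrayList<>();

    // Starts the job creation and job execution threads as daemons.
    void start(CountDownLatch done) {
        Thread creator = new Thread(() -> {
            try {
                while (true) {
                    String set = resourceSets.take(); // wait for a resource set
                    jobs.put("blast " + set);         // build the job (command line)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread executor = new Thread(() -> {
            try {
                while (true) {
                    executed.add(jobs.take());        // stand-in for running the job
                    done.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        creator.setDaemon(true);
        executor.setDaemon(true);
        creator.start();
        executor.start();
    }

    public static void main(String[] args) throws InterruptedException {
        JobPipeline pipeline = new JobPipeline();
        CountDownLatch done = new CountDownLatch(2);
        pipeline.start(done);
        // One scan of the resource management process. The periodic 15 min
        // cycle could be driven by a ScheduledExecutorService.
        pipeline.resourceSets.put("seq1 bankA");
        pipeline.resourceSets.put("seq2 bankA");
        done.await(5, TimeUnit.SECONDS);
        System.out.println(pipeline.executed);
    }
}
```

Because each stage only communicates through a queue, job creation never waits for a running job to finish, which is the decoupling described above.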

The business logic is defined by the user outside the BIRDS API, as rules for the Drools rules engine. BIRDS processes interpret the rules at runtime at several stages, which lets the user define a specific context according to the current resources or job (cf fig 4). For example, the user can define rules to filter resources at the "resource selection" stage or to calculate a parameter at the "building command line" stage.

fig 4 : Control process by rules

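Example of a resource selection rule

// Selects the resources of the "bank_blast_input" input of the "Blast"
// treatment from the "CABRI" referential. The imports (java.util.Set and
// the BIRDS classes) are omitted from this excerpt.
rule "selection lotSeq from database device LIMS"
@BirdsRule( selectionRule )
dialect 'java'
salience 300

    when
        // the input of the treatment specification to feed
        $input : InputSpecificationElement( name == "bank_blast_input", treatmentSpecification.name == "Blast" )
        // the referential to read from, and its database device
        $resourcesReferential : ResourcesReferential( name == "CABRI" )
        $device : DatabaseDevice() from $resourcesReferential.referentialDevice
        // the not yet initialized resource set for this input and referential
        $rps : ResourcePropertiesSet( initialized == false, inputSpecificationElement == $input, resourcesReferential == $resourcesReferential )

    then
        $rps.initialize();
        // query the referential and attach the result to the resource set
        Set<ResourceProperties> resourcesPropertiesSet = $device.getPropertiesExecuteQuery("SELECT * from cabri_table");
        $rps.addResourcePropertiesSet(resourcesPropertiesSet);
        // notify the rules engine that $rps has changed
        modify($rps){};
end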

BIRDS processes handle the errors they encounter by sending error signals to the main workflow (the resource management process). These signals allow breakpoints in the workflow, but they can also be intercepted by user rules. Users can thus define an action to take when an error occurs, for example sending an email alert or relaunching the job.
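As an illustration of the kind of action such a rule could trigger, the sketch below relaunches a failed job a limited number of times and then records an alert. It is written in plain Java rather than as a Drools rule, and the `ErrorPolicy` class and its methods are hypothetical; in a real setup the alert would typically be an email.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an error policy a user rule could express:
// relaunch a failed job a few times, then raise an alert.
class ErrorPolicy {
    final List<String> alerts = new ArrayList<>();

    // Runs the job, relaunching it up to maxRelaunch times on failure.
    // Returns true if one attempt succeeded, false after recording an alert.
    boolean run(String jobName, Runnable job, int maxRelaunch) {
        for (int attempt = 0; attempt <= maxRelaunch; attempt++) {
            try {
                job.run();
                return true;
            } catch (RuntimeException e) {
                // Error signal intercepted: relaunch if attempts remain.
            }
        }
        alerts.add("Job " + jobName + " failed after " + (maxRelaunch + 1) + " attempts");
        return false;
    }

    public static void main(String[] args) {
        ErrorPolicy policy = new ErrorPolicy();
        int[] calls = {0};
        // Simulated job: fails on the first two attempts, then succeeds.
        boolean ok = policy.run("blast-1", () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
        }, 2);
        System.out.println(ok + " after " + calls[0] + " attempts");
        // A job that always fails ends with an alert.
        policy.run("blast-2", () -> { throw new IllegalStateException("broken"); }, 1);
        System.out.println(policy.alerts);
    }
}
```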

fig 5 : Control error by rules

Job Generation

High speed through continuous scanning of available resources
Flexibility through the addition of new rules
No dependency tree is needed to define a workflow
Resource traceability and job history

Job Execution

Large-scale analysis through a multi-threaded implementation. Job execution is decoupled from job generation.
Process control through business rules applied at different stages.
Job execution through a job scheduler

Advanced

Other features are present in the BIRDS API and will be described in the future:

Group of treatment specification

A group of treatment specifications is a specification that involves several treatment specifications. Like a treatment specification, a group treatment specification accepts input and output resources.

It makes it possible to follow the overall progress of a resource through each treatment. A group treatment is finished once its internal treatments are completed.

It makes it possible to remove internal jobs from the history once the group execution is finished.

It allows error recovery at a specific step.

Map reduce

BIRDS allows a map/reduce strategy to be defined on the input/output resources of a treatment.
