Module DIGEST_functions

Classes and functions used by the DIGEST workflow


Requires: jobArrayLSFlauncher_modif.sh, mpirun-genoscope-modif.sh
Classes
  MyDialect
csv class used to read CSV files
  jobLauncher
Class that launches a list of jobs on a job scheduler (SLURM or LSF)
  sequence
New sequence object
  ORF
New ORF object from MetaGene output
  ContigORF
New contig ORF object from MetaGene output
  Cigar
New alignment CIGAR object
  alignmentSAM
New alignmentSAM object built from one line of a SAM file; see the SAM format for more information
  cluster
New cluster object
  clusterSequence
New sequence in a cluster object
Functions
integer
exist(fname)
Check the existence of a file.
list
clstrParser(file)
Parse a .clstr file produced by CD-HIT.
dictionary
fastaReader(file)
FASTA parser.
boolean
geneExtended(orfSTART, orfEND, alignSTART, alignEND, orfSTATUT)
Check if a gene has been extended.
boolean
geneSeen(orfSTART, orfEND, alignSTART, alignEND)
Check if a gene has been seen.
string
reverseComplement(sequen)
Compute the reverse complement of a sequence.
dictionary
metageneParser(file)
Store the ORFs of a MetaGene file.
list
subjectStartStop(alignment, subjectLength)
From an alignment and a length, compute the start and stop positions of the alignment.
integer
fileLineNumber(file)
Count the number of lines in a file.
integer
nbSequenceFasta(file)
Count the number of sequences in a FASTA file.
 
writeORF(ORFlist, prefix, ID, sequence, n)
Write ORFs to the PREFIX_complete.fasta or PREFIX_partial.fasta file.
Variables
  __doc__ = ...
  __package__ = None
Function Details

exist(fname)

Check the existence of a file.

Parameters:
  • fname (string) - file name
Returns: integer
1 if the file is present, 0 otherwise
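
The integer return convention above can be sketched as follows. This is a minimal Python 3 illustration, not the module's source: the name file_exists is hypothetical, and it assumes that "present" means an existing regular file, which the original does not state.

```python
import os


def file_exists(fname):
    """Return 1 if fname is an existing regular file, 0 otherwise.

    Assumption: directories and other non-regular paths count as absent.
    """
    return 1 if os.path.isfile(fname) else 0
```

Returning 1 or 0 rather than True or False keeps the documented integer type; callers can still use the result directly in a condition.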

clstrParser(file)

Parse a .clstr file produced by CD-HIT.

Parameters:
  • file (string) - file name
Returns: list
list of cluster objects
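
As an illustration of what parsing a .clstr file involves, here is a minimal Python 3 sketch. It is not the module's implementation: the name parse_clstr is hypothetical, plain dictionaries stand in for the cluster and clusterSequence classes (whose attributes are not listed here), and the function takes an iterable of lines rather than a file name so it can be tried without a file.

```python
import re

# One member line looks like: "1<TAB>2214aa, >seqB... at 80.00%"
# The representative sequence of a cluster ends with "*" instead.
MEMBER_RE = re.compile(r"^\d+\s+(\d+)(?:aa|nt),\s+>(.+?)\.\.\.\s+(.*)$")


def parse_clstr(lines):
    """Group CD-HIT member lines under their '>Cluster N' header."""
    clusters = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # A new cluster starts; members follow on the next lines.
            clusters.append({"name": line[1:], "members": []})
            continue
        match = MEMBER_RE.match(line)
        if match and clusters:
            length, seq_id, status = match.groups()
            clusters[-1]["members"].append({
                "id": seq_id,
                "length": int(length),
                "representative": status == "*",
            })
    return clusters
```

To read from disk, pass an open file handle, for example `parse_clstr(open("output.clstr"))`, where the file name is a placeholder.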

fastaReader(file)

FASTA parser.

Parameters:
  • file (string) - file name
Returns: dictionary
dictionary of sequence objects keyed by sequence ID
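
The ID-keyed result can be pictured with the following Python 3 sketch. It makes several assumptions that are not confirmed by this page: the name read_fasta is hypothetical, plain strings stand in for sequence objects, the ID is taken to be the first whitespace-delimited token of the header, and the input is an iterable of lines rather than a file name.

```python
def read_fasta(lines):
    """Return a dict mapping each sequence ID to its concatenated sequence."""
    sequences = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    # Flush the last record.
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences
```

Note that with a dictionary, a repeated ID silently overwrites the earlier record.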

geneExtended(orfSTART, orfEND, alignSTART, alignEND, orfSTATUT)

Check if a gene has been extended.

Parameters:
  • orfSTART (integer) - ORF start position in the extended sequence
  • orfEND (integer) - ORF end position in the extended sequence
  • alignSTART (integer) - alignment start position of the sequence on the contig
  • alignEND (integer) - alignment end position of the sequence on the contig
  • orfSTATUT (string) - ORF status: complete or partial
Returns: boolean
True if the gene is completed, False otherwise

geneSeen(orfSTART, orfEND, alignSTART, alignEND)

Check if a gene has been seen.

Parameters:
  • orfSTART (integer) - ORF start position in the extended sequence
  • orfEND (integer) - ORF end position in the extended sequence
  • alignSTART (integer) - alignment start position of the sequence on the contig
  • alignEND (integer) - alignment end position of the sequence on the contig
Returns: boolean
True if the gene is seen, False otherwise

reverseComplement(sequen)

Compute the reverse complement of a sequence.

Parameters:
  • sequen (string) - nucleotide sequence
Returns: string
reverse complement of sequence
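
A reverse complement can be computed in one pass with a translation table, as in this minimal Python 3 sketch. The name reverse_complement and the handling of N and lowercase bases are assumptions; the module's own function may treat ambiguous bases differently.

```python
# Map each base to its complement; N stays N, case is preserved.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]
```

Characters outside the table are left unchanged by `str.translate`, so unexpected symbols pass through rather than raising an error.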

metageneParser(file)

Store the ORFs of a MetaGene file.

Parameters:
  • file (string) - file name
Returns: dictionary
a dictionary with contig IDs as keys and contig objects as values

subjectStartStop(alignment, subjectLength)

From an alignment and a length, compute the start and stop positions of the alignment.

Parameters:
  • alignment (alignmentSAM) - alignmentSAM object
  • subjectLength (integer) - subject sequence length
Returns: list
a list with the start and stop positions of the alignment (start = -1 means the alignment starts before the subject sequence; stop = -2 means the alignment stops after the subject sequence)
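
Because the returned list mixes real positions with the sentinel values -1 and -2, callers need to check for them before using the numbers. The following Python 3 sketch shows one way to do that. The helper describe_bounds and the two constant names are hypothetical; only the sentinel values themselves come from the description above.

```python
# Sentinel values documented for subjectStartStop.
START_BEFORE_SUBJECT = -1
STOP_AFTER_SUBJECT = -2


def describe_bounds(bounds):
    """Turn a [start, stop] list into a human-readable summary."""
    start, stop = bounds
    starts_inside = start != START_BEFORE_SUBJECT
    stops_inside = stop != STOP_AFTER_SUBJECT
    if starts_inside and stops_inside:
        return "alignment lies within the subject"
    if not starts_inside and not stops_inside:
        return "alignment overhangs both ends of the subject"
    if not starts_inside:
        return "alignment starts before the subject"
    return "alignment stops after the subject"
```

Using distinct sentinels for the two ends lets a caller tell which side overhangs without a second return value.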

fileLineNumber(file)

Count the number of lines in a file.

Parameters:
  • file (string) - file name
Returns: integer
number of lines
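
Counting lines without loading the whole file into memory can be sketched as below (Python 3). The names count_lines and file_line_number are hypothetical, and the module's own implementation may differ, for example by calling an external tool.

```python
def count_lines(lines):
    """Count the items of any iterable of lines, one at a time."""
    return sum(1 for _ in lines)


def file_line_number(path):
    """Open the file at path and count its lines."""
    with open(path) as handle:
        return count_lines(handle)
```

Iterating over the file handle reads one line at a time, which matters for large sequence files.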

nbSequenceFasta(file)

Count the number of sequences in a FASTA file.

Parameters:
  • file (string) - file name
Returns: integer
number of sequences
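
In a FASTA file each record begins with a header line starting with ">", so counting sequences reduces to counting those lines. A minimal Python 3 sketch, with the hypothetical name count_fasta_records and an iterable of lines as input instead of a file name:

```python
def count_fasta_records(lines):
    """Count header lines, i.e. lines whose first character is '>'."""
    return sum(1 for line in lines if line.startswith(">"))
```

For a file on disk, pass an open handle, for example `count_fasta_records(open("reads.fasta"))`, where the file name is a placeholder.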

writeORF(ORFlist, prefix, ID, sequence, n)

Write ORFs to the PREFIX_complete.fasta or PREFIX_partial.fasta file.

Parameters:
  • ORFlist (list) - list of ORF objects
  • prefix (string) - prefix of the output file name
  • ID (string) - sequence ID
  • sequence (string) - nucleotide sequence
  • n (integer) - length limit for partial ORFs

Variables Details

__doc__

Value:
"""
Class and functions use by the DIGEST workflow

@requires: jobArrayLSFlauncher_modif.sh
@requires: mpirun-genoscope-modif.sh
"""