December 2008
“Parsimonious Markov models and applications to biological sequence analysis”
Pierre-Yves Bourguignon defended his thesis on December 15th in Évry. His research was carried out in the “Computational Systems Biology” team led by Vincent Schächter.
Markov chains are widely used in biological sequence analysis, although their practical use raises statistical issues regarding the choice of memory length. While a longer memory makes it possible to capture more information from the sequence, this benefit can be more than offset by the resulting degradation in the quality of estimation. Adaptive solutions, namely variable length Markov chains, were proposed in the early 1980s and further developed in the fields of text compression and statistical modelling of discrete-valued sequences. This thesis proposes a generalization of this approach, which leads to the introduction of parsimonious Markov models. Besides the definition of this class of models, a Bayesian model selection algorithm and the associated convergence theorem are presented.
Keywords: Markov chains, parsimony, model selection, Bayesian statistics
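The memory-length trade-off mentioned in the abstract can be illustrated with a small Python sketch. It is not taken from the thesis: it only counts the free parameters of a standard fixed-order Markov chain and the average number of observations available per context. The function names, the 4-letter DNA alphabet and the sequence length of 10,000 are assumptions chosen for illustration.

```python
def free_parameters(order: int, alphabet_size: int = 4) -> int:
    """Free transition probabilities of a fixed-order Markov chain.

    There are alphabet_size**order contexts, and each context carries a
    probability distribution with (alphabet_size - 1) free values.
    """
    return alphabet_size ** order * (alphabet_size - 1)


def mean_count_per_context(sequence_length: int, order: int,
                           alphabet_size: int = 4) -> float:
    """Average number of observed transitions per context.

    A sequence of length n yields (n - order) transitions, spread over
    alphabet_size**order possible contexts.
    """
    return (sequence_length - order) / alphabet_size ** order


if __name__ == "__main__":
    n = 10_000  # hypothetical sequence length, for illustration only
    for k in range(0, 8):
        print(k, free_parameters(k), round(mean_count_per_context(n, k), 2))
```

As the order grows, the number of parameters is multiplied by the alphabet size at each step while the data available to estimate each of them shrinks by the same factor. Variable length and parsimonious Markov models address this by not giving every context its own set of parameters.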