Deciphering the Regulatory Code: Discriminative Learning with Hidden Markov Models

Lecturer:
Jonas Maaskola
Event type: 
HIIT seminar
Event time: 
2012-11-02 10:15 to 11:15
Place: 
Kumpula Exactum B119
Description: 

 

 

I present an application of discriminative learning with hidden Markov models (HMMs) in computational biology: discovering binding site motifs for regulatory proteins in DNA or RNA sequences. The method contrasts positive and negative example sequences and evaluates how strongly prospective binding site patterns are associated with the contrast. Mutual information between condition and motif occurrence serves as the measure of association. In an initial phase, discriminative words are found by heuristic discrete optimization. These words are then used as seeds for HMMs, whose parameters are subsequently optimized in continuous space by gradient learning, maximizing the mutual information between condition and motif occurrence.
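As a rough illustration of the word-based seeding phase, the following Python sketch scores each word by the mutual information between condition (positive or negative sequence) and word occurrence, then picks the best-scoring word. It is a minimal sketch under stated assumptions, not the implementation presented in the talk: the function names (`mutual_information`, `word_score`, `best_seed`) are hypothetical, occurrence is counted once per sequence, exhaustive enumeration of the words seen in the data stands in for the heuristic discrete optimization, and the subsequent gradient-based HMM refinement is not shown.

```python
import math


def mutual_information(counts):
    """Mutual information (in bits) of a contingency table.

    counts[i][j] is the number of sequences with condition i
    (0 = positive, 1 = negative) and occurrence state j
    (0 = word present, 1 = word absent).
    """
    total = sum(sum(row) for row in counts)
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n == 0:
                continue  # empty cells contribute nothing
            p_ij = n / total
            p_i = sum(row) / total
            p_j = sum(r[j] for r in counts) / total
            mi += p_ij * math.log2(p_ij / (p_i * p_j))
    return mi


def word_score(word, positives, negatives):
    """Association between condition and presence of `word` in a sequence."""
    pos_hits = sum(word in s for s in positives)
    neg_hits = sum(word in s for s in negatives)
    table = [
        [pos_hits, len(positives) - pos_hits],
        [neg_hits, len(negatives) - neg_hits],
    ]
    return mutual_information(table)


def best_seed(positives, negatives, k):
    """Return the length-k word with the highest score.

    Assumption: every word observed in the data is scored exhaustively,
    which is only feasible for small inputs.
    """
    words = {
        s[i:i + k]
        for s in positives + negatives
        for i in range(len(s) - k + 1)
    }
    # Sorting makes tie-breaking deterministic.
    return max(sorted(words), key=lambda w: word_score(w, positives, negatives))


# Toy data, invented for illustration only.
positives = ["ACGTGCAT", "TTTGCATA", "GGTGCATC"]
negatives = ["ACGTACGT", "TTTTAAAA", "GGCCGGCC"]
seed = best_seed(positives, negatives, 5)
```

On this toy data the word `TGCAT` is present in every positive sequence and in no negative one, so it separates the two conditions perfectly and is selected as the seed.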

Viewing the contrasts as channels that carry a regulatory signal likens motif finding to deciphering the regulatory code. The HMM learning methods for discriminative objectives that I present are completely general and may find applications in other fields. The motif finding method scales to large data sets, makes use of available repeat experiments, and can handle more complex data configurations in addition to binary contrasts.

Previous work in the field of motif finding is either purely word-based, i.e. uses only discrete optimization, or uses other measures of association. In the field of voice recognition, mutual information has been used to parameterize HMMs in a related but different manner within the Maximum Mutual Information Estimation (MMIE) framework, and I intend to explain how the methods relate.
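For orientation, the MMIE criterion is commonly written as below. This is the textbook form from the speech recognition literature, given here only as background; the notation is an assumption of this note and not necessarily the one used in the talk.

```latex
\[
F_{\mathrm{MMIE}}(\lambda)
  = \sum_{r=1}^{R} \log
    \frac{p_\lambda(O_r \mid w_r)\, P(w_r)}
         {\sum_{w} p_\lambda(O_r \mid w)\, P(w)}
\]
```

Here $O_r$ is the $r$-th training observation sequence, $w_r$ its correct label, $\lambda$ the HMM parameters, and $P(w)$ a fixed prior over labels. Maximizing this quantity raises the posterior probability of the correct label for each training example.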

 


Last updated on 29 Oct 2012 by Dorota Glowacka - Page created on 29 Oct 2012 by Dorota Glowacka