December 7, 2009: Two related talks this Wednesday by a visitor to the CS department, Dr. Dan Brown from Waterloo.

Speaker: Dan Brown (Cheriton School of Computer Science, University of Waterloo)
When: Wednesday, December 9, 12 noon
Where: 1325 SEO
Title: From DNA to hip hop: how ideas from bioinformatics can automate finding rhymes in rap music


Unlike most kinds of music, the core of rap music is found in the rhythm and rhyme of its lyrics. Different artists or subgenres will use different kinds of rhyme, which in some cases can be extremely complicated: the end of one line may rhyme with several parts of the previous line, and in some cases, rhymes may be imperfect.

Detecting these complex rhyme patterns manually is time-consuming and tedious. We have designed a system for automatic rhyme annotation. Our approach is founded on several bioinformatics ideas. First, using a test corpus of known rhymes, we develop a probabilistic model of rhymed and unrhymed syllables. Then, we use that model to build a log-likelihood ratio scoring matrix for identifying what is and is not a rhyme. Finally, we create a local alignment procedure to find high-scoring lyrics segments.
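For readers unfamiliar with the bioinformatics ideas mentioned above, the following is a minimal sketch of a log-odds scoring matrix combined with a Smith-Waterman style local alignment. It is not the speakers' implementation: the function names, the pseudocount, the linear gap penalty, and the use of vowel symbols such as "AY" as stand-ins for syllables are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_odds_matrix(rhyme_pairs, background, pseudocount=1.0):
    """Score(a, b) = log(P(a, b | rhyme) / (P(a) * P(b))).

    rhyme_pairs: (a, b) symbol pairs observed in known rhymes (hypothetical input).
    background:  symbols drawn from general lyrics; assumed to contain
                 every symbol that appears in rhyme_pairs.
    """
    symbols = sorted(set(background))
    pair_counts = Counter()
    for a, b in rhyme_pairs:
        # Count each pair in both orders so the matrix is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    bg_counts = Counter(background)
    n_bg = sum(bg_counts.values())
    # The pseudocount keeps unseen pairs from producing log(0).
    n_pairs = sum(pair_counts.values()) + pseudocount * len(symbols) ** 2
    scores = {}
    for a in symbols:
        for b in symbols:
            p_rhyme = (pair_counts[(a, b)] + pseudocount) / n_pairs
            p_chance = (bg_counts[a] / n_bg) * (bg_counts[b] / n_bg)
            scores[(a, b)] = math.log(p_rhyme / p_chance)
    return scores


def local_alignment(x, y, scores, gap=-2.0):
    """Return (best score, (i, j)) where i and j are the 1-based end
    positions in x and y of the highest-scoring local alignment."""
    best, best_end = 0.0, (0, 0)
    prev = [0.0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0.0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            cur[j] = max(
                0.0,                                          # start a new segment
                prev[j - 1] + scores[(x[i - 1], y[j - 1])],   # align two symbols
                prev[j] + gap,                                # skip a symbol of x
                cur[j - 1] + gap,                             # skip a symbol of y
            )
            if cur[j] > best:
                best, best_end = cur[j], (i, j)
        prev = cur
    return best, best_end
```

Positive entries in the matrix mark symbol pairs that co-occur in rhymes more often than chance would predict, so a high local alignment score between two lines suggests a rhyming segment. A real system would need a phonetic transcription of the lyrics and a threshold for calling a segment a rhyme, neither of which is shown here.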

Our procedure has high sensitivity and specificity in identifying true rhymes in an annotated corpus; essentially, it identifies most complex rhymes, and identifies few false rhymes. We can use it to characterize artists, and then to develop classifiers for individual artists with surprising success.

Joint work with MMath student Hussein Hirjee

Speaker: Dan Brown (Cheriton School of Computer Science, University of Waterloo)
When: Wednesday, December 9, 3pm
Where: 1127 SEO
Title: Two new ways to decode HMMs: many paths, or robust decoding


Hidden Markov models are the standard probabilistic tool used to divide biological (or other) sequences into features found in the sequence. Typically, people use classical algorithms, such as Viterbi or posterior decoding, to identify these features, but both have serious problems. We present two new decoding algorithms for this problem. In the first, we summarize and explore the k most likely paths through an HMM. In the second, we attempt a robust decoding in which we count a prediction as correct if it characterizes the set of features correctly and places the feature boundaries approximately right. Both methods work much better than Viterbi at identifying the topology of membrane proteins, though they are not nearly as efficient.
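As background, the two classical decoders named above can be sketched as follows. This illustrates only the baselines, not the new algorithms from the talk. The function names and the toy two-state model in the usage note (a membrane state "M" and a loop state "L" emitting hydrophobic "h" or polar "p" residues) are assumptions chosen for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the single most probable state path (computed in log space)."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            # Best predecessor state for arriving in s at this position.
            r = max(states, key=lambda q: V[-1][q] + math.log(trans_p[q][s]))
            col[s] = V[-1][r] + math.log(trans_p[r][s]) + math.log(emit_p[s][o])
            ptr[s] = r
        V.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def posterior_decode(obs, states, start_p, trans_p, emit_p):
    """Return, for each position, the state with the highest posterior
    probability (forward-backward without scaling, so short inputs only)."""
    fwd = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for o in obs[1:]:
        fwd.append({s: emit_p[s][o] * sum(fwd[-1][r] * trans_p[r][s] for r in states)
                    for s in states})
    bwd = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        nxt = bwd[0]
        bwd.insert(0, {r: sum(trans_p[r][s] * emit_p[s][o] * nxt[s] for s in states)
                       for r in states})
    return [max(states, key=lambda s: fwd[t][s] * bwd[t][s]) for t in range(len(obs))]
```

A small usage example with the assumed toy model:

```python
states = ("M", "L")
start_p = {"M": 0.5, "L": 0.5}
trans_p = {"M": {"M": 0.9, "L": 0.1}, "L": {"M": 0.1, "L": 0.9}}
emit_p = {"M": {"h": 0.9, "p": 0.1}, "L": {"h": 0.1, "p": 0.9}}
print(viterbi("hhhppp", states, start_p, trans_p, emit_p))
print(posterior_decode("hhhppp", states, start_p, trans_p, emit_p))
```

Viterbi commits to one best path, while posterior decoding picks the best state at each position independently and so can return a sequence of states that is not itself a valid path through the model.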

Joint work with MMath student Daniil Golod and PhD student Jakub Truszkowski. Both papers will be presented at the Asia-Pacific Bioinformatics Conference in January.

Dan Brown is Associate Professor of Computer Science at the University of Waterloo, where he has been since 2001. From 2000 to 2001, he worked on the human and mouse genome projects at the Whitehead/MIT Center for Genome Research. His interests are in algorithms for understanding the information in discrete sequences, particularly identifying patterns in DNA and protein sequences.
