19 Aug 10:15 Marina Barsky: Scalable suffix trees in external memory and their applications

Marina Barsky (University of Victoria, Canada) will give on Wednesday August 19, 2009, at 10:15 in room C222 the following lecture:

Scalable suffix trees in external memory and their applications

The suffix tree is a compact index of all substrings of a given text. While being asymptotically linear in the size of the input, in practice, the suffix trees can easily be 12-50 times larger than the input. As such, suffix trees often exceed the typical main memory, even when the input does not. As most of the existing algorithms are designed for RAM, their performance severely degrades when the tree and/or the input do not fit in the available main memory. This has prevented so far the wide application of the suffix trees for massive string data analysis. We show how to construct the suffix tree on disk avoiding random disk I/Os.
  We also present performance data for the tree construction and queries on sequenced genomes. Next we point out open problems for the complete scalability of the suffix trees in secondary storage. Finally, we discuss how this disk-based index can be used for discovering new substring patterns in genomic data.
 


Last updated on 14 Aug 2009 by Visa Noronen - Page created on 19 Aug 2009 by Visa Noronen