- Generalised suffix tree
A generalised suffix tree is a
suffix tree for a set of strings. Given the set of strings of total length , it is aPatricia trie containing all suffixes of the strings. It is mostly used inbioinformatics ref|BRCR.Functionality
It can be built in time and space, and you can use it to find all occurrences of a string of length in time, which is
asymptotically optimal (assuming the size of the alphabet is constant, see ref|Gus97 page 119).When constructing such a tree, each string should be padded with a unique out-of-alphabet marker symbol (or string) to ensure no suffix is a substring of another, guaranteeing each suffix is represented by a unique leaf node.
Algorithms for constructing a GST include
Ukkonen's algorithm andMcCreight's algorithm .Example
A suffix tree for the strings
ABAB
andBABA
is shown in a figure above. They are padded with the unique terminator strings$0
and$1
. The numbers in the leaf nodes are string number and starting position. Notice how a left to right traversal of the leaf nodes corresponds to the sorted order of the suffixes. The terminators might be strings or unique single symbols. Edges on$
from the root are left out in this example.Alternatives
An alternative to building a generalised suffix tree is to concatenate the strings, and build a regular suffix tree or suffix array for the resulting string. When you evaluate the hits after a search, you map the global positions into documents and local positions with some algorithm and/or data structure, such as a binary search in the starting/ending positions of the documents.
References
* cite conference
author=Paul Bieganski, John Riedl, John Carlis, and Ernest F. Retzel
title=Generalized Suffix Trees for Biological Sequence Data
booktitle=Biotechnology Computing, Proceedings of the Twenty-Seventh Hawaii International Conference on.
year=1994
pages=35-44
url=http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=323593
* cite book
last = Gusfield
first = Dan
origyear = 1997
year = 1999
title = Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology
publisher = Cambridge University Press
location = USA
id = ISBN 0-521-58519-8External links
* [http://www.cosmion.net/jeroen/software/gst/ Online GST demo] : a web demo for generating a generalised suffix tree.
Wikimedia Foundation. 2010.