GSP Algorithm

GSP Algorithm

GSP Algorithm ("Generalized Sequential Pattern" algorithm) is an algorithm used for sequence mining. The algorithms for solving sequence mining problems are mostly based on the "a priori" (level-wise) algorithm. One way to use the level-wise paradigm is to first discover all the frequent items in a level-wise fashion. It simply means counting the occurrences of all singleton elements in the database. Then, the transactions are filtered by removing the non-frequent items. At the end of this step, each transaction is a modified transaction consisting of only the frequent elements it contains. We use this modified database as an input to the GSP algorithm. This process requires one pass over the whole database.

GSP Algorithm makes multiple passes over the database. In the first pass, all single items (1-sequences) are counted. From the frequent items, a set of candidate 2-sequences are formed, and another pass is made to gather their support. The frequent 2-sequences are used to generate the candidate 3-sequences, and this process is repeated until no more frequent sequences are found. There are two main steps in the algorithm.

* Candidate Generation. Given the set of frequent (k-1)-frequent sequences F(k-1), the candidates for the next pass are generated by joining F(k-1) with itself. A pruning phase eliminates any sequence, at least one of whose subsequences is not frequent.
* Support Counting. Normally, a hash tree-based search is employed for efficient support counting. Finally non-maximal frequent sequences are removed.

Algorithm

F1 = the set of frequent 1-sequencek=2,do while F(k-1)!= Null; Generate candidate sets Ck (set of candidate k-sequences); For all input sequences s in the database D do Increment count of all a in Ck if s supports a Fk = {a Є Ck such that its frequency exceeds the threshold} k= k+1; Result = Set of all frequent sequences is the union of all Fks End doEnd do

The above algorithm looks like the Apriori algorithm. One main difference is however the generation of candidate sets. Let us assume that A → B and A → C are two frequent 2-sequences. The items involved in these sequences are (A, B) and (A,C) respectively. The candidate generation in a usual Apriori style would give (A, B, C) as a 3-itemset, but in the present context we get the following 3-sequences as a result of joining the above 2- sequences A → B → C, A → C → B and A → BC The candidate –generation phase takes this into account.The GSP algorithm discovers frequent sequences, allowing for time constraints such as maximum gap and minimum gap, among the sequence elements. Moreover, it supports the notion of a sliding window, i.e., of a time interval within which items are observed as belonging to the same event, even if they originate from different events.


Wikimedia Foundation. 2010.

Игры ⚽ Поможем написать курсовую

Look at other dictionaries:

  • Sequence mining — is concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence. It is usually presumed that the values are discrete, and thus Time series mining is closely related, but usually… …   Wikipedia

  • Meter Point Administration Number — A Meter Point Administration Number, also known as MPAN, Supply Number or S Number, is a 21 digit reference used in Great Britain to uniquely identify electricity supply points such as individual domestic residences. The gas equivalent is the… …   Wikipedia

  • Belief propagation — is a message passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields. It calculates the marginal distribution for each unobserved node, conditional on any observed nodes. Belief… …   Wikipedia

  • Ant colony optimization algorithms — Ant behavior was the inspiration for the metaheuristic optimization technique. In computer science and operations research, the ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems which can be… …   Wikipedia

  • Proprietary software — is computer software licensed under exclusive legal right of the copyright holder. The licensee is given the right to use the software under certain conditions, but restricted from other uses, such as modification, further distribution, or… …   Wikipedia

  • Rhinoplasty — For the album by Primus, see Rhinoplasty (album). Rhinoplasty Intervention Rhinoplasty: The lower lateral cartilage (greater alar cartilage) exposed for plastic modification via the left nostril …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”