Persistent data structure

In computing, a persistent data structure is a data structure which always preserves the previous version of itself when it is modified; such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure. (A persistent data structure is not a data structure committed to persistent storage, such as a disk; this is a different and unrelated sense of the word "persistent.")

A data structure is partially persistent if all versions can be accessed but only the newest version can be modified. The data structure is fully persistent if every version can be both accessed and modified. If there is also a meld or merge operation that can create a new version from two previous versions, the data structure is called confluently persistent. Structures that are not persistent are called ephemeral.[1]

Such data structures are particularly common in logic and functional programming. In a purely functional program all data is immutable, so all data structures are automatically fully persistent.[1] Persistent data structures can also be created using in-place updating of data, and these may, in general, use less time or storage space than their purely functional counterparts.
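As a rough illustration of persistence built on in-place updates, the following Python sketch applies the "fat node" idea to an array. Each cell keeps a history of (version, value) entries, writes append to that history in place, and reads look up the latest entry at or before the requested version. The class and method names (`FatNodeArray`, `set`, `get`) are hypothetical. Only the newest version can be updated, so the result is partially persistent rather than fully persistent.

```python
from bisect import bisect_right


class FatNodeArray:
    """Partially persistent array (hypothetical sketch of the "fat node" idea).

    Each cell keeps its full history as parallel lists of version numbers and
    values. Updates append to those lists in place, so only the newest version
    can be modified, but every earlier version can still be read.
    """

    def __init__(self, initial):
        # Version 0 holds the initial contents of every cell.
        self._versions = [[0] for _ in initial]
        self._values = [[v] for v in initial]
        self.latest = 0

    def set(self, index, value):
        """Write to the newest version in place; returns the new version number."""
        self.latest += 1
        self._versions[index].append(self.latest)
        self._values[index].append(value)
        return self.latest

    def get(self, index, version):
        """Read a cell as it was in the given version."""
        if not 0 <= version <= self.latest:
            raise ValueError("unknown version")
        # Find the last write to this cell at or before `version`.
        pos = bisect_right(self._versions[index], version) - 1
        return self._values[index][pos]


arr = FatNodeArray([10, 20, 30])
v1 = arr.set(1, 99)      # version 1
v2 = arr.set(1, 42)      # version 2
print(arr.get(1, 0), arr.get(1, v1), arr.get(1, v2))  # 20 99 42
```

Each read costs a binary search over one cell's history; a real implementation would also have to decide how versions are named and when old ones may be discarded.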

While persistence can be achieved by simple copying, this is inefficient in time and space, because most operations make only small changes to a data structure. A better method is to exploit the similarity between the new and old versions to share structure between them, such as using the same subtree in a number of tree structures. However, because it rapidly becomes infeasible to determine how many previous versions share which parts of the structure, and because it is often desirable to discard old versions, this kind of sharing calls for an environment with garbage collection.

Examples of persistent data structures

Perhaps the simplest persistent data structure is the singly linked list or cons-based list, a simple list of objects formed by each carrying a reference to the next in the list. This is persistent because we can take a tail of the list, meaning the last k items for some k, and add new nodes onto the front of it. The tail is not duplicated; instead, it becomes shared between the old list and the new list. So long as the contents of the tail are immutable, this sharing is invisible to the program.
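A minimal Python sketch of this sharing, assuming an immutable `Node` tuple as the cons cell (the names `Node`, `cons` and `to_list` are illustrative, not taken from any library):

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    """An immutable cons cell: a value and a reference to the rest of the list."""
    head: Any
    tail: Optional["Node"]


def cons(head, tail):
    """Return a new list with `head` in front of `tail`; `tail` is not copied."""
    return Node(head, tail)


def to_list(node):
    """Collect the values of a cons list into a Python list (for display)."""
    out = []
    while node is not None:
        out.append(node.head)
        node = node.tail
    return out


old = cons(1, cons(2, cons(3, None)))   # [1, 2, 3]
shared_tail = old.tail                   # the last two items, [2, 3]
new = cons(0, shared_tail)               # [0, 2, 3]

print(to_list(old), to_list(new))        # [1, 2, 3] [0, 2, 3]
print(new.tail is old.tail)              # True: the tail is shared, not copied
```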

Many common reference-based data structures, such as red-black trees[2] and queues,[3] can easily be adapted to create a persistent version. Others need slightly more effort: stacks; double-ended queues (deques); min-deques, which have an additional operation, min, that returns the minimal element in constant time without increasing the complexity of the standard enqueue and dequeue operations at both ends; random access lists, which have constant-time cons and head like a singly linked list, plus random access in sub-linear (most often logarithmic) time; random access queues, random access double-ended queues and random access stacks; and random access min-lists, min-queues, min-deques and min-stacks.
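Of the variants listed, the min-stack is the easiest to sketch. Assuming each immutable node caches the minimum of itself and everything beneath it, `find_min` runs in constant time, and `push` and `pop` never touch older versions. The names below are hypothetical; the deque and random-access variants need considerably more machinery and are not shown.

```python
from typing import Any, NamedTuple, Optional


class MinStack(NamedTuple):
    """Immutable stack node that also caches the minimum of itself and everything below."""
    top: Any
    minimum: Any
    rest: Optional["MinStack"]


def push(stack, value):
    """Return a new stack with `value` on top; the old stack is shared unchanged."""
    smallest = value if stack is None else min(value, stack.minimum)
    return MinStack(value, smallest, stack)


def pop(stack):
    """Return the stack below the top element (the old stack is untouched)."""
    return stack.rest


def find_min(stack):
    """Minimum element in O(1): it was computed once, when the node was pushed."""
    return stack.minimum


s1 = push(push(push(None, 5), 3), 7)   # 5, 3, 7 pushed in that order
s2 = push(s1, 1)                        # a new version; s1 is unchanged
print(find_min(s1), find_min(s2), find_min(pop(s2)))  # 3 1 3
```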

There are also persistent data structures that use destructive (in-place) operations. These cannot be implemented efficiently in purely functional languages such as Haskell, but can be in languages such as C or Java. They are usually unnecessary, however, since pure versions of most data structures are now available; these are often simpler to implement and often behave better in multithreaded environments.


Linked lists

This example is taken from Okasaki. See the bibliography.

Singly linked lists are the bread-and-butter data structure in functional languages. In ML-derived languages and Haskell, they are purely functional because once a node in the list has been allocated, it cannot be modified, only copied or destroyed. Note that ML itself is not purely functional.

Consider the two lists:

xs = [0, 1, 2]
ys = [3, 4, 5]

These would be represented in memory by:

[Figure: xs and ys as two separate chains of nodes]

where a circle indicates a node in the list, and the arrow leaving it represents the node's second element, a pointer to another node.

Now concatenating the two lists:

zs = xs ++ ys

results in the following memory structure:

[Figure: memory after concatenation; zs consists of copies of the nodes of xs followed by the shared nodes of ys]

Notice that the nodes in list xs have been copied, but the nodes in ys are shared. As a result, the original lists (xs and ys) persist and have not been modified.

The reason for the copy is that the last node in xs (the node containing the original value 2) cannot be modified to point to the start of ys, because that would change the value of xs.
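The same behaviour can be reproduced in a short Python sketch. It assumes an immutable `Node` tuple for list cells; `append` plays the role of `++`, copying the nodes of `xs` and reusing `ys` unchanged. All names are illustrative.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    """Immutable cons cell."""
    head: Any
    tail: Optional["Node"]


def from_values(*values):
    """Build a cons list from Python values."""
    node = None
    for v in reversed(values):
        node = Node(v, node)
    return node


def append(xs, ys):
    """Persistent concatenation, xs ++ ys.

    Every node of xs is copied (the last copy points at ys), while ys itself is
    reused as-is. Neither input list is modified.
    """
    if xs is None:
        return ys
    return Node(xs.head, append(xs.tail, ys))


def to_list(node):
    """Collect the values of a cons list into a Python list (for display)."""
    out = []
    while node is not None:
        out.append(node.head)
        node = node.tail
    return out


xs = from_values(0, 1, 2)
ys = from_values(3, 4, 5)
zs = append(xs, ys)

print(to_list(zs))                       # [0, 1, 2, 3, 4, 5]
print(to_list(xs), to_list(ys))          # [0, 1, 2] [3, 4, 5]
print(zs.tail.tail.tail is ys)           # True: ys is shared
print(zs is xs)                          # False: xs was copied
```

The recursive `append` is fine for short lists, but Python's recursion limit would make an iterative version preferable for long ones.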

Trees

This example is taken from Okasaki. See the bibliography.

Consider a binary tree used for fast searching, where every node has the recursive invariant that subnodes on the left are less than the node, and subnodes on the right are greater than the node.

For instance, the set of data

xs = [a, b, c, d, f, g, h]

might be represented by the following binary search tree:

[Figure: a binary search tree holding a, b, c, d, f, g, h]

A function which inserts data into the binary tree and maintains the invariant is:

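(* Assumed tree type: E is the empty tree; T (left, value, right) holds a string. *)
datatype tree = E | T of tree * string * tree

(* Returns a new tree containing x; only nodes on the search path are rebuilt. *)
fun insert (x, E) = T (E, x, E)
  | insert (x, s as T (a, y, b)) =
       if x < y then T (insert (x, a), y, b)
       else if x > y then T (a, y, insert (x, b))
       else s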

After executing

ys = insert ("e", xs)

we end up with the following:

[Figure: the trees xs and ys after inserting "e", with unchanged subtrees shared]

Notice two points. First, the original tree (xs) persists. Second, many common nodes are shared between the old tree and the new tree. Such persistence and sharing is difficult to manage without some form of garbage collection (GC) to automatically free nodes that have no live references, which is why GC is commonly found in functional programming languages.
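For comparison, here is a rough Python translation of the ML `insert` function above (a sketch, not Okasaki's code), with `None` standing in for the empty tree `E`. It builds a tree from the seven keys of xs, using an insertion order chosen here only so that the tree comes out balanced, and then inserts "e"; the result reuses every subtree off the search path.

```python
from typing import Any, NamedTuple, Optional


class Tree(NamedTuple):
    """Immutable binary search tree node; None stands for the empty tree E."""
    left: Optional["Tree"]
    value: Any
    right: Optional["Tree"]


def insert(x, t):
    """Return a tree containing x, rebuilding only the nodes on the search path."""
    if t is None:
        return Tree(None, x, None)
    if x < t.value:
        return Tree(insert(x, t.left), t.value, t.right)
    if x > t.value:
        return Tree(t.left, t.value, insert(x, t.right))
    return t  # x already present: reuse the whole tree


def in_order(t):
    """Values in sorted order."""
    return [] if t is None else in_order(t.left) + [t.value] + in_order(t.right)


xs = None
for ch in "dbgacfh":   # insertion order chosen to yield a balanced tree
    xs = insert(ch, xs)

ys = insert("e", xs)
print(in_order(xs))         # ['a', 'b', 'c', 'd', 'f', 'g', 'h']
print(in_order(ys))         # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
print(ys.left is xs.left)   # True: the whole left subtree is shared
```

Only the nodes d, g and f on the path to the new leaf are copied; the subtrees rooted at b and h are the very same objects in both versions.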

Reference cycles

Since every value in a purely functional computation is built up out of existing values, it would seem that it is impossible to create a cycle of references. In that case, the reference graph (the graph of the references from object to object) could only be a directed acyclic graph. However, in most functional languages, functions can be defined recursively; this capability allows recursive structures using functional suspensions. In lazy languages, such as Haskell, all data structures are represented as implicitly suspended thunks; in these languages any data structure can be recursive because a value can be defined in terms of itself. Some other languages, such as Objective Caml, allow the explicit definition of recursive values.
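Python evaluates eagerly, so the sketch below emulates a suspension with a zero-argument function whose result is cached on first use, roughly as a thunk would be in a lazy language. With that in place, a list can be defined in terms of itself, producing a genuine cycle of references. `LazyCons` and `take` are hypothetical names used only for illustration.

```python
class LazyCons:
    """A cons cell whose tail is a suspension (a zero-argument function).

    The tail is computed on first access and then cached, much like a thunk.
    """

    def __init__(self, head, tail_thunk):
        self.head = head
        self._thunk = tail_thunk
        self._tail = None
        self._forced = False

    @property
    def tail(self):
        if not self._forced:
            self._tail = self._thunk()
            self._forced = True
            self._thunk = None  # drop the suspension once it has been evaluated
        return self._tail


def take(n, stream):
    """First n elements of a (possibly infinite) lazy list."""
    out = []
    while n > 0 and stream is not None:
        out.append(stream.head)
        stream = stream.tail
        n -= 1
    return out


# A value defined in terms of itself: the list 1, 2, 1, 2, ... as a two-node cycle.
ones_twos = LazyCons(1, lambda: LazyCons(2, lambda: ones_twos))

print(take(5, ones_twos))                 # [1, 2, 1, 2, 1]
print(ones_twos.tail.tail is ones_twos)   # True: the references form a cycle
```

The caching is an internal mutation, but it is not observable from outside: every access to `tail` returns the same value.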

References


Wikimedia Foundation. 2010.
