Document-oriented database

A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one of the main categories of so-called NoSQL databases, and the popularity of the term "document-oriented database" (or "document store") has grown with the use of the term NoSQL itself.

Documents

The central concept of a document-oriented database is the notion of a document. While each document-oriented database implementation differs on the details of this definition, in general they all assume that documents encapsulate and encode data (or information) in some standard format or encoding. Encodings in use include XML, YAML, JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).
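As an illustration of one such encoding, the minimal sketch below serializes a document to JSON text and back using Python's standard `json` module. The field names match the first example document in this section; how a particular database actually stores the encoded bytes is not shown and differs between implementations.

```python
import json

# A document modelled as a Python dict; JSON represents it as an object.
document = {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}

# Encode the document as JSON text (the kind of value a JSON store might persist).
encoded = json.dumps(document, sort_keys=True)
print(encoded)  # {"Address": "5 Oak St.", "FirstName": "Bob", "Hobby": "sailing"}

# Decode it again; the round trip preserves every field and value.
decoded = json.loads(encoded)
assert decoded == document
```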

Documents inside a document-oriented database are similar, in some ways, to records or rows in relational databases, but they are less rigid. They are not required to adhere to a standard schema, nor will they all have the same sections, slots, parts, keys, or the like. For example, here is a document:

FirstName="Bob", Address="5 Oak St.", Hobby="sailing".

Another document could be:

FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=[{Name:"Michael",Age:10}, {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2}].

Both documents have some information in common and some that differs. Unlike a relational database, where each record would have the same set of fields and unused fields might be kept empty, neither document (record) here has any empty 'fields'. This approach allows new information to be added without requiring an explicit statement that other pieces of information are left out.
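The two example documents can be sketched as Python dicts to make the comparison concrete. This assumes dicts are a fair stand-in for schema-free documents; it shows which fields are shared, which belong to only one document, and that an absent field is simply missing rather than stored as an empty placeholder.

```python
# The two example documents; neither declares fields it does not use.
bob = {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"}
jonathan = {
    "FirstName": "Jonathan",
    "Address": "15 Wanamassa Point Road",
    "Children": [
        {"Name": "Michael", "Age": 10},
        {"Name": "Jennifer", "Age": 8},
        {"Name": "Samantha", "Age": 5},
        {"Name": "Elena", "Age": 2},
    ],
}

# Dict key views support set operations.
shared = bob.keys() & jonathan.keys()
only_bob = bob.keys() - jonathan.keys()
only_jonathan = jonathan.keys() - bob.keys()

print(sorted(shared))         # ['Address', 'FirstName']
print(sorted(only_bob))       # ['Hobby']
print(sorted(only_jonathan))  # ['Children']

# A missing field is just absent; a reader can supply a default when needed.
print(bob.get("Children", []))  # []
```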

Keys, Retrieval, and Organization

Keys

Documents are addressed in the database via a unique key that represents that document. Often, this key is a simple string; in some cases, it is a URI or path. Either way, the key can be used to retrieve the document from the database. Typically, the database maintains an index on the key so that document retrieval is fast.
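A minimal sketch of key-based addressing, assuming an in-memory Python dict stands in for the database's key index (a hash table with average constant-time lookup; real implementations use their own on-disk structures). The `DocumentStore` class and its `put`/`get` methods are hypothetical names for illustration, not any product's API.

```python
from typing import Any, Dict, Optional

Document = Dict[str, Any]


class DocumentStore:
    """Hypothetical in-memory store: each document is addressed by a unique key."""

    def __init__(self) -> None:
        # The dict plays the role of the index on the key.
        self._by_key: Dict[str, Document] = {}

    def put(self, key: str, doc: Document) -> None:
        # Storing under an existing key replaces the earlier document.
        self._by_key[key] = doc

    def get(self, key: str) -> Optional[Document]:
        # Direct lookup by key; None when no document has that key.
        return self._by_key.get(key)


store = DocumentStore()
# Keys may be plain strings or path-like strings.
store.put("people/bob", {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"})
store.put("people/jonathan", {"FirstName": "Jonathan", "Address": "15 Wanamassa Point Road"})

print(store.get("people/bob")["Hobby"])  # sailing
print(store.get("people/alice"))         # None
```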

Retrieval

Another defining characteristic of a document-oriented database is that, beyond the simple key-document (or key-value) lookup used to retrieve a document, the database offers an API or query language for retrieving documents based on their contents. For example, you may want a query that returns all documents in which a certain field is set to a certain value. The query APIs or query-language features available, as well as the expected performance of queries, vary significantly from one implementation to the next.
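The sketch below illustrates retrieval by content rather than by key: a hypothetical `find` function returns every document whose fields equal all of the given values. It assumes documents are Python dicts and simply scans them all; the third document ("Carol") is made-up sample data, and actual databases may answer such queries through secondary indexes and their own query syntax.

```python
from typing import Any, Dict, Iterable, List

Document = Dict[str, Any]


def find(docs: Iterable[Document], **criteria: Any) -> List[Document]:
    """Return documents whose fields equal every given value (full scan).

    A document that lacks a requested field never matches.
    """
    missing = object()  # sentinel distinguishes "absent" from a stored None
    return [
        doc
        for doc in docs
        if all(doc.get(field, missing) == value for field, value in criteria.items())
    ]


documents = [
    {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"},
    {"FirstName": "Jonathan", "Address": "15 Wanamassa Point Road"},
    {"FirstName": "Carol", "Hobby": "sailing"},  # hypothetical sample data
]

sailors = find(documents, Hobby="sailing")
print([d["FirstName"] for d in sailors])  # ['Bob', 'Carol']
```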

Organization

Implementations offer a variety of ways of organizing documents, including notions of

  • Collections
  • Tags
  • Non-visible metadata
  • Directory hierarchies
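As a rough illustration of two of these mechanisms, the sketch below groups documents into named collections and keeps tags as labels stored outside the document bodies, a simple form of non-visible metadata. The `by_collection`, `tags`, and `add` names and the sample keys are hypothetical; each implementation defines its own organizing features.

```python
from collections import defaultdict
from typing import Any, Dict, Set, Tuple

Document = Dict[str, Any]

# Hypothetical layout: collection name -> {document key -> document}.
by_collection: Dict[str, Dict[str, Document]] = defaultdict(dict)
# Tags live outside the documents: tag -> set of (collection, key) pairs.
tags: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)


def add(collection: str, key: str, doc: Document, *doc_tags: str) -> None:
    """Store doc under key in the named collection and attach any tags."""
    by_collection[collection][key] = doc
    for tag in doc_tags:
        tags[tag].add((collection, key))


add("people", "bob", {"FirstName": "Bob", "Hobby": "sailing"}, "outdoors")
add("people", "jonathan", {"FirstName": "Jonathan"})
add("places", "oak-st-5", {"Address": "5 Oak St."}, "outdoors")

print(sorted(by_collection["people"]))  # ['bob', 'jonathan']
print(sorted(tags["outdoors"]))         # [('people', 'bob'), ('places', 'oak-st-5')]
```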


Implementations

Name | Publisher | License | Language | Notes | RESTful API
Lotus Notes | IBM | Proprietary | | | (unknown)
askSam | askSam Systems | Proprietary | | | (unknown)
Apstrata | Apstrata | Proprietary | | | (unknown)
Datawasp | Significant Data Systems | Proprietary | | | (unknown)
Clusterpoint | Clusterpoint Ltd. | Free community license / Commercial[1] | C++ | Scalable, high-performance, schema-free, document-oriented database management system platform with server-based data storage, fast full-text search engine functionality, information ranking for search relevance, and clustering. | Yes
CRX | Day Software | Proprietary | | | (unknown)
MUMPS Database[2] | | Proprietary and GNU Affero GPL[3] | MUMPS | Commonly used in health applications. | (unknown)
UniVerse | Rocket Software | Proprietary | | | Yes (Beta)
UniData | Rocket Software | Proprietary | | | Yes (Beta)
Jackrabbit | Apache Software Foundation | Apache License | Java | | (unknown)
CouchDB | Couchbase, Apache Software Foundation | Apache License | Erlang | JSON over REST/HTTP with Multi-Version Concurrency Control and ACID properties. Uses map and reduce for views and queries.[4] | Yes (there is only a RESTful API)[5]
FleetDB | FleetDB | MIT License | Clojure | A JSON-based, schema-free database optimized for agile development. | (unknown)
MongoDB | | GNU AGPL v3.0[6] | C++ | Fast document-oriented database optimized for highly transient data. | Optional, using external tools[7]
GemFire Enterprise [1] | VMware | Commercial | Java, .NET, C++ | Memory-oriented, fast key-value database with indexing and querying support. | Yes
OrientDB | Orient Technologies | Apache License | Java | JSON over HTTP | Yes
RavenDB | RavenDB | Commercial or GNU AGPL v3.0 | .NET | A .NET LINQ-enabled document database focused on providing a high-performance, transactional, schema-less, flexible, and scalable NoSQL data store for the .NET and Windows platforms. | Yes
Redis | | BSD License | ANSI C | Key-value store supporting lists and sets, with a fast, simple, and binary-safe protocol. | (unknown)
StrokeDB [2] | | MIT License | | Alpha software. | (unknown)
Terrastore | | Apache License | Java | JSON/HTTP | (unknown)
ThruDB | | BSD License | C++, Java | Built on top of the Apache Thrift framework; provides indexing and document storage services for building and scaling websites. An alternate implementation is being developed in Java. Alpha software. | (unknown)
Persevere | Persevere | BSD License | | A JSON database and JavaScript application server. Provides a RESTful JSON interface for create, read, update, and delete access to data. Also supports JSONQuery/JSONPath querying. | Yes
DBSlayer | DBSlayer | Apache License | C | Database abstraction layer (over MySQL) used by The New York Times. JSON over HTTP. | (unknown)
Eloquera DB | Eloquera | Proprietary | .NET | High performance. Based on dynamic objects. Supports LINQ and SQL queries. | (unknown)


XML database implementations

All XML databases are document-oriented databases.

References

Wikimedia Foundation. 2010.
