Apache CouchDB
Figure: CouchDB's Futon administration interface, user database
Original author(s): Damien Katz, Jan Lehnardt, Noah Slater, Christopher Lenz, J. Chris Anderson, Paul Davis, Adam Kocoloski, Jason Davies, Benoît Chesneau, Filipe Manana, Robert Newson
Developer(s): Apache Software Foundation
Initial release: 2005
Preview release: 1.1.1 / October 31, 2011
Development status: Active
Written in: Erlang
Operating system: Cross-platform
Available in: English
Type: Document-oriented database
License: Apache License 2.0
Website: couchdb.apache.org
Apache CouchDB, commonly referred to as CouchDB, is an open source document-oriented database written mostly in the Erlang programming language. It is part of the NoSQL group of data stores and is designed for local replication and to scale horizontally across a wide range of devices. CouchDB is supported by the commercial enterprises Couchbase and Cloudant.
History
In April 2005, Damien Katz (a former Lotus Notes developer at IBM, now founder and CTO of Couchbase) posted on his blog about a new database engine he was working on. Details were sparse at this early stage, but he did share that it would be a "storage system for a large scale object database" and that it would be called CouchDB (Couch is an acronym for cluster of unreliable commodity hardware).[1] His objectives were for it to become the database of the Internet and to be designed from the ground up to serve web applications. CouchDB was originally written in C++, but the project moved to the Erlang OTP platform for its emphasis on fault tolerance. Katz self-funded the project for almost two years and released it as an open source project under the GNU General Public License.
In February 2008, it became an Apache Incubator project and the license was changed to the Apache License rather than the GPL.[2] In November 2008, it graduated to a top-level project alongside the likes of the Apache HTTP Server, Tomcat and Ant.[3]
Currently, CouchDB is maintained at the Apache Software Foundation with backing from IBM. Katz works on it full-time as the lead developer.
Future plans
In 2011, Couchbase and SQLite announced UnQL (Unstructured Data Query Language), a jointly developed query language released into the public domain. It is intended to become a standard query language for NoSQL databases, much as SQL is the standard for relational databases.[4]
Design
CouchDB is most similar to other document stores such as MongoDB and Lotus Notes. It is not a relational database management system. Instead of storing data in rows and columns, the database manages a collection of JSON documents. The documents in a collection need not share a schema, but they can still be queried through views. Views are defined with aggregate functions and filters, and are computed in parallel, much like MapReduce.
Views are generally stored in the database and their indexes updated continuously, although queries may introduce temporary views. CouchDB supports a view system using external socket servers and a JSON-based protocol.[5] As a consequence, view servers have been developed in a variety of languages.
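The idea of a view over schema-free documents can be sketched in a few lines of Python. This is only an illustration of the map-then-aggregate pattern described above, not CouchDB's view engine: the sample documents, the field names (`type`, `author`, `words`) and the helper names `map_page_words` and `build_view` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical schema-free documents: fields differ from one document to the next.
docs = [
    {"_id": "a", "type": "page", "author": "alice", "words": 120},
    {"_id": "b", "type": "page", "author": "bob", "words": 80},
    {"_id": "c", "type": "comment", "author": "alice"},
    {"_id": "d", "type": "page", "author": "alice", "words": 50},
]


def map_page_words(doc):
    """Map step: keep only pages and yield (key, value) pairs."""
    if doc.get("type") == "page":
        yield doc["author"], doc["words"]


def build_view(documents, map_fn, reduce_fn):
    """Apply map_fn to every document, group values by key, reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


words_by_author = build_view(docs, map_page_words, sum)
print(words_by_author)  # {'alice': 170, 'bob': 80}
```

Because the map step looks at one document at a time, documents can be processed independently, which is what makes this style of view easy to compute in parallel and to update incrementally.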
CouchDB exposes a RESTful HTTP API, and a large number of pre-written clients are available. Additionally, a plugin architecture allows different programming languages to be used for the view server, such as JavaScript (the default), PHP, Ruby, Python and Erlang. Support for other languages can be added easily. CouchDB's design and philosophy borrow heavily from Web architecture and its concepts of resources, methods and representations, as the following quote summarizes.
“Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. CouchDB makes Django look old-school in the same way that Django makes ASP look outdated.” (Jacob Kaplan-Moss, Django developer)[6]
CouchDB is in use in many software projects and web sites,[7] including Ubuntu, where it is used to synchronize address and bookmark data.[8] Since version 0.11, CouchDB has supported the CommonJS Modules specification.[9]
Features
- Document Storage
- CouchDB stores documents in their entirety. You can think of a document as one or more field/value pairs expressed as JSON. Field values can be simple things like strings, numbers, or dates. But you can also use ordered lists and associative maps. Every document in a CouchDB database has a unique id and there is no required document schema.
- ACID Semantics
- Like many relational database engines, CouchDB provides ACID semantics[10]. It does this by implementing a form of Multi-Version Concurrency Control (MVCC) not unlike InnoDB or Oracle. That means CouchDB can handle a high volume of concurrent readers and writers without conflict.
- Map/Reduce Views and Indexes
- To provide some structure to the data stored in CouchDB, you can develop views that are similar to their relational database counterparts. In CouchDB, each view is constructed by a JavaScript function (server-side JavaScript using CommonJS and SpiderMonkey) that acts as the Map half of a map/reduce operation. The function takes a document and emits zero or more key/value pairs for it. The logic in your JavaScript functions can be arbitrarily complex. Since computing a view over a large database can be an expensive operation, CouchDB can index views and keep those indexes updated as documents are added, removed, or updated. This provides a powerful indexing mechanism, because the indexed keys and values are determined entirely by code you write.
- Distributed Architecture with Replication
- CouchDB was designed with bi-directional replication (or synchronization) and off-line operation in mind. That means multiple replicas can have their own copies of the same data, modify it, and then sync those changes at a later time. The biggest gotcha typically associated with this level of flexibility is conflicting changes.
- REST API
- CouchDB treats all stored items (there are others besides documents) as resources. Every item has a unique URI that is exposed via HTTP. REST uses the HTTP methods POST, GET, PUT and DELETE for the four basic CRUD (Create, Read, Update, Delete) operations on all resources. HTTP is a widely understood, interoperable, scalable and proven technology. Many tools, both software and hardware, are available for tasks such as caching, proxying and load balancing over HTTP.
- Eventual Consistency
- According to the CAP theorem it is impossible for a distributed system to simultaneously provide consistency, availability and partition tolerance guarantees. A distributed system can satisfy any two of these guarantees at the same time, but not all three. CouchDB guarantees eventual consistency to be able to provide both availability and partition tolerance.
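The conflict problem mentioned in the feature list above (concurrent writers, or replicas that sync later) can be illustrated with a toy optimistic-concurrency store. This is a simplified model under stated assumptions, not CouchDB's storage engine: the class `TinyStore`, the integer revision counter and the `expected_rev` parameter are hypothetical stand-ins chosen for the sketch.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class TinyStore:
    """Toy in-memory store (hypothetical; illustrates revision checks only)."""

    def __init__(self):
        self._docs = {}  # doc id -> (revision number, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)  # hand out a copy, so readers keep their snapshot

    def put(self, doc_id, body, expected_rev=None):
        current_rev = self._docs.get(doc_id, (None, None))[0]
        if current_rev != expected_rev:
            raise ConflictError(
                f"{doc_id}: expected rev {expected_rev}, found {current_rev}"
            )
        new_rev = 1 if current_rev is None else current_rev + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = TinyStore()
store.put("page", {"title": "Home"})

# Two clients read the same revision, then both try to write.
rev_a, doc_a = store.get("page")
rev_b, doc_b = store.get("page")

doc_a["title"] = "Home (edited by A)"
rev2 = store.put("page", doc_a, expected_rev=rev_a)  # first writer wins

doc_b["title"] = "Home (edited by B)"
try:
    store.put("page", doc_b, expected_rev=rev_b)  # stale revision
    conflict = False
except ConflictError:
    conflict = True

print(rev2, conflict)  # 2 True
```

Readers are never blocked in this model, since each one works on its own copy; only a write that was based on an outdated revision is rejected and must be retried or merged.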
Examples
CouchDB is accessed through RESTful HTTP methods (e.g., POST, GET, PUT or DELETE). The examples below use cURL, a lightweight command-line tool, to interact with a CouchDB server:
curl http://127.0.0.1:5984/
The CouchDB server processes the HTTP request and returns a response in JSON such as the following:
{"couchdb":"Welcome","version":"1.1.0"}
This is not terribly useful, but it nicely illustrates how to interact with CouchDB. To create a database, issue the following command:
curl -X PUT http://127.0.0.1:5984/wiki
CouchDB will reply with the following message, if the database does not exist:
{"ok":true}
or, with a different response message, if the database already exists:
{"error":"file_exists","reason":"The database could not be created, the file already exists."}
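A client usually needs to tell these two replies apart. The following is a minimal sketch of how that might look in Python, using only the two reply bodies shown above; the function name `database_created` is hypothetical, and any other reply is treated as unexpected.

```python
import json


def database_created(response_body):
    """Return True on success, False if the database already exists."""
    reply = json.loads(response_body)
    if reply.get("ok") is True:
        return True
    if reply.get("error") == "file_exists":
        return False
    raise ValueError(f"unexpected reply: {reply!r}")


created = database_created('{"ok":true}')
already_there = database_created(
    '{"error":"file_exists","reason":"The database could not be created, '
    'the file already exists."}'
)
print(created, already_there)  # True False
```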
The command below retrieves information about the database:
curl -X GET http://127.0.0.1:5984/wiki
The server replies with the following JSON message:
{"db_name":"wiki","doc_count":0,"doc_del_count":0,"update_seq":0, "purge_seq":0,"compact_running":false,"disk_size":79, "instance_start_time":"1272453873691070","disk_format_version":5}
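Since the reply is plain JSON, it can be decoded with any JSON library. As a small sketch, the Python below parses the reply shown above and prints a one-line summary; the helper name `summarize` and the summary format are illustrative choices, not part of CouchDB.

```python
import json

# Reply body copied from the example above.
info_body = (
    '{"db_name":"wiki","doc_count":0,"doc_del_count":0,"update_seq":0,'
    '"purge_seq":0,"compact_running":false,"disk_size":79,'
    '"instance_start_time":"1272453873691070","disk_format_version":5}'
)


def summarize(db_info):
    """Return a one-line, human-readable summary of a database info reply."""
    state = "compacting" if db_info["compact_running"] else "idle"
    return (
        f'{db_info["db_name"]}: {db_info["doc_count"]} documents, '
        f'disk_size={db_info["disk_size"]}, {state}'
    )


info = json.loads(info_body)
summary = summarize(info)
print(summary)  # wiki: 0 documents, disk_size=79, idle
```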
The following command will remove the database and its contents:
curl -X DELETE http://127.0.0.1:5984/wiki
CouchDB will reply with the following message:
{"ok":true}
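The same create, inspect and delete sequence can be expressed from a program instead of the command line. The sketch below uses Python's standard `urllib.request` module to build the three requests without sending them, so it runs even when no server is listening; the helper name `database_request` is hypothetical, and the address is the one used in the curl examples.

```python
from urllib.request import Request

BASE_URL = "http://127.0.0.1:5984"  # address used in the curl examples above


def database_request(method, db_name):
    """Build (but do not send) an HTTP request for a database-level operation."""
    if method not in {"PUT", "GET", "DELETE"}:
        raise ValueError(f"unsupported method: {method}")
    return Request(f"{BASE_URL}/{db_name}", method=method)


create = database_request("PUT", "wiki")     # curl -X PUT .../wiki
info = database_request("GET", "wiki")       # curl -X GET .../wiki
drop = database_request("DELETE", "wiki")    # curl -X DELETE .../wiki

for req in (create, info, drop):
    print(req.get_method(), req.full_url)
```

With a CouchDB server running at that address, each request object could be passed to `urllib.request.urlopen` to perform the call and read the JSON reply.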
Open source components
CouchDB includes a number of other open source projects as part of its default package.
- SpiderMonkey
- SpiderMonkey is a code name for the first ever JavaScript engine, written by Brendan Eich at Netscape Communications, later released as open source and now maintained by the Mozilla Foundation. License: MPL/GPL/LGPL tri-license.
- jQuery
- jQuery is a lightweight cross-browser JavaScript library that emphasizes interaction between JavaScript and HTML. License: dual license, GPL and MIT.
- ICU
- International Components for Unicode (ICU) is an open source project of mature C/C++ and Java libraries for Unicode support, software internationalization and software globalization. ICU is widely portable to many operating systems and environments. License: MIT License.
- OpenSSL
- OpenSSL is an open source implementation of the SSL and TLS protocols. The core library (written in the C programming language) implements the basic cryptographic functions and provides various utility functions. License: Apache-like unique.
- Erlang
- Erlang is a general-purpose concurrent programming language and runtime system. The sequential subset of Erlang is a functional language, with strict evaluation, single assignment, and dynamic typing. License: Modified MPL.
See also
- Document-oriented database
- Lotus Notes
- MongoDB
- OrientDB
- Couchbase
- CouchApp
- Cassandra
- XML database
- Mnesia
- BrowserCouch
- Riak
References
- ^ Lennon, Joe (2009-03-31). "Exploring CouchDB". IBM. http://www.ibm.com/developerworks/opensource/library/os-couchdb/index.html. Retrieved 2009-03-31.
- ^ Apache mailing list announcement on mail-archives.apache.org
- ^ Re: Proposed Resolution: Establish CouchDB TLP on mail-archives.apache.org
- ^ "UnQL Query Language Unveiled by Couchbase and SQLite". Couchbase. 2011-07-29. http://www.couchbase.com/press-releases/unql-query-language. Retrieved 2011-10-05.
- ^ View Server Documentation on wiki.apache.org
- ^ A Different Way to Model Your Data
- ^ CouchDB in the wild A list of software projects and websites using CouchDB
- ^ Email from Elliot Murphy (Canonical) to the CouchDB-Devel list
- ^ http://wiki.apache.org/couchdb/CommonJS_Modules
- ^ [1], see section on ACID Properties.
Bibliography
- Anderson, J. Chris; Slater, Noah; Lehnardt, Jan (November 15, 2009), CouchDB: The Definitive Guide (1st ed.), O'Reilly Media, pp. 300, ISBN 0596158165, http://guide.couchdb.org/editions/1/en/index.html
- Lennon, Joe (December 15, 2009), Beginning CouchDB (1st ed.), Apress, pp. 300, ISBN 1430272376, http://www.apress.com/book/view/9781430272373
- Holt, Bradley (March 7, 2011), Writing and Querying MapReduce Views in CouchDB (1st ed.), O'Reilly Media, pp. 76, ISBN 1449303129, http://oreilly.com/catalog/0636920018247
- Holt, Bradley (April 11, 2011), Scaling CouchDB (1st ed.), O'Reilly Media, pp. 72, ISBN 1449303439, http://oreilly.com/catalog/9781449303433
- Brown, MC (October 31, 2011), Getting Started with CouchDB (1st ed.), O'Reilly Media, pp. 50, ISBN 1449307558, http://oreilly.com/catalog/9781449307554
- Thompson, Mick (August 2, 2011), Getting Started with GEO, CouchDB, and Node.js (1st ed.), O'Reilly Media, pp. 64, ISBN 1449307523, http://oreilly.com/catalog/9781449307523
External links
- Official website
- CouchDB: The Definitive Guide
- CouchDB articles on NoSQLDatabases.com
- CouchDB green paper
- CouchDB news and articles on myNoSQL
- Scaling CouchDB
- Complete HTTP API Reference
- Simple PHP5 library to communicate with CouchDB
Videos
- Erlang eXchange 2008: Couch DB at 10,000 feet Jan Lehnardt
- Jan Lehnardt is Giving the Following Talks, CouchDB for Erlang Developers
- CouchDB and Me on Jan 31, 2009 by Damien Katz
Wikimedia Foundation. 2010.