Surrogate key

Surrogate key

A surrogate key in a database is a unique identifier for either an "entity" in the modeled world or an "object" in the database. The surrogate key is "not" derived from application data.

Definition

Two definitions of a surrogate appear in the literature:

; Surrogate (1) : This definition is based on that given by Hall, Owlett and Todd (1976). Here a surrogate represents an "entity" in the outside world. The surrogate is internally generated by the system but is nevertheless visible by the user or application.; Surrogate (2) : This definition is based on that given by Wieringa and de Jung (1991). Here a surrogate represents an "object" in the database itself. The surrogate is internally generated by the system and is invisible to the user or application.

The "surrogate (1)" definition defines its usage in the data model rather than the storage model and is used in this article. See Date (1998).

An important distinction exists between a surrogate and a primary key, depending on whether the database is a current database or a temporal database. A "current database" stores only "currently" valid data, therefore there is a one-to-one correspondence between a surrogate in the modelled world and the primary key of some object in the database; in this case the surrogate may be used as a primary key, resulting in the term "surrogate key". However, in a temporal database there is a many-to-one relationship between primary keys and the surrogate. Since there may be several objects in the database corresponding to a single surrogate, we cannot use the surrogate as a primary key; another attribute is required, in addition to the surrogate, to uniquely identify each object.

Although Hall et alia (1976) say nothing about this, "other" authors have argued that a surrogate should have the following constraints:

* the value is unique system-wide, hence never reused;
* the value is system generated;
* the value is not manipulable by the user or application;
* the value contains no semantic meaning;
* the value is not visible to the user or application;
* the value is not composed of several values from different domains.

Surrogates in practice

In a current database, the surrogate key can be the primary key, generated by the database management system and "not" derived from any application data in the database. The only significance of the surrogate key is to act as the primary key. It is also possible that the surrogate key exists in addition to the database-generated uuid, e.g. a HR number for each employee besides the UUID of each employee.

A surrogate key is frequently a sequential number (e.g. a Sybase or SQL Server "identity column", a PostgreSQL serial, an Oracle SEQUENCE or a column defined with AUTO_INCREMENT in MySQL) but doesn't have to be. Having the key independent of all other columns insulates the database relationships from changes in data values or database design (making the database more agile) and guarantees uniqueness.

In a temporal database, it is necessary to distinguish between the surrogate key and the primary key. Typically, every row would have both a primary key and a surrogate key. The primary key identifies the unique row in the database, the surrogate key identifies the unique entity in the modelled world; these two keys are not the same. For example, table "Staff" may contain two rows for "John Smith", one row when he was employed between 1990 and 1999, another row when he was employed between 2001 and 2006. The surrogate key is identical (non-unique) in both rows however the primary key "will" be unique.

Some database designers use surrogate keys religiously regardless of the suitability of other candidate keys, while others will use a key already present in the data, if there is one.

A "surrogate" may also be called a

* surrogate key,
* entity identifier,
* system-generated key,
* database sequence number,
* synthetic key,
* technical key, or
* arbitrary unique identifier.

Some of these terms describe the way of "generating" new surrogate values rather than the "nature" of the surrogate concept.

Here are some possible candidates for generating surrogates:

* Universally Unique Identifiers (UUIDs)
* Globally Unique Identifiers (GUIDs)
* Object Identifiers (OIDs)
* Sybase or SQL Server identity column
* Oracle SEQUENCE
* PostgreSQL serial
* MySQL AUTO_INCREMENT
* AutoNumber data type in Microsoft Access

Advantages of Surrogate Keys

Immutability

Surrogate keys do not change while the row exists. This has two advantages:

* Database applications won't lose their "handle" on the row because the data changes;
* Many database systems do not support cascading updates of keys across foreign keys of related tables. This results in difficulty in modifying the primary key data.

Flexibility for changing requirements

Because of changing requirements, the attributes that uniquely identify an entity might change.In that case, the attribute(s) initially chosen as the natural key will no longer be a suitable natural key.

:"Example":::An employee ID is chosen as the natural key of an employee DB. Because of a merger with:another company, new employees from the merged company must be inserted, who have:conflicting IDs (as their IDs were independently generated when the companies were:separate).

In these cases, generally a new attribute must be added to the natural key (e.g. an attribute "original_company").With a surrogate key, only the table that defines the surrogate key must be changed. With natural keys, all tables (and possibly other, related software) that use the natural key will have to change.

More generally, in some problem domains it is simply not clear what might be a suitable natural key. Surrogate keys avoid problems from choosing a natural key that later turns out to be incorrect.

Performance

Often surrogate keys are composed of a compact data type, such as a four-byte integer. This allows the database to query faster than it could multiple columns.

*A non-redundant distribution of keys causes the resulting b-tree index to be completely balanced.
*If the natural key is a compound key, joining is more expensive as there are multiple columns to compare. Surrogate keys are always contained in a single column.

Compatibility

Several database application development systems, drivers, and object-relational mapping systems, such as Ruby on Rails or Hibernate (Java), depend on the use of integer or GUID surrogate keys in order to support database-system-agnostic operations and object-to-row mapping.

Disadvantages of Surrogate Keys

Disassociation

Because the surrogate key is completely unrelated to the data of the row to which it is attached, the key is disassociated from that row. Disassociated keys are unnatural to the application's world, resulting in an additional level of indirection from which to audit.

Query Optimization

Relational databases assume a unique index is applied to a table's primary key. The unique index serves two purposes: 1) to enforce entity integrity—primary key data must be unique across rows—and 2) to quickly search for rows queried. Since surrogate keys replace a table's identifying attributes—the natural key—and since the identifying attributes are likely to be those queried, then the query optimizer is forced to perform a full table scan when fulfilling likely queries. The remedy to the full table scan is to apply a (non-unique) index on each of the identifying attributes. However, these additional indexes will take up disk space, slow down inserts, and slow down deletes.

Normalization

The presence of a surrogate key can result in the database administrator forgetting to establish, or accidentally removing, a secondary unique index on the natural key of the table. Without a unique index on the natural key, duplicate rows are likely to appear and are difficult to identify.

Business Process Modeling

Because surrogate keys are unnatural, flaws can appear when modeling the business requirements. Business requirements, relying on the natural key, then need to be translated to the surrogate key.

Inadvertent Disclosure

Proprietary information may be leaked if sequential key generators are used. By subtracting a previously generated sequential key from a recently generated sequential key, one could learn the number of rows inserted during that time period. This could expose, for example, the number of transactions or new accounts per period. The solution to the inadvertent disclosure problem is to generate a random primary key. However, a randomly generated primary key must be queried before assigned to prevent duplication and cause an insert rejection.

Inadvertent Assumptions

Sequentially generated surrogate keys create the illusion that events with a higher primary key value occurred after events with a lower primary key value. This illusion would appear when an event is missed during the normal data entry process and is, instead, inserted after subsequent events were previously inserted. The solution to the inadvertent assumption problem is to generate a random primary key. However, a randomly generated primary key must be queried before assigned to prevent duplication and cause an insert rejection.

ee also

* Natural key
* Object identifier

References

* cite book
last = Nijssen
first = G.M.
title = Modelling in Data Base Management Systems
publisher = North-Holland Pub. Co.
date = 1976
id = ISBN 0-7204-0459-2

* Engles, R.W.: (1972), "A Tutorial on Data-Base Organization", Annual Review in Automatic Programming, Vol.7, Part 1, Pergamon Press, Oxford, pp. 1–64.

* Langefors, B: (1968), "Elementary Files and Elementary File Records", Proceedings of File 68, an IFIP/IAG International Seminar on File Organisation, Amsterdam, November, pp. 89–96.

* [http://citeseer.ist.psu.edu/cache/papers/cs/3493/ftp:zSzzSzftp.cs.vu.nlzSzpubzSzroelwzSz92-ObjectsRoles.pdf/wieringa91identification.pdf The identification of objects and roles: Object identifiers revisited] by Wieringa and de Jung (1991).

* [http://www.amazon.co.uk/dp/0201398141 Relational Database Writings 1994–1997] by C.J. Date (1998), Chapters 11 and 12.

* cite web
last = Carter
first = Breck
title = Intelligent Versus Surrogate Keys
url = http://www.bcarter.com/intsurr1.htm
accessdate = 2006-12-03

* cite web
last = Richardson
first = Lee
title = Create Data Disaster: Avoid Unique Indexes – (Mistake 3 of 10)
url = http://www.nearinfinity.com/blogs/page/lrichard?entry=create_data_disaster_avoid_unique
accessdate = 2008-19-01

* cite web
last = Berkus
first = Josh
title = Database Soup: Primary Keyvil, Part I
url = http://blogs.ittoolbox.com/database/soup/archives/primary-keyvil-part-i-7327
accessdate = 2006-12-03


Wikimedia Foundation. 2010.

Игры ⚽ Нужно решить контрольную?

Look at other dictionaries:

  • Surrogate Key — Ein Surrogatschlüssel (Stellvertreterschlüssel, engl. surrogate key, wörtlich: „Ersatzschlüssel“) ist ein Datenbankschlüssel in einer Datenbanktabelle. Eine wichtige Besonderheit dieses Schlüssels ist, dass er nicht aus den Daten in der Tabelle… …   Deutsch Wikipedia

  • Surrogate — may refer to:Cultural relationships: * Surrogate pregnancy, an arrangement for a woman to carry and give birth to a child who will be raised by others * Sex surrogate, in sexual therapy * Surrogate marriage, a custom in African cultureIn the arts …   Wikipedia

  • Unique key — In relational database design, a unique key can uniquely identify each row in a table, and is closely related to the Superkey concept. A unique key comprises a single column or a set of columns. No two distinct rows in a table can have the same… …   Wikipedia

  • Natural key — In relational model database design, a natural key is a candidate key that has a logical relationship to the attributes within that row. A natural key is sometimes called a domain key. The main advantage of a natural key over a surrogate key,… …   Wikipedia

  • Foreign Key — Ein Schlüssel dient in einer Relationalen Datenbank dazu, die Tupel einer Relation eindeutig zu identifizieren. Anschaulich kann man sich eine Relation als Tabelle vorstellen. Ein Schlüssel für eine solche Tabelle ist dann eine Gruppe von Spalten …   Deutsch Wikipedia

  • Primary Key — Ein Schlüssel dient in einer Relationalen Datenbank dazu, die Tupel einer Relation eindeutig zu identifizieren. Anschaulich kann man sich eine Relation als Tabelle vorstellen. Ein Schlüssel für eine solche Tabelle ist dann eine Gruppe von Spalten …   Deutsch Wikipedia

  • Primary key — Ein Schlüssel dient in einer Relationalen Datenbank dazu, die Tupel einer Relation eindeutig zu identifizieren. Anschaulich kann man sich eine Relation als Tabelle vorstellen. Ein Schlüssel für eine solche Tabelle ist dann eine Gruppe von Spalten …   Deutsch Wikipedia

  • Compound key — In database design, a compound key is a key that consists of 2 or more attributes that uniquely identify an entity occurrence. Each attribute that makes up the compound key is a simple key in its own right. This is often confused with a composite …   Wikipedia

  • Sex surrogate — A sex surrogate is a member of a sex therapy team who engages in intimate physical or sexual relations with a patient in order to achieve a therapeutic goal. The practice was introduced by Masters and Johnson with their work on Human Sexual… …   Wikipedia

  • Data Vault Modeling — is a database modeling method that is designed to provide historical storage of data coming in from multiple operational systems. It is also a method of looking at historical data that, apart from the modeling aspect, deals with issues such as… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”