Duplicate code

Duplicate code is a computer programming term for a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a number of reasons.[1] A minimum requirement is usually applied to the quantity of code that must appear in a sequence for it to be considered duplicate rather than coincidentally similar. Sequences of duplicate code are sometimes known as clones.

The following are some of the ways in which two code sequences can be duplicates of each other:

  • character-for-character identical
  • character-for-character identical with white space characters and comments being ignored
  • token-for-token identical
  • token-for-token identical with occasional variation (i.e., insertion/deletion/modification of tokens)
  • functionally identical
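
As a rough illustration of the second kind of match, the sketch below compares two fragments while ignoring white space. The function name same_ignoring_whitespace is hypothetical, comments are not stripped, and the comparison is deliberately naive: it also discards white space inside string literals and between identifiers, so it is a sketch under those assumptions rather than a usable clone detector.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Advance p past any white space characters. */
static const char *skip_space(const char *p)
{
    while (*p != '\0' && isspace((unsigned char)*p)) {
        p++;
    }
    return p;
}

/* Hypothetical helper: returns true when a and b are identical once all
   white space is discarded. Comments are NOT stripped in this sketch. */
bool same_ignoring_whitespace(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;;) {
        a = skip_space(a);
        b = skip_space(b);
        if (*a != *b) {
            return false;   /* first non-space difference */
        }
        if (*a == '\0') {
            return true;    /* both fragments ended together */
        }
        a++;
        b++;
    }
}
```

Under this comparison "sum += x;" and "sum+=x ;" count as duplicates, while "sum += x;" and "sum -= x;" do not. A token-based comparison would avoid the false matches that character-level white space removal can produce.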

How duplicates are created

There are a number of reasons why duplicate code may be created, including:

  • Copy and paste programming, in which a section of code is copied "because it works". In most cases this operation involves slight modifications in the cloned code such as renaming variables or inserting/deleting code.
  • Functionality that is very similar to that in another part of a program is required and a developer independently writes code that is very similar to what exists elsewhere.
  • Plagiarism, where code is simply copied without permission or attribution.

Problems associated with duplicate code

Code duplication is generally considered a mark of poor or lazy programming style, whereas good coding style is generally associated with code reuse. Duplicating code may make initial development slightly faster, because the developer need not consider how the code is already used or how it may be used in the future. However, original development is only a small fraction of a product's life cycle, and with code duplication the maintenance costs are much higher. Some of the specific problems include:

  • Code bulk affects comprehension: Code duplication frequently creates long, repeated sections of code that differ in only a few lines or characters. The length of such routines can make it difficult to quickly understand them. This is in contrast to the "best practice" of code decomposition.
  • Purpose masking: The repetition of largely identical code sections can conceal how they differ from one another, and therefore, what the specific purpose of each code section is. Often, the only difference is in a parameter value. The best practice in such cases is a reusable subroutine.
  • Update anomalies: Duplicate code violates a fundamental principle of database theory that applies here as well: avoid redundancy. Ignoring it leads to update anomalies, which increase maintenance costs because any modification to a redundant piece of code must be made to each duplicate separately. At best, coding and testing time are multiplied by the number of duplicates. At worst, some locations are missed, so that, for example, bugs thought to be fixed persist in duplicated locations for months or years. The best practice here is a code library.

Detecting duplicate code

A number of different algorithms have been proposed to detect duplicate code. Several of them are described in the works listed under References.
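
To make the idea concrete, the following is a minimal brute-force sketch, not one of the published algorithms. It assumes the source has already been split into lines and counts pairs of non-overlapping windows of min_lines consecutive lines that are character-for-character identical. The names windows_match and count_clone_pairs are hypothetical, and the nested loops make the cost quadratic in the number of lines, which practical tools avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when the len lines starting at index i are identical to
   the len lines starting at index j. */
static bool windows_match(const char *const lines[], size_t i, size_t j,
                          size_t len)
{
    for (size_t k = 0; k < len; k++) {
        if (strcmp(lines[i + k], lines[j + k]) != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical detector: counts pairs of non-overlapping windows of
   min_lines consecutive lines that match exactly. min_lines plays the
   role of the minimum size below which a match is treated as
   coincidental. */
size_t count_clone_pairs(const char *const lines[], size_t n,
                         size_t min_lines)
{
    assert(lines != NULL && min_lines > 0);
    size_t pairs = 0;
    for (size_t i = 0; i + 2 * min_lines <= n; i++) {
        /* Start j after the first window so the two never overlap. */
        for (size_t j = i + min_lines; j + min_lines <= n; j++) {
            if (windows_match(lines, i, j, min_lines)) {
                pairs++;
            }
        }
    }
    return pairs;
}
```

Raising min_lines filters out short, coincidental matches at the price of missing small clones. Normalising each line first (for example by removing white space) would extend the sketch beyond exact matches.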

Example of functionally duplicate code

Consider the following code snippet, which calculates the averages of two four-element arrays of integers:

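extern int array1[];
extern int array2[];

int sum1 = 0;
int sum2 = 0;
int average1 = 0;
int average2 = 0;

/* Sum the four elements of array1; integer division truncates the average. */
for (int i = 0; i < 4; i++)
{
   sum1 += array1[i];
}
average1 = sum1/4;

/* The same loop again, differing only in the variable names. */
for (int i = 0; i < 4; i++)
{
   sum2 += array2[i];
}
average2 = sum2/4;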

The two loops can be rewritten as a single function:

/* Returns the truncated integer average of a four-element array. */
int calcAverage (const int* Array_of_4)
{
   int sum = 0;
   for (int i = 0; i < 4; i++)
   {
       sum += Array_of_4[i];
   }
   return sum/4;
}

Using the above function will give source code that has no loop duplication:

extern int array1[];
extern int array2[];
 
int average1 = calcAverage(array1);
int average2 = calcAverage(array2);
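
The function above still hard-codes the array length of 4, so an array of a different size would tempt a developer to copy it again. One possible generalisation, sketched here under the assumption that the caller knows the element count, passes the length as a parameter. The name calcAverageN and the use of a wider accumulator are illustrative choices, not part of the original example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generalisation of calcAverage: the caller supplies the
   number of elements instead of relying on a fixed length of 4. */
int calcAverageN(const int *values, size_t count)
{
    assert(values != NULL && count > 0);   /* an empty array has no average */
    long long sum = 0;                     /* wider type reduces overflow risk */
    for (size_t i = 0; i < count; i++) {
        sum += values[i];
    }
    /* Integer division truncates toward zero, as in the original. */
    return (int)(sum / (long long)count);
}
```

With this version, calcAverageN(array1, 4) gives the same result as calcAverage(array1), and arrays of other lengths need no further copies of the loop.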

References

  1. Spinellis, Diomidis. "The Bad Code Spotter's Guide". InformIT.com. http://www.informit.com/articles/article.aspx?p=457502&seqNum=5. Retrieved 2008-06-06.
  2. Baker, Brenda S. "A Program for Identifying Duplicated Code". Computing Science and Statistics, 24:49–57, 1992.
  3. Baxter, Ira D., et al. "Clone Detection Using Abstract Syntax Trees".
  4. Rieger, Matthias; Ducasse, Stephane. "Visual Detection of Duplicated Code".
  5. Juergens, E.; Deissenboeck, F.; Hummel, B. "A Workbench for Clone Detection Research".
  6. Juergens, E.; Deissenboeck, F.; Hummel, B.; Wagner, S. "Do Code Clones Matter?".
  7. Li, Zhenmin; Lu, Shan; Myagmar, Suvda; Zhou, Yuanyuan. "CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code".

Wikimedia Foundation. 2010.
