- String interning
In computer science, string interning is a method of storing only one copy of each distinct string value, which must be
immutable . Interning strings makes some string processing tasks more time- or space-efficient at the cost of requiring more time when the string is created or interned. The distinct values are stored in a string intern pool.String interning is supported by some modern
object-oriented programming language s, including Python, Java and .NET languages.The single copy of each string is called its 'intern' and is typically looked up by a method of the string class, for example Javadoc:SE|java/lang|String|intern() in Java.
Interned strings speed up string comparisons, which are sometimes a performance bottleneck in applications (such as
compiler s anddynamic programming language runtimes) that rely heavily on hash tables with string keys. Without interning, checking that two different strings are equal involves examining every character of both strings. This is slow for several reasons: it is inherently O(n) in the length of the strings; it typically requires reads from several regions of memory, which take time; and the reads fills up the processor cache, meaning there is less cache available for other needs. With interned strings, a simple object identity test suffices after the original intern operation; this is typically implemented a pointer equality test, normally just a single machine instruction with no memory reference at all.String interning also reduces memory usage if there are many instances of the same string value; for instance, it is read from a network or from storage). Such strings may include magic numbers or network protocol information.
The intern pool in other programming languages
Even if the standard library of a programming language does not support string interning, it is possible to implement an intern pool in user code.
An example of such an implementation follows, in the
C++ programming language . Note that this implementation returns pointers to arrays of characters, not String objects.class StringTable {char** array; int* count; int size; int numStrings;
public:
StringTable() { size = 20; array = new char* [size] ; count = new int [size] ; numStrings = 0; }
int addString(const char* ch) { // this method adds a String to the pool and returns its index
if (numStrings = 0) { // if there is no string in the pool, add first array [0] = new char [strlen(ch)+1] ; strcpy(array [0] , ch); numStrings++; return 0; }
int i = 0;
for (i; i < numStrings; i++) { // compares, if the String is already in the pool if (strcmp(ch, array [i] ) = 0) { count [i] ++; return i; } }
if (i = size) { return -1; } array [i] = new char [strlen(ch)+1] ; // if not, add String to the pool and return its index strcpy(array [i] , ch); numStrings++; return numStrings-1; }
char* getString(int index) const {
return array [index] ; }
};
class String {int length; char* text; int index;
public:
static StringTable* table;
String(const char* ch) { length = strlen(ch); text = new char [length+1] ; strcpy(text,ch); index = table->addString(text); }
char* intern() { // return pointer to the String in pool if (index = -1) return 0; return table->getString(index); }
};
StringTable* String::table = new StringTable();
History
Lisp introduced the notion of interned strings for its atomic symbols or atoms. The data structure used as a string intern pool was called an 'oblist' (when it was implemented as a linked list) or an 'obarray' (when it was implemented as an array.
Other links
* [http://javatechniques.com/public/java/docs/basics/string-equality.html Java String equality and interning]
* [http://msdn2.microsoft.com/en-us/library/ms177906.aspx Visual J# String class]
* [http://msdn2.microsoft.com/en-us/library/system.string.intern.aspx .NET String Class]
Wikimedia Foundation. 2010.