Strlen

Strlen

In the C standard library, strlen is a string function that determines the length of a character string.

Example usage


#include
#include int main(void){ char *string = "Hello World"; printf("%lu ", (unsigned long)strlen(string)); return 0;}This program will print the value 11, which is the length of the string "Hello World".Character strings are stored in an array of a data type called char. The end of a string is found by searching for the first null character in the array.

Implementation

There are many possible implementations of strlen. The native functions are generally fast; but other methods do exist. The function is usually supplied from a library. Two things are worth keeping in mind though:

* Good compilers will often optimize calls to strlen() with constant strings arguments, doing the calculation at compile time.

* No optimization can make strlen() fast for significantly large inputs, so storing the length of the "C-strings" is generally the recommended fix.

Native

A possible implementation of strlen might be (FreeBSD 6.2):

size_t strlen(const char * str){ const char * s = str; for (; *s; ++s); return(s - str);}

Multi-Byte approach

This approach uses the trick of checking more bytes at once. The former approach uses just one-byte checking. Note that this approach is explained on unsigned values. Note that this approach checks groups of bytes by reading them into a variable. It is up to programmer to handle proper memory alignment and availability of size of that variable. Yet you may possibly read from invalid memory address on the last read. So this is not safe without solving those things (aligning allocated strings on searching window, which is done already by malloc, and length being multiply of it).

If there is a zero byte might be checked by understanding how numbers are represented in binary. Each character is on, lets presume 8 bits, byte. Certain number of bytes, lets presume 4, form a word. And that word will be the group we're checking for a 0'th byte.

What properties does this byte have? All bits set to 0. Others have some of bits set to 1. This is just that any number greater than 0 is not nul byte.What do we want to achieve? See if there is any zero byte, and if there is none we must have same answer for all possible variations of other non-nul bytes. This is just that we want some way to map those to same pattern or number, which if any byte is 0 will be broken.

If we were able to map all non-0 bytes to one value and 0 byte to other value it would be solved.

value1 = (value & 0x7f) + 0x7f; // value = (value & 127) + 127This code produces value >= 128 (0x80) for values from interval <1, 127> by mapping it to <1+127, 127+127>.

And (&) operation is used to exclude bit of value 128, for it causes numerical overflow. And this value is therefore not handled so far.But since we excluded this value we need to do something about it.We simply add it, for it is just another value which has to distinguished from 0 value. And therefore it has to be treated the same way.

value2 = value1 | (value & 128); // add 128 value bitvalue2 = value1 | value; // or just add all bits, including 128 value bit

Now we have made all values which are > 0 set bit value 128.

is_zero = (value2 & 128) = 128; // true or falseThis way we check whether the value was >= 0. Multibyte solution should be obvious.


#include size_t strlen (char *ptr){ uint32_t* ip; char* s;

// NOTE: this check is rarely done in C libraries. if (NULL = ptr) return ERROR_VALUE; //or 0

// Handle alignment here somehow if not handled within string. // String has to be aligned on integer boundary // or some architectures might behave unexpectedly! // NOTE: This is also required so that you don't overflow the // bounds of the memory by looking at bytes beyond the end of // the C-string. ip = (uint32_t *) ptr;

// multi-skip (less times around the loop) while (1) { /* check for all zero */ uint32_t x = *ip; /* check for not-all zero */ if (((((x & 0x7f7f7f7f) + 0x7f7f7f7f7f) | x) & 0x80808080) != 0x80808080) break; ++ip; /* moves forward 4 bytes at once */ }

// count any remaining bytes for (s = (char*)ip; 0 != *s; ++s) /* do nothing */ ;

return s - ptr;}

This can be found in HAKMEM.

Assembly

Sometimes the header files for a particular C library emit fast inline versions of strlen written in assembly. The compiler may also do this; or the header files may simply call compiler built-in versions.

Writing strlen in assembly is primarily done for speed. The complexity of code emitted from a compiler is often higher than hand-optimized assembly, even for very short functions. Further, a function call requires setting up a proper call frame on most implementations; these operations can outweigh the size of simple functions like strlen.

External links

*


Wikimedia Foundation. 2010.

Игры ⚽ Нужен реферат?

Look at other dictionaries:

  • Strlen — функция стандартной библиотеки, для возврата длины нуль терминированной строки без символа окончания строки (нуля). Функция Прототип функции, описанный в заголовочном файле string.h size t strlen(const char* String); Пример использования #include …   Википедия

  • strlen — функция стандартной библиотеки, для возврата длины нуль терминированной строки без символа окончания строки (нуля). Функция Прототип функции, описанный в заголовочном файле string.h size t strlen(const char* String); Пример использования #include …   Википедия

  • Cyclone (programming language) — Cyclone Appeared in 2006 (2006) Designed by AT T Labs Stable release 1.0 (May 8, 2006; 5 years ago (2006 05 08)) Influenced by …   Wikipedia

  • Código cuenta cliente — El Código Cuenta Cliente (CCC) es un código utilizado en España por las entidades financieras (bancos y cajas) para la identificación de las cuentas de sus clientes. Consta de veinte dígitos. Contenido 1 Estructura del CCC 1.1 Dígitos de control… …   Wikipedia Español

  • TI-990 — The TI 990 was a series of 16 bit minicomputers sold by Texas Instruments (TI) in the 1970s and 1980s. The TI 990 was a replacement for TI s earlier minicomputer systems, the TI 960 and the TI 980. It had several uniquely innovative features, and …   Wikipedia

  • Pro*C — [pɹoʊˈsiː]/Pro*C++ [ ˈplʌs ˈplʌs] ist ein Precompiler des Unternehmens Oracle für die Programmiersprache C und C++. Mittels des Precompilers ist es möglich, SQL Ausdrücke und normale C oder C++ Quellcode Elemente miteinander zu vermischen. Dies… …   Deutsch Wikipedia

  • Кодирование длин серий — (англ. Run length encoding, RLE) или Кодирование повторов  простой алгоритм сжатия данных, который оперирует сериями данных, то есть последовательностями, в которых один и тот же символ встречается несколько раз подряд. При кодировании… …   Википедия

  • Approximations of π — Timeline of approximations for pi …   Wikipedia

  • String interning — In computer science, string interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more time or space efficient at the cost of requiring more… …   Wikipedia

  • Copy-Konstruktor — Ein Kopierkonstruktor (auch Copy Konstruktor) ist in der Informatik ein spezieller Konstruktor, der eine Referenz auf ein Objekt desselben Typs als Parameter entgegennimmt und die Aufgabe hat, eine Kopie des Objektes zu erstellen.… …   Deutsch Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”