- Strlen
In the C standard library, strlen is a string function that determines the length of a character string.
Example usage
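A minimal sketch of such a call in C. The helper name `print_hello_length` and the variable `greeting` are illustrative and not part of any library; in a complete program the helper would be called from `main`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: measures a string literal, prints the result,
   and returns it so that a caller can also inspect the value. */
size_t print_hello_length(void)
{
    const char *greeting = "Hello World";  /* 11 characters before the '\0' */
    size_t len = strlen(greeting);

    printf("%zu\n", len);                  /* %zu is the size_t conversion (C99) */
    return len;
}
```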
A program that passes the string "Hello World" to strlen and prints the result will print the value 11, which is the length of that string.
Character strings are stored in an array of a data type called char. The end of a string is found by searching for the first null character in the array; that terminator is not counted in the length.
Implementation
There are many possible implementations of strlen. The function is usually supplied by a library, and the native versions are generally fast, although other methods exist. Two things are worth keeping in mind, though:
* Good compilers will often optimize calls to strlen() with constant string arguments, doing the calculation at compile time.
* No optimization can make strlen() fast for very large inputs, because every byte up to the terminator still has to be examined, so storing the length of "C-strings" alongside them is generally the recommended fix.
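The second point can be sketched as a small counted-string type that measures the string once and remembers the result. The type `counted_str` and both function names are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type: the length is measured once and cached. */
struct counted_str {
    const char *data;  /* still null-terminated, so it works with C APIs */
    size_t      len;   /* number of bytes before the terminator */
};

/* Pays the linear cost of strlen once, at construction. */
struct counted_str counted_str_from(const char *s)
{
    struct counted_str cs;
    cs.data = s;
    cs.len = strlen(s);
    return cs;
}

/* Later length queries are constant time and never rescan the bytes. */
size_t counted_str_len(const struct counted_str *cs)
{
    return cs->len;
}
```

The trade-off is that the cached length must be updated whenever the underlying bytes change.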
Native
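A possible implementation of strlen, in the style of FreeBSD 6.2, steps through the array one byte at a time until it reaches the null character and returns the number of bytes it passed.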
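A sketch of a simple one-byte-at-a-time loop of this kind follows. It is written for illustration and is not a verbatim copy of the FreeBSD source; the name `my_strlen` is hypothetical and chosen so that it does not clash with the library function.

```c
#include <stddef.h>

/* Byte-at-a-time length scan. */
size_t my_strlen(const char *str)
{
    const char *s = str;

    while (*s != '\0')          /* stop at the first null character */
        ++s;

    return (size_t)(s - str);   /* bytes stepped over = length */
}
```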
Multi-Byte approach
This approach checks several bytes at once, whereas the native approach checks one byte at a time. It is explained here for unsigned values. Groups of bytes are checked by reading them into a single variable, so it is up to the programmer to handle memory alignment and to make sure a variable of that size is available. The last read may also reach past the end of the string into an invalid memory address, so the technique is not safe unless those issues are solved: strings must be aligned to the search window (which malloc already does for allocated strings), and the allocated length must be a multiple of the window size.
Whether a word contains a zero byte can be checked by understanding how numbers are represented in binary. Assume that each character occupies one 8-bit byte and that a certain number of bytes, say 4, form a word. That word is the group we check for a zero byte.
What properties does this byte have? All of its bits are 0, whereas every other byte has at least one bit set to 1. In other words, any value greater than 0 is not the null byte.
What do we want to achieve? We want to see whether there is any zero byte, and if there is none we must get the same answer for all possible combinations of the non-null bytes. That is, we want a way to map those combinations to one pattern or number, which is broken if any byte is 0.
If we were able to map all non-zero bytes to one value and the zero byte to another value, the problem would be solved.
Taking the low seven bits of a byte and adding 127 (0x7F) produces a value >= 128 (0x80) for every value in the interval <1, 127>, by mapping it to <1+127, 127+127>.
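The original expression is not shown here; a sketch of the mapping, assuming it is the usual mask-and-add form `(b & 0x7F) + 0x7F`, might look like this. The function name `shift_low_bits` is hypothetical.

```c
#include <stdint.h>

/* Maps the low seven bits of a byte from 1..127 up to 128..254.
   A byte whose low seven bits are all zero (0 or 128) maps to 127,
   so its top bit stays clear at this stage. */
uint8_t shift_low_bits(uint8_t b)
{
    return (uint8_t)((b & 0x7Fu) + 0x7Fu);
}
```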
The AND (&) operation is used to exclude the bit of value 128, because including it in the addition would cause numerical overflow. Values with that bit set are therefore not handled so far, and since we excluded the bit we need to do something about it. We simply add it back in: it is just another value that has to be distinguished from 0, so it has to be treated the same way.
Now every value greater than 0 has the bit of value 128 set.
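As a sketch, and assuming that "adding" the excluded bit means combining the original byte back in with a bitwise OR, the single-byte step could be written as follows. The name `mark_nonzero` is hypothetical.

```c
#include <stdint.h>

/* Returns a byte whose top bit (0x80) is set exactly when b is non-zero. */
uint8_t mark_nonzero(uint8_t b)
{
    /* Top bit set if any of the low seven bits of b is set. */
    uint8_t low = (uint8_t)((b & 0x7Fu) + 0x7Fu);

    /* Assumption: OR the original byte back in, so a byte that
       already has its top bit set is marked as well. */
    return (uint8_t)(low | b);
}
```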
Testing the bit of value 128 therefore shows whether the value was greater than 0. The multibyte solution applies the same steps to every byte of a word at once.
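The word-sized version can be sketched as below, assuming 4-byte words and 8-bit bytes as in the text. Both function names are hypothetical. The scan also assumes that memory is readable up to the end of the aligned 4-byte word that contains the terminator, which is the alignment and length condition described earlier; `memcpy` is used for the word load to avoid aliasing problems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns non-zero if any of the four bytes in w is zero. */
uint32_t has_zero_byte(uint32_t w)
{
    /* Top bit of each byte is set exactly when that byte is non-zero. */
    uint32_t marked = ((w & 0x7F7F7F7Fu) + 0x7F7F7F7Fu) | w;

    /* Invert and keep only the top bits: set where a byte was zero. */
    return ~marked & 0x80808080u;
}

/* Word-at-a-time length scan (sketch; see the assumptions above). */
size_t word_strlen(const char *str)
{
    const char *s = str;

    /* Step one byte at a time until s sits on a 4-byte boundary. */
    while (((uintptr_t)s % sizeof(uint32_t)) != 0) {
        if (*s == '\0')
            return (size_t)(s - str);
        ++s;
    }

    /* Scan whole words until one contains a zero byte. */
    for (;;) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        if (has_zero_byte(w))
            break;
        s += sizeof w;
    }

    /* The terminator is inside this word: find the exact byte. */
    while (*s != '\0')
        ++s;

    return (size_t)(s - str);
}
```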
The bit trick described above can be found in HAKMEM.
Assembly
Sometimes the header files for a particular C library emit fast inline versions of strlen written in assembly. The compiler may also do this, or the header files may simply call compiler built-in versions.
Writing strlen in assembly is primarily done for speed. The code a compiler emits is often more complex than hand-optimized assembly, even for very short functions. Further, a function call requires setting up a proper call frame on most implementations, and that overhead can outweigh the cost of a simple function like strlen itself.
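As one illustration of calling a compiler built-in, GCC and Clang provide `__builtin_strlen`. The wrapper below is a hypothetical sketch, not taken from any particular library header; it falls back to the library function on other compilers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: use the compiler built-in where one is known
   to exist, otherwise call the library function. */
static inline size_t fast_len(const char *s)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_strlen(s);   /* GCC and Clang built-in */
#else
    return strlen(s);
#endif
}
```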
Wikimedia Foundation. 2010.