C character classification
C character classification is an operation provided by a group of functions in the ANSI C Standard Library for the C programming language. These functions test whether a character belongs to a particular class of characters, such as alphabetic characters or control characters. Both single-byte and wide characters are supported.[1]
History
Early toolsmiths writing in C under Unix began developing idioms at a rapid rate to classify characters into different types. For example, in the ASCII character set, the following test identifies a letter:
if ('A' <= c && c <= 'Z' || 'a' <= c && c <= 'z')
However, this idiom does not necessarily work for other character sets such as EBCDIC.
Programs soon became thick with tests such as the one above or, worse, tests almost like the one above. A programmer can write the same idiom in several different ways, which slows comprehension and increases the chance of errors.
Before long, the idioms were replaced by the functions in <ctype.h>.
Implementation
Unlike the above example, the character classification routines are usually not written as comparison tests. In most C libraries, they are implemented as lookups in a static table rather than as comparison-based macros or functions.
For example, the library creates an array of 256 eight-bit integers used as bit masks, in which each bit corresponds to a particular property of the character, such as isdigit or isalpha. If the lowest-order bit of each integer corresponds to the isdigit property, the test could be written thus:
#define isdigit(x) (TABLE[x] & 1)
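The table approach can be sketched a little more fully as follows. This is a minimal illustration, not the code of any particular C library: the names ct_table, ct_init, ct_isdigit, ct_isalpha and the CT_ bit values are hypothetical, the letter ranges assume an ASCII-compatible character set, and the table is filled at run time for brevity where a real library would more likely ship a precomputed constant table.

```c
#include <limits.h>

/* Hypothetical property bits; real libraries choose their own layout. */
enum { CT_DIGIT = 1 << 0, CT_UPPER = 1 << 1, CT_LOWER = 1 << 2 };

/* One entry per possible unsigned char value, zero-initialized. */
static unsigned char ct_table[UCHAR_MAX + 1];

/* Fill the table once. The letter loops assume an ASCII-compatible set. */
static void ct_init(void)
{
    int c;
    for (c = '0'; c <= '9'; c++) ct_table[c] |= CT_DIGIT;
    for (c = 'A'; c <= 'Z'; c++) ct_table[c] |= CT_UPPER;
    for (c = 'a'; c <= 'z'; c++) ct_table[c] |= CT_LOWER;
}

/* Each test is one lookup and one mask; the argument appears only once. */
#define ct_isdigit(c) (ct_table[(unsigned char)(c)] & CT_DIGIT)
#define ct_isalpha(c) (ct_table[(unsigned char)(c)] & (CT_UPPER | CT_LOWER))
```

After a single call to ct_init(), ct_isdigit('7') is nonzero and ct_isalpha('7') is zero. As with the standard functions, the result is only guaranteed to be zero or nonzero, not exactly 1.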
Early versions of Linux used a potentially faulty method similar to the first code sample:
#define isdigit(x) ((x) >= '0' && (x) <= '9')
This can cause problems if x has a side effect, for instance if one calls isdigit(x++) or isdigit(run_some_program()). It would not be immediately evident that the argument to isdigit is being evaluated twice. For this reason, the table-based approach is generally used.
The difference between these two methods became a point of interest during the SCO v. IBM case.
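The double evaluation can be observed directly. In the sketch below, CMP_ISDIGIT reproduces the comparison macro under a different name so that it does not clash with the standard isdigit; fn_isdigit and the two advance_ helpers are hypothetical names introduced only for this illustration.

```c
/* Comparison-style macro: the parameter x appears twice in the expansion. */
#define CMP_ISDIGIT(x) ((x) >= '0' && (x) <= '9')

/* Function equivalent: the argument is evaluated exactly once. */
static int fn_isdigit(int c)
{
    return c >= '0' && c <= '9';
}

/* Returns how far the index has advanced after one macro test of s[i++]. */
static int advance_with_macro(const char *s)
{
    int i = 0;
    (void)CMP_ISDIGIT(s[i++]);   /* i++ may run once or twice */
    return i;
}

/* Returns how far the index has advanced after one function call. */
static int advance_with_function(const char *s)
{
    int i = 0;
    (void)fn_isdigit(s[i++]);    /* i++ runs exactly once */
    return i;
}
```

With the input "42" the macro version advances the index by two, because both comparisons are evaluated. With the input " 4" it advances by one, because the first comparison fails and the second is skipped. The function version always advances by one, so the macro's behavior depends on the data being tested.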
Overview of functions
The functions that operate on single-byte characters are defined in the ctype.h header (cctype header in C++). The functions that operate on wide characters are defined in the wctype.h header (cwctype header in C++).
The classification is done according to the current locale.
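A minimal usage sketch follows; count_alpha is a hypothetical helper, not part of the library. Two points are worth noting. A program starts in the "C" locale, so results differ from that default only after a call such as setlocale(LC_CTYPE, ""). The ctype.h functions also require an argument that is either EOF or representable as an unsigned char, so a plain char should be cast before it is passed.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts the bytes of s that isalpha accepts in the current locale. */
static size_t count_alpha(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Cast first: passing a negative char value is undefined behavior. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```

In the default "C" locale only the 52 unaccented Latin letters count as alphabetic, so count_alpha("abc 123 XYZ") yields 6.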
- Single-byte character manipulation
isalnum - checks if a character is alphanumeric
isalpha - checks if a character is alphabetic
islower - checks if a character is lowercase
isupper - checks if a character is an uppercase character
isdigit - checks if a character is a digit
isxdigit - checks if a character is a hexadecimal character
iscntrl - checks if a character is a control character
isgraph - checks if a character is a graphical character
isspace - checks if a character is a space character
isblank - (C99/C++11) checks if a character is a blank character
isprint - checks if a character is a printing character
ispunct - checks if a character is a punctuation character
tolower - converts a character to lowercase
toupper - converts a character to uppercase
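Several of the single-byte functions are often combined in small parsers. The following sketch uses isspace, isxdigit, isdigit and tolower to read a hexadecimal number; parse_hex is a hypothetical helper, it does not check for overflow, and it assumes that the letters 'a' to 'f' have consecutive codes, which holds in ASCII but is not guaranteed by the C standard.

```c
#include <ctype.h>

/* Parses an unsigned hexadecimal number after optional leading whitespace.
   Returns the value, or -1 if no hexadecimal digit is found.
   Overflow is not checked. */
static long parse_hex(const char *s)
{
    long value = 0;
    int seen = 0;

    while (isspace((unsigned char)*s))
        s++;

    for (; isxdigit((unsigned char)*s); s++) {
        int c = tolower((unsigned char)*s);
        /* Assumes 'a'..'f' are consecutive, as they are in ASCII. */
        int digit = isdigit(c) ? c - '0' : c - 'a' + 10;
        value = value * 16 + digit;
        seen = 1;
    }
    return seen ? value : -1;
}
```

For example, parse_hex("  1aF") returns 431 (0x1AF), and parse_hex("xyz") returns -1 because no hexadecimal digit is present.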
- Wide character manipulation
iswalnum - checks if a wide character is alphanumeric
iswalpha - checks if a wide character is alphabetic
iswlower - checks if a wide character is lowercase
iswupper - checks if a wide character is an uppercase character
iswdigit - checks if a wide character is a digit
iswxdigit - checks if a wide character is a hexadecimal character
iswcntrl - checks if a wide character is a control character
iswgraph - checks if a wide character is a graphical character
iswspace - checks if a wide character is a space character
iswblank - (C99/C++11) checks if a wide character is a blank character
iswprint - checks if a wide character is a printing character
iswpunct - checks if a wide character is a punctuation character
towlower - converts a wide character to lowercase
towupper - converts a wide character to uppercase
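The wide-character functions are used in the same way on wchar_t strings. The sketch below capitalizes the first letter of each word with iswspace, iswlower and towupper; capitalize_words is a hypothetical helper, and which characters beyond the basic Latin letters are affected depends on the current locale.

```c
#include <wchar.h>
#include <wctype.h>

/* Uppercases the first letter of each whitespace-separated word in place. */
static void capitalize_words(wchar_t *s)
{
    int at_word_start = 1;

    for (; *s != L'\0'; s++) {
        if (iswspace((wint_t)*s)) {
            at_word_start = 1;
        } else {
            if (at_word_start && iswlower((wint_t)*s))
                *s = (wchar_t)towupper((wint_t)*s);
            at_word_start = 0;
        }
    }
}
```

Applied to a writable buffer holding L"hello wide world", the function leaves L"Hello Wide World" in the buffer. It must not be called on a string literal, because it modifies its argument.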
- Custom wide character manipulation
iswctype - checks if a wide character falls into specific class
towctrans - converts a wide character using a specific mapping
wctype - returns a character class to be used with iswctype
wctrans - returns a transformation mapping to be used with towctrans
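These four functions let a program choose the class or mapping by name at run time. In the sketch below, in_class and map_char are hypothetical wrappers; the names "alpha", "digit", "toupper" and "tolower" are among those the C standard requires every locale to accept, and a locale may define additional ones.

```c
#include <wctype.h>

/* Tests c against a class chosen by name, e.g. "alpha" or "digit".
   Returns 0 if the name is not valid in the current locale. */
static int in_class(wint_t c, const char *class_name)
{
    wctype_t desc = wctype(class_name);   /* zero for an unknown name */
    return desc != (wctype_t)0 && iswctype(c, desc) != 0;
}

/* Maps c through a mapping chosen by name, e.g. "toupper" or "tolower".
   Returns c unchanged if the name is not valid in the current locale. */
static wint_t map_char(wint_t c, const char *mapping_name)
{
    wctrans_t desc = wctrans(mapping_name);   /* zero for an unknown name */
    return desc != (wctrans_t)0 ? towctrans(c, desc) : c;
}
```

For instance, in_class(L'7', "digit") is nonzero and map_char(L'a', "toupper") returns L'A'. The explicit check against zero keeps an unrecognized name from being passed on to iswctype or towctrans.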
References
Wikimedia Foundation. 2010.