C character classification

C character classification is an operation provided by a group of functions in the ANSI C Standard Library for the C programming language. These functions test whether a character belongs to a particular class of characters, such as alphabetic characters or control characters. Both single-byte and wide characters are supported.[1]

History

Early toolsmiths writing in C under Unix began developing idioms at a rapid rate to classify characters into different types. For example, in the ASCII character set, the following test identifies a letter:

if (('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'))

However, this idiom does not necessarily work for other character sets such as EBCDIC, in which the letters do not occupy one contiguous range of codes.

Programs soon became thick with tests such as the one above, or worse, tests almost like the one above. A programmer can write the same idiom in several different ways, which slows comprehension and increases the chance of errors.

Before long, the idioms were replaced by the functions in <ctype.h>.
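
As a rough sketch of what that replacement looks like in practice, the hand-written range test can be swapped for isalpha from <ctype.h>. The helper name count_letters is hypothetical and exists only for this illustration. The cast to unsigned char is there because a negative char value is not a valid argument for these functions.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the letters in a string using isalpha
   instead of a character-set-specific range comparison. */
size_t count_letters(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        /* Cast to unsigned char: passing a negative char value to a
           <ctype.h> function is undefined behavior. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```

For instance, count_letters("Hello, World 42") counts the ten letters and skips the punctuation, space, and digits.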

Implementation

Unlike the example above, the character classification routines are typically not written as comparison tests. In most C libraries they are implemented as static table lookups, usually wrapped in macros or functions.

For example, an array of 256 eight-bit integers is created, with each integer treated as a set of bit flags in which each bit corresponds to a particular property of the character, e.g., isdigit or isalpha. If the lowest-order bit corresponds to the isdigit property, the code could be written thus:

#define isdigit(x) (TABLE[x] & 1)
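
The lookup described above can be sketched in a slightly fuller form. This is a minimal illustration, not the layout of any particular C library: the names ct_table, CT_DIGIT, CT_XDIGIT, CT_ISDIGIT and CT_ISXDIGIT are all made up for the example, and only two properties are filled in.

```c
#include <limits.h>

/* Hypothetical flag bits; real libraries choose their own layout. */
enum { CT_DIGIT = 1 << 0, CT_XDIGIT = 1 << 1 };

/* One entry per unsigned char value; entries not listed are zero. */
static const unsigned char ct_table[UCHAR_MAX + 1] = {
    ['0'] = CT_DIGIT | CT_XDIGIT, ['1'] = CT_DIGIT | CT_XDIGIT,
    ['2'] = CT_DIGIT | CT_XDIGIT, ['3'] = CT_DIGIT | CT_XDIGIT,
    ['4'] = CT_DIGIT | CT_XDIGIT, ['5'] = CT_DIGIT | CT_XDIGIT,
    ['6'] = CT_DIGIT | CT_XDIGIT, ['7'] = CT_DIGIT | CT_XDIGIT,
    ['8'] = CT_DIGIT | CT_XDIGIT, ['9'] = CT_DIGIT | CT_XDIGIT,
    ['a'] = CT_XDIGIT, ['b'] = CT_XDIGIT, ['c'] = CT_XDIGIT,
    ['d'] = CT_XDIGIT, ['e'] = CT_XDIGIT, ['f'] = CT_XDIGIT,
    ['A'] = CT_XDIGIT, ['B'] = CT_XDIGIT, ['C'] = CT_XDIGIT,
    ['D'] = CT_XDIGIT, ['E'] = CT_XDIGIT, ['F'] = CT_XDIGIT
};

/* The argument appears once in each macro, so it is evaluated once.
   This sketch does not handle EOF, which the standard functions
   must accept. */
#define CT_ISDIGIT(c)  ((ct_table[(unsigned char)(c)] & CT_DIGIT) != 0)
#define CT_ISXDIGIT(c) ((ct_table[(unsigned char)(c)] & CT_XDIGIT) != 0)
```

Because each property is a single bit, one table serves every classification test, and adding a property costs one more flag rather than one more table.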

Early versions of Linux used a potentially faulty method similar to the first code sample:

#define isdigit(x) ((x) >= '0' && (x) <= '9')

This can cause problems if x has a side effect, for instance if one calls isdigit(x++) or isdigit(run_some_program()). It would not be immediately evident that the argument to isdigit is being evaluated twice. For this reason, the table-based approach is generally used.
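
The double evaluation can be made visible with a small experiment. In this sketch the macro is renamed RANGE_ISDIGIT to avoid clashing with the standard isdigit, and fn_isdigit, next_char and the two evaluations_with_* helpers are hypothetical names introduced only to count how often the argument expression runs.

```c
/* Range-comparison macro of the kind discussed above, renamed so it
   does not clash with the standard isdigit. */
#define RANGE_ISDIGIT(x) ((x) >= '0' && (x) <= '9')

/* Function alternative: the argument is evaluated once, at the call. */
static int fn_isdigit(int c)
{
    return c >= '0' && c <= '9';
}

static int calls; /* how many times next_char has run */

/* Hypothetical argument expression with a side effect. */
static int next_char(void)
{
    calls++;
    return '5';
}

/* Returns how many times the macro evaluated its argument. */
int evaluations_with_macro(void)
{
    calls = 0;
    (void)RANGE_ISDIGIT(next_char());
    return calls;
}

/* Returns how many times the function call evaluated its argument. */
int evaluations_with_function(void)
{
    calls = 0;
    (void)fn_isdigit(next_char());
    return calls;
}
```

With this argument the macro runs next_char twice and the function runs it once. For a character below '0' the && operator short-circuits and the macro evaluates its argument only once, which makes the behavior even harder to predict.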

The difference between these two methods became a point of interest during the SCO v. IBM case.

Overview of functions

The functions that operate on single-byte characters are declared in the ctype.h header (cctype in C++). The functions that operate on wide characters are declared in the wctype.h header (cwctype in C++).

The classification is done according to the current locale, specifically its LC_CTYPE category.
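
The locale dependence can be probed with a small helper. This is a sketch under stated assumptions: is_alpha_in_locale is a hypothetical name, the 256-byte buffer for the saved locale name is an arbitrary choice, and which locale names are accepted besides "C" depends on what is installed on the system.

```c
#include <ctype.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: reports whether byte c is alphabetic under the
   named locale. Returns 1 or 0, or -1 if the locale cannot be set.
   The previous LC_CTYPE setting is restored before returning. */
int is_alpha_in_locale(const char *locale_name, unsigned char c)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);

    /* Copy the current name: later setlocale calls may overwrite it. */
    if (current == NULL || strlen(current) >= sizeof saved)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    int result = isalpha(c) != 0;
    setlocale(LC_CTYPE, saved);
    return result;
}
```

In the "C" locale only the 26 unaccented Latin letters in each case count as alphabetic, so is_alpha_in_locale("C", 0xE9) returns 0. Under a Latin-1 locale, if one is installed, the same byte may be classified as a letter.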

Single-byte character manipulation
  • isalnum - checks if a character is alphanumeric
  • isalpha - checks if a character is alphabetic
  • islower - checks if a character is lowercase
  • isupper - checks if a character is uppercase
  • isdigit - checks if a character is a digit
  • isxdigit - checks if a character is a hexadecimal digit
  • iscntrl - checks if a character is a control character
  • isgraph - checks if a character is a graphical character
  • isspace - checks if a character is a whitespace character
  • isblank (C99/C++11) - checks if a character is a blank character
  • isprint - checks if a character is a printing character
  • ispunct - checks if a character is a punctuation character
  • tolower - converts a character to lowercase
  • toupper - converts a character to uppercase
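
A short sketch of how the classification and conversion functions listed above combine in practice. The helper capitalize_words is a hypothetical name, and treating a "word" as a run of isalpha characters is an assumption made for the example.

```c
#include <ctype.h>

/* Hypothetical helper: uppercases the first letter of each run of
   letters and lowercases the rest, modifying the string in place. */
void capitalize_words(char *s)
{
    int in_word = 0; /* nonzero while inside a run of letters */

    for (; *s != '\0'; s++) {
        /* Work on the unsigned char value, as <ctype.h> requires. */
        unsigned char c = (unsigned char)*s;

        if (isalpha(c)) {
            *s = (char)(in_word ? tolower(c) : toupper(c));
            in_word = 1;
        } else {
            in_word = 0;
        }
    }
}
```

Applied to a buffer holding "hello, wORLD", the function leaves "Hello, World" in place.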
Wide character manipulation
  • iswalnum - checks if a wide character is alphanumeric
  • iswalpha - checks if a wide character is alphabetic
  • iswlower - checks if a wide character is lowercase
  • iswupper - checks if a wide character is uppercase
  • iswdigit - checks if a wide character is a digit
  • iswxdigit - checks if a wide character is a hexadecimal digit
  • iswcntrl - checks if a wide character is a control character
  • iswgraph - checks if a wide character is a graphical character
  • iswspace - checks if a wide character is a whitespace character
  • iswblank (C99/C++11) - checks if a wide character is a blank character
  • iswprint - checks if a wide character is a printing character
  • iswpunct - checks if a wide character is a punctuation character
  • towlower - converts a wide character to lowercase
  • towupper - converts a wide character to uppercase
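
The wide-character functions listed above are used the same way as their single-byte counterparts, with wint_t in place of int. A minimal sketch, in which wide_to_upper is a hypothetical helper name:

```c
#include <stddef.h>
#include <wctype.h>

/* Hypothetical helper: uppercases a wide string in place and returns
   how many of its characters were alphabetic. */
size_t wide_to_upper(wchar_t *s)
{
    size_t letters = 0;

    for (; *s != L'\0'; s++) {
        if (iswalpha((wint_t)*s))
            letters++;
        /* towupper returns its argument unchanged when there is no
           uppercase counterpart. */
        *s = (wchar_t)towupper((wint_t)*s);
    }
    return letters;
}
```

Applied to a buffer holding L"abc-1", the function returns 3 and leaves L"ABC-1" in place.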
Custom wide character manipulation
  • iswctype - checks if a wide character falls into a specific class
  • towctrans - converts a wide character using a specific mapping
  • wctype - returns a character class to be used with iswctype
  • wctrans - returns a transformation mapping to be used with towctrans
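
The four functions above work in pairs: wctype and wctrans look up a descriptor by name, and iswctype and towctrans apply it. The following sketch assumes the standard names such as "alpha", "digit" and "toupper"; the wrappers wide_in_class and wide_map, and their conventions for unknown names, are hypothetical.

```c
#include <wctype.h>

/* Hypothetical wrapper: returns 1 if wc belongs to the named class,
   0 if it does not, and -1 if the class name is not recognized in
   the current locale. */
int wide_in_class(wint_t wc, const char *class_name)
{
    wctype_t desc = wctype(class_name);

    if (desc == 0)
        return -1;
    return iswctype(wc, desc) != 0;
}

/* Hypothetical wrapper: applies the named mapping to wc, or returns
   wc unchanged if the mapping name is not recognized. */
wint_t wide_map(wint_t wc, const char *mapping_name)
{
    wctrans_t desc = wctrans(mapping_name);

    if (desc == 0)
        return wc;
    return towctrans(wc, desc);
}
```

For example, wide_in_class(L'x', "alpha") returns 1, and wide_map(L'q', "toupper") returns L'Q'. Looking up the descriptor once and reusing it would avoid repeating the name lookup on every call.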


Wikimedia Foundation. 2010.
