- C syntax
The syntax of the C programming language is a set of rules that specifies whether the sequence of characters in a file is conforming C source code. The rules specify how the character sequences are to be chunked into tokens (the lexical grammar), the permissible sequences of these tokens, and some of the meaning to be attributed to these permissible token sequences (additional meaning is assigned by the semantics of the language).
C syntax makes use of the maximal munch principle.
Data structures
Primitive data types
The C language represents numbers in three forms: "integral", "real" and "complex". This distinction reflects similar distinctions in the instruction set architecture of most central processing units. "Integral" data types store numbers in the set of integers, while "real" and "complex" types represent numbers (or pairs of numbers) in the set of real numbers in floating-point form.
All C integer types have signed and unsigned variants. If signed or unsigned is not specified explicitly, in most circumstances signed is assumed. However, for historic reasons plain char is a type distinct from both signed char and unsigned char. It may be a signed type or an unsigned type, depending on the compiler and the character set (C guarantees that members of the C basic character set have non-negative values). Also, bit field types specified as plain int may be signed or unsigned, depending on the compiler.
Integral types
The integral types come in different sizes, with varying amounts of memory usage and range of representable numbers. Modifiers are used to designate the size: short, long and long long. (The long long modifier was introduced in the C99 standard; some compilers had already supported it.) The character type, whose specifier is char, represents the smallest addressable storage unit, which is most often an 8-bit byte (the standard requires it to be at least 8 bits wide). The standard header limits.h defines the minimum and maximum values of the integral primitive data types, amongst other limits.
The following table provides a list of the integral types and their "typical" storage sizes and acceptable ranges of values, which may vary from one compiler and platform to another.
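As a rough illustration of how these limits can be inspected on a particular platform, the sketch below prints what limits.h reports. The helper name print_integral_limits is hypothetical; the macros (CHAR_BIT, SHRT_MAX and so on) are standard, but the values printed depend on the compiler and platform.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: prints the size and range of several integral
   types as reported by this implementation.
   Returns the number of lines printed. */
int print_integral_limits(void)
{
    printf("bits per char: %d\n", CHAR_BIT);
    printf("short:     %zu bytes, %d to %d\n",
           sizeof(short), SHRT_MIN, SHRT_MAX);
    printf("int:       %zu bytes, %d to %d\n",
           sizeof(int), INT_MIN, INT_MAX);
    printf("long:      %zu bytes, %ld to %ld\n",
           sizeof(long), LONG_MIN, LONG_MAX);
    printf("long long: %zu bytes, %lld to %lld\n",
           sizeof(long long), LLONG_MIN, LLONG_MAX);
    return 5;
}
```

Calling print_integral_limits() from main on two different platforms is a quick way to see which of the sizes actually vary.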
ISO C defines different limits in section 5.2.4.2.1 of the standard. For integral types of "guaranteed" sizes, the standard provides the stdint.h header.
The use of other backslash escapes is not defined by the C standard.
String literal concatenation
Adjacent string literals are concatenated at compile time; this allows long strings to be split over multiple lines, and also allows string literals resulting from C preprocessor defines and macros to be appended to strings at compile time.
Character constants
Individual character constants are represented by single quotes, e.g. 'A', and have type int (not char). The difference is that "A" represents a pointer to the first element of a null-terminated array, whereas 'A' directly represents the code value (65 if ASCII is used). The same backslash escapes are supported as for strings, except that " can validly be used as a character without being escaped, whereas ' must now be escaped.
A character constant cannot be empty (i.e. '' is invalid syntax), although a string may be (it still has the null terminating character). Multi-character constants (e.g. 'xy') are valid, although rarely useful; they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the order in which the characters are packed into one int is not specified, portable use of multi-character constants is difficult.
Wide character strings
Since type char is usually 1 byte wide, a single char value typically can represent at most 256 distinct character codes, not nearly enough for all the characters in use worldwide. To provide better support for international characters, the first C standard (C89) introduced wide characters (encoded in type wchar_t) and wide character strings, which are written as L"Hello world!".
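A minimal sketch of the wide string literal mentioned above, assuming a hosted implementation that provides wchar.h. The function names are hypothetical. The character count is fixed by the literal, while the byte count depends on the width the implementation chooses for wchar_t.

```c
#include <stddef.h>
#include <wchar.h>

/* Number of wide characters in the literal, excluding the terminator. */
size_t wide_hello_length(void)
{
    const wchar_t *text = L"Hello world!";
    return wcslen(text);
}

/* Storage occupied by the literal, including the terminating null wide
   character; the result depends on the implementation's wchar_t width. */
size_t wide_hello_bytes(void)
{
    return sizeof(L"Hello world!");
}
```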
Wide characters are most commonly either 2 bytes (using a 2-byte encoding such as UTF-16) or 4 bytes (usually UTF-32), but Standard C does not specify the width for wchar_t, leaving the choice to the implementor. Microsoft Windows generally uses UTF-16, thus the above string would be 26 bytes long for a Microsoft compiler; the Unix world prefers UTF-32, thus compilers such as GCC would generate a 52-byte string. A 2-byte wide wchar_t suffers the same limitation as char, in that certain characters (those outside the BMP) cannot be represented in a single wchar_t, but must be represented using surrogate pairs.
The original C standard specified only minimal functions for operating with wide character strings; in 1995 the standard was modified to include much more extensive support, comparable to that for char strings. The relevant functions are mostly named after their char equivalents, with the addition of a "w" or the replacement of "str" with "wcs"; they are specified in <wchar.h>, with <wctype.h> containing wide-character classification and mapping functions.
Variable width strings
A common alternative to wchar_t is to use a variable-width encoding, whereby a logical character may extend over multiple positions of the string. Variable-width strings may be encoded into literals verbatim, at the risk of confusing the compiler, or using numerical backslash escapes (e.g. "\xc3\xa9" for "é" in UTF-8). The UTF-8 encoding was specifically designed (under Plan 9) for compatibility with the standard library string functions; supporting features of the encoding include a lack of embedded nulls, no valid interpretations for subsequences, and trivial resynchronisation. Encodings lacking these features are likely to prove incompatible with the standard library functions; encoding-aware string functions are often used in such cases.
Library functions
Strings, both constant and variable, may be manipulated without using the standard library. However, the library contains many useful functions for working with null-terminated strings. It is the programmer's responsibility to ensure that enough storage has been allocated to hold the resulting strings.
The most commonly used string functions are:
* strcat(dest, source) - appends the string source to the end of string dest
* strchr(s, c) - finds the first instance of character c in string s and returns a pointer to it, or a null pointer if c is not found
* strcmp(a, b) - compares strings a and b (lexicographical ordering); returns negative if a is less than b, 0 if equal, positive if greater
* strcpy(dest, source) - copies the string source onto the string dest
* strlen(st) - returns the length of string st
* strncat(dest, source, n) - appends a maximum of n characters from the string source to the end of string dest and then always writes a terminating null character (so up to n+1 characters may be written)
* strncmp(a, b, n) - compares a maximum of n characters from strings a and b (lexical ordering); returns negative if a is less than b, 0 if equal, positive if greater
* strrchr(s, c) - finds the last instance of character c in string s and returns a pointer to it, or a null pointer if c is not found
Other standard string functions include:
* strcoll(s1, s2) - compares two strings according to a locale-specific collating sequence
* strcspn(s1, s2) - returns the index of the first character in s1 that matches any character in s2
* strerror(errno) - returns a string with an error message corresponding to the code in errno
* strncpy(dest, source, n) - copies n characters from the string source onto the string dest, substituting null bytes once past the end of source; does not null terminate if max length is reached
* strpbrk(s1, s2) - returns a pointer to the first character in s1 that matches any character in s2, or a null pointer if not found
* strspn(s1, s2) - returns the index of the first character in s1 that matches no character in s2
* strstr(st, subst) - returns a pointer to the first occurrence of the string subst in st, or a null pointer if no such substring exists
* strtok(s1, s2) - returns a pointer to a token within s1 delimited by the characters in s2
* strxfrm(s1, s2, n) - transforms s2 onto s1, such that s1 used with strcmp gives the same results as s2 used with strcoll
There is a similar set of functions for handling wide character strings.
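The storage responsibility described above can be sketched with two small helpers. Both names (join_words and copy_truncated) are hypothetical and exist only for illustration; the library calls themselves are the standard ones listed above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds "<first> <second>" in out, which has room
   for out_size bytes. Returns 0 on success, -1 if the result would not fit. */
int join_words(char *out, size_t out_size, const char *first, const char *second)
{
    /* +2: one separating space and the terminating null character. */
    if (strlen(first) + strlen(second) + 2 > out_size)
        return -1;
    strcpy(out, first);
    strcat(out, " ");
    strcat(out, second);
    return 0;
}

/* Hypothetical helper: strncpy does not null terminate when the source
   fills the buffer, so the terminator is written explicitly. */
void copy_truncated(char *dest, size_t dest_size, const char *source)
{
    if (dest_size == 0)
        return;
    strncpy(dest, source, dest_size - 1);
    dest[dest_size - 1] = '\0';
}
```

Checking the required length before calling strcpy and strcat is one possible design; another is to allocate the destination dynamically from the computed length.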
Structures and unions
Structures
Structures in C are defined as data containers consisting of a sequence of named members of various types. They are similar to records in other programming languages. The members of a structure are stored in consecutive locations in memory, although the compiler is allowed to insert padding between or after members (but not before the first member) for efficiency. The size of a structure is equal to the sum of the sizes of its members, plus the size of the padding.
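The relationship between member sizes and padding can be observed with sizeof and offsetof. In this sketch the structure sample and both function names are hypothetical, and the amount of padding reported depends on the compiler and platform.

```c
#include <stddef.h>

/* Hypothetical structure; the member names and types are chosen only
   for illustration. */
struct sample {
    char tag;      /* one byte */
    double value;  /* often aligned, so padding may follow tag */
    short count;
};

/* Sum of the member sizes, ignoring any padding. */
size_t sample_member_bytes(void)
{
    return sizeof(char) + sizeof(double) + sizeof(short);
}

/* Padding the compiler inserted between and after the members. */
size_t sample_padding_bytes(void)
{
    return sizeof(struct sample) - sample_member_bytes();
}
```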
Unions
Unions in C are related to structures and are defined as objects that may hold (at different times) objects of different types and sizes. They are analogous to variant records in other programming languages. Unlike structures, the components of a union all refer to the same location in memory. In this way, a union can be used at various times to hold different types of objects, without the need to create a separate object for each new type. The size of a union is at least the size of its largest component type; the compiler may add trailing padding for alignment.
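A brief sketch of the shared storage described above. The union number and the function store_then_overwrite are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical union; all members start at the same address. */
union number {
    int as_int;
    float as_float;
    double as_double;
};

/* Stores a double and then an int in the same object; after the second
   assignment only as_int holds a meaningful value. */
int store_then_overwrite(union number *n)
{
    n->as_double = 3.5;
    n->as_int = 42;   /* overwrites storage previously used by as_double */
    return n->as_int;
}
```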
Declaration
Structures are declared with the struct keyword and unions are declared with the union keyword. The specifier keyword is followed by an optional identifier name, which is used to identify the form of the structure or union. The identifier is followed by the declaration of the structure or union's body: a list of member declarations, contained within curly braces, with each declaration terminated by a semicolon. Finally, the declaration concludes with an optional list of identifier names, which are declared as instances of the structure or union.
For example, a single declaration can define a structure named s that contains three members and also declare an instance of the structure known as t. A similar declaration can define a union named u and an instance of it named n.
Once a structure or union body has been declared and given a name, it can be considered a new data type using the specifier struct or union, as appropriate, and the name. For example, given the above structure declaration, a new instance of the structure s named r is declared with the specifier struct s.
It is also common to use the typedef specifier to eliminate the need for the struct or union keyword in later references to the structure. The first identifier after the body of the structure is taken as the new name for the structure type. For example, a typedef declaration can introduce a new type known as s_type that contains some structure. Future statements can then use the specifier s_type (instead of the expanded struct … specifier) to refer to the structure.
Accessing members
Members are accessed using the name of the instance of a structure or union, a period (.), and the name of the member. For example, given the declaration of t from above, the member known as y (of type float) can be accessed as t.y.
Structures are commonly accessed through pointers. Consider a pointer to t, known as ptr_to_t. Member y of t can then be accessed by dereferencing ptr_to_t and using the result as the left operand, as in (*ptr_to_t).y, which is identical to the simpler t.y above as long as ptr_to_t points to t. Because this operation is common, C provides an abbreviated syntax for accessing a member directly from a pointer. With this syntax, the name of the instance is replaced with the name of the pointer and the period is replaced with the character sequence ->. Thus, ptr_to_t->y accesses y in a way identical to the previous two.
Members of unions are accessed in the same way.
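The declarations and access forms described in the last two sections might look like the sketch below. The names s, t, u, n, r, s_type, ptr_to_t and the float member y come from the text; the type of x, the third member z, and the function access_forms_agree are assumptions made for illustration.

```c
/* Structure s with three members, and an instance t.
   The type of x and the member z are assumptions. */
struct s {
    int x;
    float y;
    char *z;
} t;

/* A similar union u and an instance n. */
union u {
    int x;
    float y;
    char *z;
} n;

/* A further instance of struct s, named r. */
struct s r;

/* typedef removes the need for the struct keyword in later declarations. */
typedef struct {
    int x;
    float y;
} s_type;

/* Reads member y in the three equivalent ways and returns 1 if they
   all agree. */
int access_forms_agree(void)
{
    struct s *ptr_to_t = &t;
    s_type other = { 1, 2.0f };   /* declared without the struct keyword */

    t.y = 1.5f;
    return t.y == (*ptr_to_t).y
        && (*ptr_to_t).y == ptr_to_t->y
        && other.x == 1;
}
```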
Initialization
A structure can be initialized in its declaration using an initializer list, similar to arrays. If a structure with automatic storage duration is not initialized, the values of its members are undefined until assigned. The components of the initializer list must agree in type and order with the members of the structure; members without an initializer are set to zero.
An initializer list can, for instance, initialize a new instance of the structure s from above known as pi.
C99 introduces a more flexible initialization syntax for structures, which allows members to be initialized by name. Initialization using this syntax may initialize members in any order.
In C89, a union can only be initialized with a value of the type of its first member. That is, the union u from above can only be initialized with a value of type int. In C99, any one member of a union may be initialized using the new syntax described above.
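The two initialization styles can be compared side by side. This is a sketch under assumptions: the text names the structure s, the union u, the instance pi and the float member y; the remaining member types, the initial values, and the names pi_by_name, real and initializers_match are illustrative only.

```c
#include <string.h>

/* Assumed layout for structure s and union u. */
struct s { int x; float y; const char *z; };
union u  { int x; float y; const char *z; };

/* Positional initializer list: values are given in member order. */
struct s pi = { 3, 3.1415f, "Pi" };

/* C99 designated initializers: members named explicitly, in any order. */
struct s pi_by_name = { .z = "Pi", .x = 3, .y = 3.1415f };

/* C99: a union member other than the first can be initialized by name. */
union u real = { .y = 2.5f };

/* Returns 1 if both initializations produced the same member values. */
int initializers_match(void)
{
    return pi.x == pi_by_name.x
        && pi.y == pi_by_name.y
        && strcmp(pi.z, pi_by_name.z) == 0;
}
```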
Assignment
Assigning values to individual members of structures and unions is syntactically identical to assigning values to any other object. The only difference is that the "lvalue" of the assignment is the name of the member, as accessed by the syntax mentioned above.
A structure can also be assigned as a unit to another structure of the same type. Structures (and pointers to structures) may also be used as function parameter and return types.
For example, the value 74 (the ASCII code point for the letter 'J') can be assigned to the member named x in the structure t, from above, with t.x = 74; the same assignment using ptr_to_t in place of t is written ptr_to_t->x = 74.
Assignment with members of unions is identical, except that each new assignment changes the current type of the union, and the previous type and value are lost.
Other operations
According to the C standard, the only legal operations that can be performed on a structure are copying it, assigning to it as a unit (or initializing it), taking its address with the address-of (&) unary operator, and accessing its members. Unions have the same restrictions. One of the operations implicitly forbidden is comparison: structures and unions cannot be compared using C's standard comparison facilities (==, >, <, etc.).
Bit fields
C also provides a special type of structure member known as a bit field, which is an integer with an explicitly specified number of bits. A bit field is declared as a structure member of type int, signed int, unsigned int, or (in C99 only) _Bool, following the member name by a colon (:) and the number of bits it should occupy. The total number of bits in a single bit field must not exceed the total number of bits in its declared type.
As a special exception to the usual C syntax rules, it is implementation-defined whether a bit field declared as type int, without specifying signed or unsigned, is signed or unsigned. Thus, it is recommended to explicitly specify signed or unsigned on all structure members for portability.
Empty entries consisting of just a colon followed by a number of bits are also allowed; these indicate padding.
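A small sketch of bit field declarations as described above, including an unnamed padding entry. The structure flags, its member names and widths, and the function store_mode are all hypothetical.

```c
/* Hypothetical structure with bit fields; the names and widths are
   illustrative only. */
struct flags {
    unsigned int ready : 1;   /* one bit: holds 0 or 1 */
    unsigned int mode  : 3;   /* three bits: holds 0 to 7 */
    unsigned int       : 4;   /* unnamed entry: four bits of padding */
    signed int   delta : 4;   /* four bits, explicitly signed */
};

/* Stores a value in the 3-bit field. Unsigned bit fields wrap modulo
   2^width, so only the low three bits are kept. */
unsigned int store_mode(unsigned int value)
{
    struct flags f = { 0 };
    f.mode = value;
    return f.mode;
}
```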
The members of bit fields do not have addresses, and as such cannot be used with the address-of (&) unary operator. The sizeof operator may not be applied to bit fields.
The following declaration declares a new structure type known as f and an instance of it known as g. Comments provide a description of each of the members:
Incomplete types
The body of a struct or union declaration, or a typedef thereof, may be omitted, yielding an "incomplete type". Such a type may not be instantiated (its size is not known), nor may its members be accessed (they, too, are unknown); however, the derived pointer type may be used (but not dereferenced).
Incomplete types are used to implement recursive structures; the body of the type declaration may be deferred to later in the translation unit.
Incomplete types are also used for data hiding; the incomplete type is declared in a header file, and the body only within the relevant source file.
Operators
Main article: Operators in C and C++
Control structures
C is a free-form language.
Bracing style varies from programmer to programmer and can be the subject of debate ("flame wars"). See Indent style for more details.
Compound statements
In the items in this section, any statement can be replaced with a compound statement. In C89, compound statements have the form { declaration-list statement-list }, where both lists are optional, and are used as the body of a function or anywhere that a single statement is expected. The declaration-list declares variables to be used in that scope, and the statement-list contains the actions to be performed. Braces define their own scope, and variables defined inside those braces will be automatically deallocated at the closing brace. C99 extends this syntax to allow declarations and statements to be freely intermixed within a compound statement (as does C++).
Selection statements
C has two types of selection statements: the if statement and the switch statement.
The if statement has the form if (expression) statement1 else statement2. If the expression in parentheses is nonzero (true), control passes to statement1. If the else clause is present and the expression is zero (false), control will pass to statement2. The else part is optional, and if absent, a false expression will simply result in skipping over statement1. An else always matches the nearest previous unmatched if; braces may be used to override this when necessary, or for clarity.
The switch statement causes control to be transferred to one of several statements depending on the value of an expression, which must have integral type. The substatement controlled by a switch is typically compound. Any statement within the substatement may be labeled with one or more case labels, which consist of the keyword case followed by a constant expression and then a colon (:).
No two of the case constants associated with the same switch may have the same value. There may be at most one default label associated with a switch; if none of the case labels are equal to the expression in the parentheses following switch, control passes to the default label, or if there is no default label, execution resumes just beyond the entire construct.
Switches may be nested; a case or default label is associated with the innermost switch that contains it. Switch statements can "fall through", that is, when one case section has completed its execution, statements will continue to be executed downward until a break; statement is encountered. Fall-through is useful in some circumstances, but is usually not desired. If a case section ends with break, only its own statements are executed and nothing more inside the braces; if it does not, the statements of the following case section are executed as well, since there is no break to separate the two.
Iteration statements
C has three forms of iteration statement: the do statement, the while statement and the for statement.
In the while and do statements, the substatement is executed repeatedly so long as the value of the expression remains nonzero (true). With while, the test, including all side effects from the expression, occurs before each execution of the statement; with do, the test follows each iteration. Thus, a do statement always executes its substatement at least once, whereas while may not execute the substatement at all.
If all three expressions are present in a for, the statement for (e1; e2; e3) s; is equivalent to e1; while (e2) { s; e3; } except for the behavior of a continue; statement (which in the for loop jumps to e3 instead of e2).
Any of the three expressions in the for loop may be omitted. A missing second expression makes the while test always nonzero, creating a potentially infinite loop.
C99 generalizes the for loop by allowing the first expression to take the form of a declaration (typically including an initializer). The declaration's scope is limited to the extent of the for loop.
Jump statements
Jump statements transfer control unconditionally. There are four types of jump statements in C: goto, continue, break, and return.
The goto statement has the form goto identifier; where the identifier must be a label (followed by a colon) located in the current function. Control transfers to the labeled statement.
A continue statement may appear only within an iteration statement and causes control to pass to the loop-continuation portion of the innermost enclosing iteration statement. That is, within each of the three kinds of iteration statement, if a label cont is placed on an empty statement at the very end of the loop body, a continue not contained within a nested iteration statement is the same as goto cont.
The break statement is used to end a for loop, while loop, do loop, or switch statement. Control passes to the statement following the terminated statement.
A function returns to its caller by the return statement. When return is followed by an expression, the value is returned to the caller as the value of the function. Encountering the end of the function is equivalent to a return with no expression. In that case, if the function is declared as returning a value and the caller tries to use the returned value, the result is undefined.
Storing the address of a label
GCC extends the C language with a unary && operator that returns the address of a label. This address can be stored in a variable of type void * and may be used later in a goto instruction. For example, a program can print "hi " in an infinite loop by repeatedly jumping to a stored label address. This feature can be used to implement a jump table.
Functions
Syntax
A C function definition consists of a return type (void if no value is returned), a unique name, a list of parameters in parentheses (void if there are none), and various statements. A function with non-void return type should include at least one return statement. The parameter list of n variables is declared as pairs of data type and variable name, separated by commas.
It is possible to define a function as taking a variable number of parameters by providing an ellipsis (...) as the last parameter instead of a data type and variable name. A commonly used function that does this is the standard library function printf, which has the declaration int printf(const char *, ...);. Manipulation of these parameters can be done by using the routines in the standard library header <stdarg.h>.
Function Pointers
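A pointer to a function is declared by placing the pointer's name, preceded by an asterisk, in parentheses where the function name would otherwise appear, as in int (*operation)(int x, int y).
A function pointer can be used, for example, to select between addition and subtraction at run time.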
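Selecting between addition and subtraction through a function pointer might be sketched as follows. The names add, subtract, apply and operation, and the use of a '+' or '-' character as the selector, are illustrative assumptions.

```c
/* Two functions with the same signature. */
int add(int x, int y)      { return x + y; }
int subtract(int x, int y) { return x - y; }

/* Selects one of the two functions through a function pointer and
   calls it. Any selector other than '-' chooses addition. */
int apply(char op, int x, int y)
{
    int (*operation)(int, int);   /* pointer to a function taking two ints */

    operation = (op == '-') ? subtract : add;
    return operation(x, y);       /* equivalent to (*operation)(x, y) */
}
```

A complete program would call apply from main, for instance with an operator character read from the command line.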
Global structure
After preprocessing, at the highest level a C program consists of a sequence of declarations at file scope. These may be partitioned into several separate source files, which may be compiled separately; the resulting object modules are then linked along with implementation-provided run-time support modules to produce an executable image.
The declarations introduce functions, variables and types. C functions are akin to the subroutines of Fortran or the procedures of Pascal.
A definition is a special type of declaration. A variable definition sets aside storage and possibly initializes it, a function definition provides its body.
An implementation of C providing all of the standard library functions is called a "hosted implementation". Programs written for hosted implementations are required to define a special function called main, which is the first function called when execution of the program begins.
Hosted implementations of C start program execution by invoking the main function, which must be defined in a fashion compatible with one of the following prototypes: int main(void) or int main(int argc, char *argv[]).
In particular, the function main must be declared as having an int return type according to the C Standard. The C standard defines return values 0 and EXIT_SUCCESS as indicating success and EXIT_FAILURE as indicating failure. (EXIT_SUCCESS and EXIT_FAILURE are defined in <stdlib.h>.) Other return values have implementation-defined meanings; for example, under Linux a program killed by a signal yields a return code of the numerical value of the signal plus 128.
A minimal C program consists of a main function whose body simply returns 0.
The main function will usually call other functions to help it perform its job.
Some implementations are not hosted, usually because they are not intended to be used with an operating system. Such implementations are called "free-standing" in the C standard. A free-standing implementation is free to specify how it handles program startup; in particular it need not require a program to define a main function.
Functions may be written by the programmer or provided by existing libraries. Interfaces for the latter are usually declared by including header files (with the #include preprocessing directive), and the library objects are linked into the final executable image. Certain library functions, such as printf, are defined by the C standard; these are referred to as the standard library functions.
A function may return a value to the environment that called it. This is usually another C function; however, the calling environment of the main function is the parent process in Unix-like systems or the operating system itself in other cases. By definition, the return value zero (or the value of the EXIT_SUCCESS macro) from main signifies successful completion of the program. (There is also an EXIT_FAILURE macro to signify failure.) The printf function mentioned above returns how many characters were printed, but this value is often ignored.
Argument passing
In C, arguments are passed to functions by value, while other languages may pass variables by reference. This means that the receiving function gets copies of the values and has no direct way of altering the original variables. For a function to alter a variable passed from another function, the caller must pass its address (a pointer to it), which can then be dereferenced in the receiving function (see Pointers for more info).
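The difference between passing a value and passing its address can be sketched with two functions. Both names, increment_copy and increment_in_place, are hypothetical.

```c
/* Receives a copy of the caller's value; the caller's variable is
   left unchanged. */
void increment_copy(int n)
{
    n = n + 1;   /* modifies only the local copy */
}

/* Receives the address of the caller's variable and modifies that
   variable through the pointer. */
void increment_in_place(int *n)
{
    *n = *n + 1;
}
```

A caller holding int v = 1 would write increment_copy(v) and still see 1 afterwards, whereas increment_in_place(&v) leaves v equal to 2.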
The function scanf works the same way: its arguments after the format string are the addresses of the variables that receive the input.
In order to pass an editable pointer to a function, you have to pass a pointer to that pointer, that is, its address. A parameter declared as int **p is a pointer to a pointer; it holds the address of the caller's pointer.
Array parameters
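Function parameters of array type may at first glance appear to be an exception to C's pass-by-value rule. For example, if a function setArray stores 2 into an element of its array parameter, and the caller passes an array a whose element previously held 1, a program that then prints that element will print 2, not 1.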
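The behaviour just described can be sketched as follows. The name setArray comes from the text; its exact parameter list and the helper first_after_set are assumptions made for illustration.

```c
/* Declared with an array parameter, which C treats as a pointer to the
   first element; an equivalent declaration is
   void setArray(int *array, int index, int value). */
void setArray(int array[], int index, int value)
{
    array[index] = value;   /* writes into the caller's array */
}

/* Returns the first element after the call: 2, not 1. */
int first_after_set(void)
{
    int a[1] = { 1 };
    setArray(a, 0, 2);
    return a[0];
}
```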
However, there is a different reason for this behavior. In fact, a function parameter declared with an array type is treated almost exactly like one declared to be a pointer. That is, a declaration of setArray with an array parameter is equivalent to one with a pointer parameter.
At the same time, C rules for the use of arrays in expressions cause the value of a in the call to setArray to be converted to a pointer to the first element of array a. Thus, in fact this is still an example of pass-by-value, with the caveat that it is the address of the first element of the array being passed by value, not the contents of the array.
Miscellaneous
Reserved keywords
The following words are reserved, and may not be used as identifiers:
auto, _Bool, break, case, char, _Complex, const, continue, default, do, double, else, enum, extern, float, for, goto, if, _Imaginary, inline, int, long, register, restrict, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while
Implementations may reserve other keywords, such as asm, although implementations typically provide non-standard keywords that begin with one or two underscores.
Case sensitivity
C identifiers are case sensitive (e.g., foo, FOO, and Foo are the names of different objects). Some linkers may map external identifiers to a single case, although this is uncommon in most modern linkers.
Comments
Text starting with /* is treated as a comment and ignored. The comment ends at the next */; it can occur within expressions, and can span multiple lines. Accidental omission of the comment terminator is problematic in that the next comment's properly constructed comment terminator will be used to terminate the initial comment, and all code in between the comments will be considered as a comment. C-style comments do not nest.
The C99 standard introduced C++ style line comments. These start with // and extend to the end of the line.
Command-line arguments
The parameters given on a command line are passed to a C program with two predefined variables: the count of the command-line arguments in argc and the individual arguments as character strings in the pointer array argv. So the command myFilt p1 p2 p3 results in something like argc being 4, with argv[0] through argv[3] pointing to the strings "myFilt", "p1", "p2" and "p3", and argv[4] being a null pointer. (Note: While individual strings are contiguous arrays of char, there is no guarantee that the strings are stored as a contiguous group.)
The name of the program, argv[0], may be useful when printing diagnostic messages. The individual values of the parameters may be accessed with argv[1], argv[2], and argv[3].
Evaluation order
In any reasonably complex expression, there arises a choice as to the order in which to evaluate the parts of the expression: (1+1)+(3+3) may be evaluated in the order (1+1)+(3+3) → (2)+(3+3) → (2)+(6) → 8 or in the order (1+1)+(3+3) → (1+1)+(6) → (2)+(6) → 8. Formally, a conforming C compiler may evaluate expressions in any order between sequence points. Sequence points are defined by:
* Statement ends at semicolons.
* The sequencing operator: a comma. However, commas that delimit function arguments are not sequence points.
* The short-circuit operators: logical and (&&) and logical or (||).
* The conditional operator (?:): This operator evaluates its first sub-expression first, and then its second or third (never both of them) based on the value of the first.
* Entry to and exit from a function call (but not between evaluations of the arguments).
Expressions before a sequence point are always evaluated before those after a sequence point. In the case of short-circuit evaluation, the second expression may not be evaluated depending on the result of the first expression. For example, in the expression (a() || b()), if the first argument evaluates to nonzero (true), the result of the entire expression will also be true, so b() is not evaluated.
The arguments to a function call may be evaluated in any order, as long as they are all evaluated by the time the function call takes place. An expression that modifies a variable in one argument and reads or modifies the same variable in another argument, for example, has undefined behaviour.
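The guarantees given by two of the sequence points listed above can be checked with a small sketch. The functions a and b follow the names used in the text; the counter calls_to_b and the functions short_circuit_calls and comma_sequenced are hypothetical. An expression with undefined behaviour is mentioned only in a comment, since it must not be executed.

```c
static int calls_to_b = 0;

int a(void) { return 1; }                 /* always true */
int b(void) { calls_to_b++; return 1; }   /* counts how often it runs */

/* Because || is a short-circuit operator and a sequence point,
   b() is never called when a() returns nonzero. */
int short_circuit_calls(void)
{
    calls_to_b = 0;
    if (a() || b()) {
        /* reached without evaluating b() */
    }
    return calls_to_b;
}

/* The comma operator is also a sequence point: the left operand is
   fully evaluated before the right one. */
int comma_sequenced(void)
{
    int i = 0;
    int result = (i++, i * 10);   /* i is 1 when the right operand runs */
    return result;
}

/* By contrast, a call such as f(i++, i) must not be written: the two
   arguments are unsequenced, so the behaviour is undefined. */
```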
Undefined behavior
An aspect of the C standard (not unique to C) is that the behavior of certain code is said to be "undefined". In practice, this means that the program produced from this code can do anything, from working as the programmer intended, to crashing every time it is run.
For example, code that evaluates b++ + b++ produces undefined behavior, because the variable b is modified more than once with no intervening sequence point. Because there is no sequence point between the modifications of b in b++ + b++, it is possible to perform the evaluation steps in more than one order, resulting in an ambiguous statement. This can be fixed by rewriting the code to insert a sequence point, so that each modification of b takes place in its own statement.
See also
* C programming language
* C variable types and declarations
* Operators in C and C++
References
* Kernighan, Brian W.; Ritchie, Dennis M. (1988). The C Programming Language (2nd ed.). Upper Saddle River, New Jersey: Prentice Hall PTR. ISBN 0131103709.
* American National Standard for Information Systems - Programming Language - C - ANSI X3.159-1989
External links
* [http://www.math.grin.edu/~stone/courses/languages/C-syntax.xhtml "The syntax of C in Backus-Naur form"]
* [http://www.cs.cf.ac.uk/Dave/C/CE.html Programming in C]
* [http://c-faq.com/ "The comp.lang.c Frequently Asked Questions Page"]
Wikimedia Foundation. 2010.