Figure: A snippet of C code from the [[Linux kernel]]
C syntax is the form that text must have in order to be C programming language code. The language syntax rules are designed to allow for code that is terse, has a close relationship with the resulting object code, and yet provides relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development. C syntax makes use of the maximal munch principle.
Because C is a free-form language, code can be formatted in different ways without changing its meaning.
C syntax influenced the syntax of succeeding languages, including C++, Java, and C#.
High-level structure
C code consists of preprocessor directives, and core-language types, variables and functions, organized as one or more source files. Building the code typically involves preprocessing and then compiling each source file into an object file. Then, the object files are linked to create an executable image.
Variables and functions can be declared separately from their definition. A declaration identifies the name of a user-defined element and some if not all of the information about how the element can be used at run-time. A definition is a complete description of an element that includes the declaration aspect as well as additional information that completes the element. For example, a function declaration indicates the name and optionally the type and number of arguments that it accepts. A function definition includes the same information (argument information is not optional), plus code that implements the function logic.
Entry point
Figure: C code for a program that prints "Hello, World!"
For a hosted environment, a program starts at an entry point function named <code>main</code>. The function is passed two arguments, although an implementation of the function can ignore them. The function must be declared per one of the following prototypes (parameter names shown are typical but can be anything):
<syntaxhighlight lang=C>
int main();
int main(void);
int main(int argc, char* argv[]);
int main(int argc, char** argv);
</syntaxhighlight>
The first two forms are equivalent when used in a function definition, meaning that the function does not use the two arguments. (Before C23, a declaration with empty parentheses that is not a definition leaves the parameters unspecified rather than declaring that there are none.) The last two forms are also equivalent, allowing the function to access the two arguments.
The return value, typed as <code>int</code>, serves as a status indicator to the host environment. The standard library header <code><stdlib.h></code> defines macros for the standard status values: <code>EXIT_SUCCESS</code> and <code>EXIT_FAILURE</code>. Regardless, a program can indicate status using any values. For example, the <code>kill</code> command returns the numerical value of the signal plus 128.
A minimal program consists of a parameterless, empty function, like:
<syntaxhighlight lang=C>
int main() {}
</syntaxhighlight>
Unlike other functions, the language requires that a program act as if <code>main</code> returns 0 even if it does not end with a <code>return</code> statement.
Since C23, signed integer types use a two's complement representation (as they did in practice before; in versions before C23 the representation might alternatively have been ones' complement or sign-and-magnitude, but that has not been the case for decades on modern hardware). In many cases, there are multiple equivalent ways to designate the type; for example, <code>signed short int</code> and <code>short</code> are synonymous.
The representation of some types may include unused "padding" bits, which occupy storage but are not included in the width. The following table lists the integer types using the shortest possible name and indicating the minimum width in bits.
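{| class="wikitable"
|+ Standard integer types
|-
! Name !! Minimum<br>width<br>(bits)
|-
| <code>bool</code>
| style="text-align: center" | 1
|-
| <code>char</code>
| style="text-align: center" | 8
|-
| <code>signed char</code>
| style="text-align: center" | 8
|-
| <code>unsigned char</code>
| style="text-align: center" | 8
|-
| <code>short</code>
| style="text-align: center" | 16
|-
| <code>unsigned short</code>
| style="text-align: center" | 16
|-
| <code>int</code>
| style="text-align: center" | 16
|-
| <code>unsigned</code>
| style="text-align: center" | 16
|-
| <code>long</code>
| style="text-align: center" | 32
|-
| <code>unsigned long</code>
| style="text-align: center" | 32
|-
| <code>long long</code>
| style="text-align: center" | 64
|-
| <code>unsigned long long</code>
| style="text-align: center" | 64
|}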
: Note: C does not specify a radix for <code>float</code>, <code>double</code>, and <code>long double</code>. An implementation can choose the representation of <code>float</code>, <code>double</code>, and <code>long double</code> to be the same as the decimal floating types.
Despite that, the radix has historically been binary (base 2), meaning numbers like 1/2 or 1/4 are exact, but not 1/10, 1/100 or 1/3. With decimal floating point, all the same numbers are exact, plus numbers like 1/10 and 1/100, but still not, for example, 1/3. No known implementation opts into the decimal radix for the traditionally binary types, since most computers do not even have hardware for decimal types, and those few that do (e.g. IBM mainframes since IBM System z10) can use the explicitly decimal types.
Storage class
The following table describes the specifiers that define various storage attributes, including the storage duration: static (the default for global objects), automatic (the default for local objects), or dynamic (allocated).
{| class="wikitable"
|+ Storage classes
|-
! Specifier
! Lifetime
! Scope
! Default initializer
|-
| <code>auto</code>
| Block (stack)
| Block
| Uninitialized
|-
| <code>register</code>
| Block (stack or CPU register)
| Block
| Uninitialized
|-
| <code>static</code>
| Program
| Block or compilation unit
| Zero
|-
| <code>extern</code>
| Program
| Global (entire program)
| Zero
|-
| <code>thread_local</code>
| Thread
|
|
|-
| (none)<sup>1</sup>
| Dynamic (heap)
|
| Uninitialized (initialized to 0 if using <code>calloc()</code>)
|}
:<sup>1</sup> Allocated and deallocated using the <code>malloc()</code> and <code>free()</code> library functions.
Variables declared within a block have automatic storage by default, as do those explicitly declared with the <code>auto</code> or <code>register</code> storage class specifiers. The <code>auto</code> and <code>register</code> specifiers may only be used within functions and function argument declarations; as such, the <code>auto</code> specifier is always redundant. Objects declared outside of all blocks and those explicitly declared with the <code>static</code> storage class specifier have static storage duration. Static variables are initialized to zero by default.
Objects with automatic storage are local to the block in which they were declared and are discarded when the block is exited. Additionally, objects declared with the <code>register</code> storage class may be given higher priority by the compiler for access to registers, although the compiler may choose not to actually store any of them in a register. Objects with this storage class may not be used with the address-of (<code>&</code>) unary operator. Objects with static storage persist for the program's entire duration. In this way, the same object can be accessed by a function across multiple calls. Objects with allocated storage duration are created and destroyed explicitly with <code>malloc</code>, <code>free</code>, and related functions.
The <code>extern</code> storage class specifier indicates that the storage for an object has been defined elsewhere. When used inside a block, it indicates that the storage has been defined by a declaration outside of that block. When used outside of all blocks, it indicates that the storage has been defined outside of the compilation unit. The <code>extern</code> storage class specifier is redundant when used on a function declaration. It indicates that the declared function has been defined outside of the compilation unit.
The <code>thread_local</code> storage class specifier (spelled <code>_Thread_local</code> before C23, with <code>thread_local</code> available in earlier versions of C if the header <code><threads.h></code> is included), introduced in C11, is used to declare a thread-local variable. It can be combined with <code>static</code> or <code>extern</code> to determine linkage.
Note that storage specifiers apply only to functions and objects; other things such as type and enum declarations are private to the compilation unit in which they appear. Types, on the other hand, have qualifiers (see below).
Since C23, C can use <code>auto</code> to declare a type-inferred variable.
Type qualifiers
Types can be qualified to indicate special properties of their data. The type qualifier <code>const</code> indicates that a value does not change once it has been initialized. Attempting to modify a <code>const</code>-qualified object yields undefined behavior, so some compilers store such objects in a read-only data section (rodata) or (for embedded systems) in read-only memory (ROM). Since C23, the <code>constexpr</code> specifier can be thought of as a "stronger" form of <code>const</code>: the value must be known at compile time, making it a type-safe replacement for macro constants. Unlike in C++, <code>constexpr</code> in C23 applies only to objects, not to functions. The type qualifier <code>volatile</code> indicates to an optimizing compiler that it may not remove apparently redundant reads or writes, as the value may change even if it was not modified by any expression or statement, or multiple writes may be necessary, such as for memory-mapped I/O.
Incomplete types
An incomplete type is a structure or union type whose members have not yet been specified, an array type whose dimension has not yet been specified, or the type <code>void</code> (the <code>void</code> type cannot be completed). Such a type may not be instantiated (its size is not known), nor may its members be accessed (they, too, are unknown); however, the derived pointer type may be used (but not dereferenced).
They are often used with pointers, either as forward or external declarations. For instance, code could declare an incomplete type like this:
<syntaxhighlight lang=C>
struct Integer* pt;
</syntaxhighlight>
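This declares <code>pt</code> as a pointer to <code>struct Integer</code> (as well as the incomplete struct type). As all pointers to structure types have the same size and representation (regardless of which structure they point to), code can use <code>pt</code> as a pointer although it cannot access the fields of <code>struct Integer</code>.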
An incomplete type can be completed later in the same scope by redeclaring it. For example:
<syntaxhighlight lang=C>
struct Integer {
    int num;
};
</syntaxhighlight>
Incomplete types are used to implement recursive structures; the body of the type declaration may be deferred to later in the translation unit:
<syntaxhighlight lang=C>
typedef struct Bert Bert;
typedef struct Wilma Wilma;
struct Bert {
    Wilma* wilma;
};
struct Wilma {
    Bert* bert;
};
</syntaxhighlight>
Incomplete types are also used for data hiding. The incomplete type is defined in a header file, and the full definition is hidden in a single body file.
Pointers
In a variable declaration, the asterisk (<code>*</code>) can be considered to mean "pointer-to". For example, <code>int x</code> defines a variable of type int, and <code>int* p</code> defines a variable that is a pointer to integer. Some contend that, based on the language definition, the <code>*</code> is more closely related to the variable than the type and therefore format the code as <code>int *p</code> or even <code>int * p</code>.
A pointer value associates two pieces of information: a memory address and a data type.
Referencing
When a non-static pointer is declared without an initializer, it has an indeterminate value. Dereferencing it without first assigning it results in undefined behavior.
The <code>&</code> operator yields the address of the data object after it. In the following example, <code>ptr</code> is assigned the address of <code>a</code>:
<syntaxhighlight lang=C>
int a = 0;
int* ptr = &a;
</syntaxhighlight>
Dereferencing
An asterisk before a variable name (when not in a declaration or a mathematical expression) dereferences a pointer to allow access to the value it points to. In the following example, the integer variable <code>b</code> is set to the value of integer variable <code>a</code>, which is <code>10</code>:
<syntaxhighlight lang=C>
int a = 10;
int* p;
p = &a;
int b = *p;
</syntaxhighlight>
Arrays
Array definition
Arrays store consecutive elements of the same type. The following code declares an array of 100 elements, named <code>a</code>, of type <code>int</code>.
<syntaxhighlight lang=C>int a[100];</syntaxhighlight>
If declared outside of a function (at file scope), the size must be a constant expression. If declared in a function, the array size may be a non-constant expression, which makes the array a variable-length array.
The number of elements is available as <code>sizeof a / sizeof a[0]</code>, but if the array is passed to another function, the number of elements is not available via the formal parameter variable.
Accessing elements
The primary facility for accessing array elements is the array subscript operator. For example, <code>a[i]</code> accesses the element at index <code>i</code> of array <code>a</code>. Array indexing begins at 0, making the last array index equal to the number of elements minus 1. As the standard does not provide for bounds checking of array indexes, specifying an index that is out of range results in undefined behavior.
Because an array expression is converted to a pointer to its first element in most contexts, the address of each element can also be expressed using pointer arithmetic. The following table illustrates both methods for the same array:
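{| class="wikitable" style="margin-left: auto; margin-right: auto; text-align: center"
|+ Array subscripts vs. pointer arithmetic
! style="text-align: left" | Element
! First
! Second
! Third
! nth
|-
! style="text-align: left" | Array subscript
| <code>a[0]</code>
| <code>a[1]</code>
| <code>a[2]</code>
| <code>a[n - 1]</code>
|-
! style="text-align: left" | Dereferenced pointer
| <code>*a</code>
| <code>*(a + 1)</code>
| <code>*(a + 2)</code>
| <code>*(a + n - 1)</code>
|}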
Since the expression <code>a[i]</code> is semantically equivalent to <code>*(a + i)</code>, which in turn is equivalent to <code>*(i + a)</code>, the expression can also be written as <code>i[a]</code>, although this form is rarely used.
Variable-length arrays
C99 standardized the variable-length array (VLA): a block-scope array whose size is determined at run time (not by a constant expression) and then remains fixed until the end of the block.
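For a two-dimensional array, here called <code>a</code>, elements can be accessed either by subscript or by pointer arithmetic:
{| class="wikitable" style="margin-left: auto; margin-right: auto; text-align: center"
|+ Array subscripts vs. pointer arithmetic (two dimensions)
! style="text-align: left" | Element
! First
! Second row, second column
! ith row, jth column
|-
! style="text-align: left" | Array subscript
| <code>a[0][0]</code>
| <code>a[1][1]</code>
| <code>a[i - 1][j - 1]</code>
|-
! style="text-align: left" | Dereferenced pointer
| <code>*(*(a + 0) + 0)</code>
| <code>*(*(a + 1) + 1)</code>
| <code>*(*(a + i - 1) + j - 1)</code>
|}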
Higher-dimensional arrays can be declared in a similar manner.
A multidimensional array should not be confused with an array of pointers to arrays (also known as an Iliffe vector or sometimes an array of arrays). The former is always rectangular (all subarrays must be the same size), and occupies a contiguous region of memory. The latter is a one-dimensional array of pointers, each of which may point to the first element of a subarray in a different place in memory, and the sub-arrays do not have to be the same size.
Text
Although the language provides types for textual character data, neither the language nor the standard library defines a string type; instead, the null-terminated string is commonly used. A string value is a contiguous series of characters with the end denoted by a zero value. The standard library contains many string handling functions for null-terminated strings, but string manipulation can be, and often is, handled via custom code.
String literal
A string literal is code text surrounded by double quotes, such as <code>"Hello world!"</code>. A literal compiles to an array of the specified character values followed by a terminating null character that marks the end of the string.
The language supports string literal concatenation: adjacent string literals are treated as joined at compile time. This allows long strings to be split over multiple lines, and also allows string literals from preprocessor macros to be appended to strings at compile time. For example, the source code:
<syntaxhighlight lang=C>
printf(__FILE__ ": %d: Hello "
       "world\n", __LINE__);
</syntaxhighlight>
becomes the following after the preprocessor expands <code>__FILE__</code>:
<syntaxhighlight lang=C>
printf("helloworld.c" ": %d: Hello "
       "world\n", __LINE__);
</syntaxhighlight>
which is equivalent to:
<syntaxhighlight lang=C>
printf("helloworld.c: %d: Hello world\n", __LINE__);
</syntaxhighlight>
Character constants
The character literal, called a character constant, is single-quoted, e.g. <code>'A'</code>, and has type <code>int</code>. To illustrate the difference between a string literal and a character constant, consider that <code>"A"</code> is two characters, 'A' and '\0', whereas <code>'A'</code> represents a single character (65 in ASCII).
A character constant cannot be empty (i.e. <code><nowiki>''</nowiki></code> is invalid syntax). Multi-character constants (e.g. <code>'xy'</code>) are valid, although rarely useful; they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the order in which the characters are packed into an <code>int</code> is not specified (left to the implementation to define), portable use of multi-character constants is difficult.
Nevertheless, in situations limited to a specific platform and compiler implementation, multi-character constants do find use in specifying signatures. One common use case is the OSType, where the combination of Classic Mac OS compilers and the platform's inherent big-endianness means that bytes in the integer appear in the exact order of characters defined in the literal. The definitions used by popular implementations are in fact consistent: in GCC, Clang, and Visual C++, <code>'1234'</code> yields <code>0x31323334</code> under ASCII.
Like string literals, character constants can also be modified by prefixes; for example, <code>L'A'</code> has type <code>wchar_t</code> and represents the character value of "A" in the wide character encoding.
Backslash escapes
Control characters cannot be included in a string or character literal directly. Instead they can be encoded via an escape sequence starting with a backslash (<code>\</code>). For example, the backslashes in <code>"This string contains \"double quotes\"."</code> indicate that the inner pair of quotes are intended as an actual part of the string, rather than the default reading as a delimiter (endpoint) of the string.
Escape sequences include:
{| class="wikitable"
! align="left" |Sequence
! align="left" |Meaning
|-
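| <code>\\</code> || Literal backslash
|-
| <code>\"</code> || Double quote
|-
| <code>\'</code> || Single quote
|-
| <code>\n</code> || Newline (line feed)
|-
| <code>\r</code> || Carriage return
|-
| <code>\b</code> || Backspace
|-
| <code>\t</code> || Horizontal tab
|-
| <code>\f</code> || Form feed
|-
| <code>\a</code> || Alert (bell)
|-
| <code>\v</code> || Vertical tab
|-
| <code>\?</code> || Question mark (used to escape trigraphs, obsolete feature dropped in C23)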
|-
| <code>\OOO</code> || Character with octal value OOO (where OOO is 1-3 octal digits, '0'-'7')
|-
| <code>\xhh</code> || Character with hexadecimal value hh (where hh is 1 or more hex digits, '0'-'9','A'-'F','a'-'f')
|-
| <code>\uhhhh</code> || Unicode code point below 10000 hexadecimal (added in C99)
|-
| <code>\Uhhhhhhhh</code> || Unicode code point where hhhhhhhh is eight hexadecimal digits (added in C99)
|}
The use of other backslash escapes is not defined by the standard, although compilers often provide additional escape codes as language extensions. For example, the escape sequence <code>\e</code> denotes the escape character (ASCII hex value 1B); it was not added to the standard because it lacks a representation in other character sets (such as EBCDIC). It is available in GCC, Clang and tcc.
Note that the standard library function <code>printf</code> uses <code>%%</code> to represent the literal <code>%</code> character.
Wide character strings
Since type <code>char</code> is 1 byte wide, a single <code>char</code> value typically can represent at most 256 distinct character codes, not nearly enough for all the different characters in use worldwide. To provide better support for international characters, the first standard (C89) introduced wide characters (encoded in type <code>wchar_t</code>) and wide character strings, which are written as <code>L"Hello world!"</code>.
Wide characters are most commonly either 2 bytes (using a 2-byte encoding such as UTF-16) or 4 bytes (usually UTF-32), but Standard C does not specify the width for <code>wchar_t</code>, leaving the choice to the implementor. Microsoft Windows generally uses UTF-16, thus the above string would be 26 bytes long for a Microsoft compiler; the Unix world prefers UTF-32, thus compilers such as GCC would generate a 52-byte string. A 2-byte wide <code>wchar_t</code> suffers the same limitation as <code>char</code>, in that certain characters (those outside the BMP) cannot be represented in a single <code>wchar_t</code>, and must be represented using surrogate pairs.
The original standard specified only minimal functions for operating with wide character strings; in 1995 the standard was modified to include much more extensive support, comparable to that for <code>char</code> strings. The relevant functions are mostly named after their <code>char</code> equivalents, with the addition of a "w" or the replacement of "str" with "wcs"; they are specified in <code><wchar.h></code>, with <code><wctype.h></code> containing wide-character classification and mapping functions.
The now generally recommended method of supporting international characters is through UTF-8, which is stored in <code>char</code> arrays, and can be written directly in the source code if using a UTF-8 editor, because UTF-8 is a direct ASCII extension.
Variable width strings
A common alternative to <code>wchar_t</code> is to use a variable-width encoding, whereby a logical character may extend over multiple positions of the string. Variable-width strings may be encoded into literals verbatim, at the risk of confusing the compiler, or using numerical backslash escapes (e.g. <code>"\xc3\xa9"</code> for "é" in UTF-8). The UTF-8 encoding was specifically designed (under Plan 9) for compatibility with the standard library string functions; supporting features of the encoding include a lack of embedded nulls, a lack of valid interpretations for subsequences, and trivial resynchronisation. Encodings lacking these features are likely to prove incompatible with the standard library functions; encoding-aware string functions are often used in such cases.
Structure
A structure or struct is a container consisting of a sequence of named members of heterogeneous types, similar to a record in other languages. The first field starts at the address of the structure and the members are stored in consecutive locations in memory, but the compiler can insert padding between or after members for efficiency or as padding required for proper alignment by the target architecture. The size of a structure includes padding.
A structure is declared with the <code>struct</code> keyword followed by an optional identifier name, which is used to identify the form of the structure. The body follows with field declarations that each consist of a type name and a field name, terminated with a semicolon.
The following declares a structure named <code>MyStruct</code> that contains three members. It also declares an instance named <code>tee</code>:
<syntaxhighlight lang=C>
struct MyStruct {
    int x;
    float y;
    char* z;
} tee;
</syntaxhighlight>
Structure members cannot have an incomplete or function type. Thus members cannot be an instance of the structure being declared (because it is incomplete at that point) but a field can be a pointer to the type being declared.
Once declared, a variable can be declared of the structure type.
The following declares a new instance of the structure named <code>r</code>:
<syntaxhighlight lang=C>struct MyStruct r;</syntaxhighlight>
Although some prefer to declare a struct variable using the <code>struct</code> keyword, others use <code>typedef</code> to alias the struct type into the main type namespace. The following declares a type named <code>Integer</code>, which can then be used like any other type name, as in <code>Integer n</code>.
<syntaxhighlight lang=C>
typedef struct {
    int i;
} Integer;
</syntaxhighlight>
Accessing members
A member is accessed using dot notation. For example, given the declaration of tee from above, the member y can be accessed as <code>tee.y</code>.
A structure is commonly accessed via a pointer. Consider that <code>struct MyStruct* ptee = &tee</code> defines a pointer to tee, named ptee. Member y of tee can be accessed by dereferencing ptee and using the result as the left operand, as in <code>(*ptee).y</code>. Because this operation is common, the language provides an abbreviated syntax for accessing a member directly from a pointer (for example, <code>ptee->y</code>).
Assignment
Assigning a value to a member is like assigning a value to a variable. The only difference is that the lvalue (left side value) of the assignment is the name of the member according to the above syntax. For example, <code>tee.x = 74</code> assigns the value 74 to the member named x in the structure tee, and <code>ptee->x = 74</code> does the same through the pointer ptee.
A structure can also be assigned as a whole to another structure of the same type, or passed by copy as a function argument or return value.
Other operations
The operations supported for a structure are: initialize, copy, get address and access a field. Of note, the language does not support comparing the value of two structures other than via custom code to compare each field.
Bit fields
The language provides a special type of member known as a bit field, which is an integer with a specified size in bits. A bit field is declared as a member of type <code>int</code>, <code>signed int</code>, <code>unsigned int</code>, or <code>_Bool</code>, plus a suffix after the member name consisting of a colon and a number of bits. The total number of bits in a single bit field must not exceed the total number of bits of its base type. Contrary to the usual C syntax rules, it is implementation-defined whether a bit field declared as plain <code>int</code> is signed or unsigned. Therefore, best practice is to specify <code>signed</code> or <code>unsigned</code> explicitly.
Unnamed fields indicate padding and consist of just a colon followed by a number of bits. Specifying a width of zero for an unnamed field forces alignment to a new word. Since all members of a union occupy the same memory, unnamed bit-fields of width zero do nothing in unions; however, unnamed bit-fields of non-zero width can change the size of the union since they have to fit in it.
Bit fields are limited compared to normal fields in that the address-of (<code>&</code>) and <code>sizeof</code> operators are not supported.
The following declares a structure type named <code>FlagStatus</code> and an instance of it named <code>g</code>. The first field, flag, is a single bit flag, which can only be 1 or 0. The second field, num, is a signed 4-bit field; range -7...7 or -8...7. The last field adds 3 bits of padding to round out the structure to 8 bits.
<syntaxhighlight lang=C>
struct FlagStatus {
    unsigned int flag : 1;
    signed int num : 4;
    signed int : 3;
} g;
</syntaxhighlight>
Namespaces
C itself has no native support for namespaces unlike C++ and Java. This makes C symbol names prone to name clashes. However, it is possible to use anonymous structs to emulate namespaces.
<code>Math.h</code>:
<syntaxhighlight lang="c">
#pragma once
// Declaration only; the object is defined in the implementation file.
extern const struct {
    double PI;
    double (*sin)(double);
} Math;
</syntaxhighlight>
Implementation file:
<syntaxhighlight lang="c">
#include <math.h>
// M_PI is a POSIX extension rather than part of standard C.
static double sin_impl(double arg) {
    return sin(arg);
}
const struct {
    double PI;
    double (*sin)(double);
} Math = { M_PI, sin_impl };
</syntaxhighlight>
Client code:
<syntaxhighlight lang="c">
#include <stdio.h>
#include "Math.h"
int main() {
    printf("sin(0) = %f\n", Math.sin(0));
    printf("pi is %f\n", Math.PI);
}
</syntaxhighlight>
Union
For the most part, a union is like a structure except that fields overlap in memory to allow storing values of different type although not at the same time. The union is like the variant record of other languages. Each field refers to the same location in memory. The size of a union is equal to the size of its largest component type plus any padding.
A union is declared with the <code>union</code> keyword. The following declares a union named <code>MyUnion</code> and an instance of it named <code>n</code>:
<syntaxhighlight lang=C>
union MyUnion {
    int x;
    float y;
    char* z;
} n;
</syntaxhighlight>
Initialization
Scalar
Initializing a variable along with declaring it involves appending an equals sign and then a construct that is compatible with the data type. The following initializes an int:
<syntaxhighlight lang=C>
int x = 12;
</syntaxhighlight>
Because of the language's grammar, a scalar initializer may be enclosed in any number of curly brace pairs. Most compilers issue a warning if there is more than one such pair. The following are legal although arguably unusual:
<syntaxhighlight lang=C>
int y = { 23 };
int z = { { 34 } };
</syntaxhighlight>
Initializer list
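Structures, unions and arrays can be initialized in their declarations via an initializer list.
Since unmatched elements are set to 0, an empty list sets all elements to 0. Empty braces are standard only since C23; earlier versions of the standard require at least one initializer, such as <code>{ 0 }</code>. For example, the following sets all elements of array a and all fields of s to 0: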
<syntaxhighlight lang=C>
int a[10] = {};
struct MyStruct s = {};
</syntaxhighlight>
If an array is declared without an explicit size, the array is an incomplete type. The number of initializers determines the size of the array and completes the type. For example:
<syntaxhighlight lang=C>
int x[] = { 0, 1, 2 };
</syntaxhighlight>
By default, the items of an initializer list correspond with the elements in the order they are defined. Including too many values yields an error. The following statement initializes an instance of the structure MyStruct named pi:
<syntaxhighlight lang=C>
struct MyStruct {
    int x;
    float y;
    char* z;
};
struct MyStruct pi = { 3, 3.1415, "Pi" };
</syntaxhighlight>
Designated initializers
Designated initializers allow members to be initialized by name, in any order, and without explicitly providing preceding values. The following initialization is functionally equivalent to the previous:
<syntaxhighlight lang=C>
struct MyStruct pi = { .z = "Pi", .x = 3, .y = 3.1415 };
</syntaxhighlight>
Using a designator in an initializer moves the initialization "cursor". In the example below, if <code>MAX</code> is greater than 10, there will be some zero-valued elements in the middle of <code>a</code>; if it is less than 10, some of the values provided by the first five initializers will be overridden by the second five. If <code>MAX</code> is less than 5, there will be a compilation error:
<syntaxhighlight lang=C>
int a[MAX] = { 1, 3, 5, 7, 9, [MAX - 5] = 8, 6, 4, 2, 0 };
</syntaxhighlight>
In C89, a union was initialized with a single value applied to its first member. That is, the union MyUnion defined above could only have its x member initialized:
<syntaxhighlight lang=C>
union MyUnion value = { 3 };
</syntaxhighlight>
Using a designated initializer, the member to be initialized does not have to be the first member:
<syntaxhighlight lang=C>
union MyUnion value = { .y = 3.1415 };
</syntaxhighlight>
Compound designators can be used to provide explicit initialization when unadorned initializer lists might be misunderstood. In the example below, <code>w</code> is declared as an array of structures, each structure consisting of a member <code>a</code> (an array of 3 <code>int</code>) and a member <code>b</code> (an <code>int</code>). The initializer sets the size of <code>w</code> to 2 and sets the values of the first element of each <code>a</code>:
<syntaxhighlight lang=C>
struct { int a[3], b; } w[] = { [0].a = {1}, [1].a[0] = 2 };
</syntaxhighlight>
This is equivalent to:
<syntaxhighlight lang=C>
struct { int a[3], b; } w[] = {
    { { 1, 0, 0 }, 0 },
    { { 2, 0, 0 }, 0 }
};
</syntaxhighlight>
Compound literals
It is possible to borrow the initialization methodology to generate compound structure and array literals:
<syntaxhighlight lang=C>
// pointer created from array literal.
int* ptr = (int[]){ 10, 20, 30, 40 };
// pointer to array.
float (*foo)[3] = &(float[]){ 0.5f, 1.f, -0.5f };
struct MyStruct pi = (struct MyStruct){ 3, 3.1415, "Pi" };
</syntaxhighlight>
Compound literals are often combined with designated initializers to make the declaration more readable:
The following is an example of a <code>switch</code> over an <code>int</code>:
<syntaxhighlight lang="c">
#include <stdio.h>
// ...
int num = 2;
switch (num) {
    case 1:
        printf("Number is 1\n");
        break;
    case 2:
        printf("Number is 2\n");
        break;
    case 3:
        printf("Number is 3\n");
        break;
    default:
        printf("Number is not 1, 2, or 3\n");
}
</syntaxhighlight>
As of C2Y, it is possible to use a "<code>case</code> range" between two integer constants, written with an ellipsis (<code>...</code>). There must be a space between each value and the ellipsis. The range is inclusive. For example:
<syntaxhighlight lang="c">
#include <stdio.h>
// ...
int num = 2;
// new-style
switch (num) {
    case 1 ... 3:
        printf("Number is 1, 2, or 3\n");
        break;
    default:
        printf("Number is not 1, 2, or 3\n");
}
// old-style, using case fallthrough
switch (num) {
    case 1:
    case 2:
    case 3:
        printf("Number is 1, 2, or 3\n");
        break;
    default:
        printf("Number is not 1, 2, or 3\n");
}
</syntaxhighlight>
Iteration statement
There are three forms of iteration statement:
<!--DO NOT USE syntaxhighlight lang=C since is not valid C and does not support italics/bold-->
while (expression) {
statement
}
do {
statement
} while (expression);
for (init; test; next) {
statement
}
For the <code>while</code> and <code>do</code> statements, the sub-statement is executed repeatedly so long as the value of the expression is non-zero. For <code>while</code>, the test, including any side effects, occurs before each iteration. For <code>do</code>, the test occurs after each iteration. Thus, a <code>do</code> statement always executes its sub-statement at least once, whereas <code>while</code> might not execute the sub-statement at all.
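The difference in when the test occurs can be sketched with two otherwise identical loops. The function names <code>while_iterations</code> and <code>do_iterations</code> are hypothetical and exist only for this illustration:

```c
// Counts how many times the body of a while loop runs when the
// test is "n > 0"; the test is made before each iteration.
int while_iterations(int n) {
    int count = 0;
    while (n > 0) {
        ++count;
        --n;
    }
    return count;
}

// The same loop written with do; the test is made after each
// iteration, so the body runs once even when n starts at 0 or below.
int do_iterations(int n) {
    int count = 0;
    do {
        ++count;
        --n;
    } while (n > 0);
    return count;
}
```

For a positive starting value both functions report the same count; they differ only when the test is already false on entry.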
The logic of <code>for</code> can be described in terms of <code>while</code> in that this:
<syntaxhighlight lang=C>
for (e1; e2; e3) {
    s;
}
</syntaxhighlight>
is equivalent to:
<syntaxhighlight lang=C>
e1;
while (e2) {
    s;
cont:
    e3;
}
</syntaxhighlight>
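except for the behavior of a <code>continue</code> statement (which in the <code>for</code> loop jumps to <code>e3</code> instead of <code>e2</code>). If <code>e2</code> is blank, it would have to be replaced with a <code>1</code>.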
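The following sketch illustrates the equivalence and the special role of <code>continue</code>. The function names <code>sum_odd_for</code> and <code>sum_odd_while</code> are hypothetical; the second function uses <code>goto</code> to a label placed before the increment to reproduce what <code>continue</code> does in the first:

```c
// Sums the odd numbers below limit using a for loop. The continue
// statement skips the addition but still executes ++i.
int sum_odd_for(int limit) {
    int sum = 0;
    for (int i = 0; i < limit; ++i) {
        if (i % 2 == 0) {
            continue;
        }
        sum += i;
    }
    return sum;
}

// The same loop rewritten with while. A plain continue here would
// skip ++i and never terminate, so a goto to the label placed before
// the increment stands in for it.
int sum_odd_while(int limit) {
    int sum = 0;
    int i = 0;
    while (i < limit) {
        if (i % 2 == 0) {
            goto cont;
        }
        sum += i;
    cont:
        ++i;
    }
    return sum;
}
```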
Any of the three expressions in the <code>for</code> loop may be omitted. A missing second expression makes the test always non-zero, creating an infinite loop.
Since C99, the first expression may take the form of a declaration, whose scope is limited to the <code>for</code> statement. For example:
<syntaxhighlight lang=C>
for (int i = 0; i < limit; ++i) {
    // ...
}
</syntaxhighlight>
Unlike Java and C++, C has no foreach loop. However, one can be emulated using macros.
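One possible emulation is sketched below. The macro name <code>FOREACH</code> and the helper functions are hypothetical, and the macro assumes that its third argument is a true array (not a pointer), because it relies on <code>sizeof</code> to find the element count:

```c
// Hypothetical macro: steps a pointer over every element of a true
// array, using sizeof to compute the number of elements.
#define FOREACH(type, item, array) \
    for (type* item = (array); \
         item < (array) + sizeof(array) / sizeof((array)[0]); \
         ++item)

// Sums a fixed table of values using the macro.
int sum_table(void) {
    int table[] = { 1, 2, 3, 4 };
    int sum = 0;
    FOREACH(int, p, table) {
        sum += *p;
    }
    return sum;
}

// Doubles every element in place, showing that the item is a pointer
// to the element rather than a copy, then returns the last element.
int doubled_last(void) {
    int table[] = { 5, 6, 7 };
    FOREACH(int, p, table) {
        *p *= 2;
    }
    return table[2];
}
```

Because the macro expands to an ordinary <code>for</code> statement, <code>break</code> and <code>continue</code> behave as usual inside its body.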
Jump statement
There are four jump statements, which transfer control unconditionally: <code>goto</code>, <code>continue</code>, <code>break</code>, and <code>return</code>.
The <code>goto</code> statement passes program control to a labeled statement. It has the following syntax:
<!--DO NOT USE syntaxhighlight lang=C since is not valid C and does not support italics/bold-->
goto label-name;
A <code>continue</code> statement, which is simply the word <code>continue</code>, transfers control to the loop-continuation point of the innermost enclosing iteration statement. It must be enclosed within an iteration statement. For example:
<syntaxhighlight lang=C>
while (true) {
    // ...
    continue;
}
do {
    // ...
    continue;
} while (true);
for (;;) {
    // ...
    continue;
}
</syntaxhighlight>
The <code>break</code> statement, which is simply the word <code>break</code>, ends a <code>for</code>, <code>while</code>, <code>do</code>, or <code>switch</code> statement. Control passes to the statement following the enclosing control statement.
The <code>return</code> statement transfers control to the function caller. When <code>return</code> is followed by an expression, the value is returned to the caller. Encountering the end of the function is equivalent to a <code>return</code> with no expression. In that case, if the function is declared as returning a value and the caller tries to use the returned value, the behavior is undefined.
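As a small sketch of <code>break</code> and <code>return</code>, the two functions below perform the same linear search. The names <code>index_of</code>, <code>index_of_early</code> and the array <code>sample</code> are hypothetical and serve only as illustration:

```c
// Sample data for the illustration.
static const int sample[] = { 4, 8, 15, 8 };

// Returns the index of the first occurrence of key in v, or -1.
// The break statement ends the loop as soon as a match is found;
// control then continues at the return statement after the loop.
int index_of(const int* v, int n, int key) {
    int found = -1;
    for (int i = 0; i < n; ++i) {
        if (v[i] == key) {
            found = i;
            break;
        }
    }
    return found;
}

// The same search using return to leave the function, and with it
// the loop, from inside the loop body.
int index_of_early(const int* v, int n, int key) {
    for (int i = 0; i < n; ++i) {
        if (v[i] == key) {
            return i;
        }
    }
    return -1;
}
```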
Labels
A label marks a point in the code to which control can be transferred. A label is an identifier followed by a colon. For example:
<syntaxhighlight lang=C>
if (i == 1) {
    goto END;
}
// other code
END:
</syntaxhighlight>
The standard does not define a method to retrieve the address of a label, but GCC extends the language with a unary <code>&&</code> operator that returns the address of a label. The address can be stored in a <code>void*</code> variable and may be used later with a <code>goto</code>. This feature can be used to implement a jump table.
For example, the following prints <code>hi</code> repeatedly:
<syntaxhighlight lang=C>
void* ptr = &&J1;
J1: printf("hi");
goto *ptr;
</syntaxhighlight>
Since C2Y, C has labelled loops, similar to Java. A label can be attached to a loop such as <code>for</code>, and <code>break</code> and <code>continue</code> can then name that label to transfer control relative to the labelled statement rather than the innermost one (multi-level breaks).
<syntaxhighlight lang="c">
// using break:
outer:
for (int i = 0; i < n; ++i) {
    switch (i) {
        case 1:
            break; // jumps to 1
        case 2:
            break outer; // jumps to 2
        default:
            continue;
    }
    // 1
}
// 2
// using continue:
outer:
for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
        continue; // jumps to 1
        continue outer; // jumps to 2
        // 1
    }
    // 2
}
</syntaxhighlight>
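In earlier revisions of the standard, which lack labelled <code>break</code>, a multi-level exit is commonly written with <code>goto</code>. The following is a minimal sketch of that older idiom, not of the C2Y feature itself; the function name <code>first_negative</code> and the array <code>sample_grid</code> are hypothetical:

```c
// Sample 3x3 grid stored row by row, used only for illustration.
static const int sample_grid[] = {
     1,  2,  3,
     4, -5,  6,
    -7,  8,  9
};

// Returns the flat index (row * cols + col) of the first negative
// element, or -1 if there is none. The goto leaves both loops at
// once, which a plain break in the inner loop cannot do.
int first_negative(const int* cells, int rows, int cols) {
    int position = -1;
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            if (cells[i * cols + j] < 0) {
                position = i * cols + j;
                goto done;
            }
        }
    }
done:
    return position;
}
```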
Functions
Definition
For a function that returns a value, a definition consists of a return type name, a function name that is unique in the codebase, a list of parameters in parentheses, and a statement block that ends with a <code>return</code> statement. The block can contain a <code>return</code> statement to exit the function before the end of the block. The syntax is like:
<!--DO NOT USE syntaxhighlight lang=C since is not valid C and does not support italics/bold-->
type-name function-name(parameter-list)
{
statement-list
return value;
}
A function that returns no value is declared with <code>void</code> instead of a type name, like:
<!--DO NOT USE syntaxhighlight lang=C since is not valid C and does not support italics/bold-->
void function-name(parameter-list)
{
statement-list
}
The standard does not include lambda functions, but some translators do.
Parameters
A parameter-list is a comma-separated list of formal parameter declarations; each item a type name followed by a variable name:
<!--DO NOT USE syntaxhighlight lang=C since is not valid C-->
type-name variable-name{, type-name variable-name}
The return type cannot be an array or a function. For example:
<syntaxhighlight lang=C>
int f()[3]; // Error: function returning an array
int (*g())[3]; // OK: function returning a pointer to an array
void h()(); // Error: function returning a function
void (*k())(); // OK: function returning a function pointer
</syntaxhighlight>
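The two permitted forms can be sketched as complete definitions. All names here (<code>table</code>, <code>get_table</code>, <code>twice</code>, <code>get_doubler</code>) are hypothetical and chosen only for illustration:

```c
// A table with static storage duration, so it outlives the call.
static int table[3] = { 10, 20, 30 };

// OK: returns a pointer to an array of 3 int.
int (*get_table(void))[3] {
    return &table;
}

// Helper whose address is returned below.
static int twice(int n) {
    return 2 * n;
}

// OK: returns a pointer to a function taking int and returning int.
int (*get_doubler(void))(int) {
    return twice;
}
```

A caller would write <code>(*get_table())[1]</code> to reach an element through the returned array pointer, and <code>get_doubler()(21)</code> to call through the returned function pointer.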
If the function accepts no parameters, the parameter-list may be the keyword <code>void</code> or blank, but these have different implications. Calling a function with arguments when it is declared with <code>void</code> for the parameter-list is invalid. Calling a function with arguments when it is declared with a blank parameter-list is accepted before C23 (which treats a blank list like <code>void</code>), but may result in undefined behavior. Using <code>void</code> is, therefore, best practice.
A function can accept a variable number of arguments by including <code>...</code> at the end of the parameter list. A commonly used function with this declaration is the standard library function <code>printf</code>, which has the prototype:
<syntaxhighlight lang=C>
int printf(const char*, ...);
</syntaxhighlight>
Consuming variable-length arguments can be accomplished via standard library facilities declared in <code>stdarg.h</code>.
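A minimal sketch of such a function follows. The name <code>sum_ints</code> and the convention of passing the argument count first are assumptions made for this illustration; the language itself provides no way for the callee to discover how many arguments were passed:

```c
#include <stdarg.h>

// Sums the count int arguments that follow the count parameter.
// The caller must pass exactly count values of type int.
int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);      // begin reading after 'count'
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);
    }
    va_end(args);               // required before returning
    return total;
}
```

Passing fewer arguments than <code>count</code> claims, or arguments of a different type, results in undefined behavior, which is why functions such as <code>printf</code> rely on a format string to describe what follows.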
Calling
Code can access a function of a library if it is both declared and defined. Often a declaration is provided for a library function via a header file that the consuming code uses via the <code>#include</code> directive. Alternatively, the consuming code can declare the function in its own file. The function definition is associated with the consuming code at link-time. The standard library is generally linked by default whereas other libraries require link-time configuration.
Accessing a user-defined function that is defined in a different file is similar to using a library function. The consuming code declares the function either by including a header file or directly in its file. Linking to the definition in the other file is handled when the object files are linked.
Calling a function that is defined in the same file is relatively simple. The definition or a declaration of it must be above the call.
Argument passing
An argument is passed to a function by value, which means that a called function receives a copy of the argument and cannot alter the argument variable. For a function to alter the value of a variable, the caller passes the variable's address (a pointer), which simulates what other languages provide as pass by reference. The called function can modify the variable by dereferencing the passed address.
In the following code, the address of <code>x</code> is passed by specifying <code>&x</code> in the call. The called function receives the address as <code>y</code> and accesses <code>x</code> as <code>*y</code>.
<syntaxhighlight lang=C line>
void incInt(int* y) {
    (*y)++;
}

int main(void) {
    int x = 7;
    incInt(&x);
    return 0;
}
</syntaxhighlight>
The following code demonstrates a more advanced use of pointers: passing a pointer to a pointer. An <code>int</code> pointer named <code>a</code> is defined on line 9 and its address is passed to the function on line 10. The function receives a pointer to pointer to <code>int</code> named <code>a_p</code>. It assigns <code>a</code> (as <code>*a_p</code>). After the call, on line 11, the memory allocated and assigned to <code>a</code> is freed.
<syntaxhighlight lang=C line>
#include <stdio.h>
#include <stdlib.h>

void allocate_array(int** const a_p, const int count) {
    *a_p = malloc(sizeof(int) * count);
}

int main(void) {
    int* a;
    allocate_array(&a, 42);
    free(a);
    return 0;
}
</syntaxhighlight>
Array passing
Function parameters of array type may at first glance appear to be an exception to the pass-by-value rule as demonstrated by the following program that prints 123, not 1:
<syntaxhighlight lang=C>
#include <stdio.h>

void setArray(int array[], int index) {
    array[index] = 123;
}

int main(void) {
    int a[1] = {1};
    setArray(a, 0);
    printf("a[0]=%d\n", a[0]);
    return 0;
}
</syntaxhighlight>
However, there is a different reason for this behavior. An array parameter is treated as a pointer. The following prototype is equivalent to the function prototype above:
<syntaxhighlight lang=C>
void setArray(int* array, int index);
</syntaxhighlight>
At the same time, rules for the use of arrays in expressions cause the value of <code>a</code> to be treated as a pointer to the first element. Thus, this is still pass-by-value, with the caveat that it is the address of the first element of the array being passed by value, not the contents of the array.
Since C99, the programmer can specify that a function takes an array of a certain size by using the keyword <code>static</code>. In a declaration such as <code>void setArray(int array[static 4], int index)</code>, the first parameter must be a pointer to the first element of an array of length at least 4. It is also possible to add qualifiers (<code>const</code>, <code>volatile</code> and <code>restrict</code>) to the pointer type that the array is converted to.
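A minimal sketch of the minimum-size form follows. The function names <code>sum_first4</code> and <code>sum_example</code> are hypothetical:

```c
// The [static 4] declarator states that v points to the first
// element of an array with at least 4 elements, so reading
// v[0] through v[3] is always valid for a conforming caller.
int sum_first4(const int v[static 4]) {
    return v[0] + v[1] + v[2] + v[3];
}

// Calls sum_first4 with an array of 5 elements, which satisfies
// the "at least 4" requirement; the fifth element is not read.
int sum_example(void) {
    int data[5] = { 1, 2, 3, 4, 5 };
    return sum_first4(data);
}
```

The size is a promise made by the caller rather than something checked at run time; passing a shorter array results in undefined behavior, although some compilers can warn when they detect it.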
Attributes
C23 added attribute specifier sequences, which originated in C++11. Attributes can be applied to any symbol that supports them, including functions and variables, and the compiler treats a symbol marked with an attribute accordingly. They resemble Java annotations in that they provide additional information to the compiler, but they differ in that attributes in C are not metadata meant to be accessed using reflection. Furthermore, one cannot create custom attributes in C, unlike in Java where one may define custom annotations in addition to the standard ones. However, C does have implementation- or vendor-specific attributes, which are non-standard. These typically have a namespace associated with them. For instance, GCC and Clang have attributes under the <code>gnu::</code> namespace, and all such attributes are of the form <code><nowiki>[[gnu::*]]</nowiki></code>, though C does not support namespaces in the language.
The syntax of using an attribute on a function is like so:
<syntaxhighlight lang="c">
[[nodiscard]]
bool satisfiesProperty(const struct MyStruct* s);
</syntaxhighlight>
The standard defines the following attributes:
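{| class="wikitable"
! Name !! Description
|-
| <code><nowiki>[[noreturn]]</nowiki></code>
|| Indicates that the specified function will not return to its caller.
|-
| <code><nowiki>[[deprecated]]</nowiki></code><br><code><nowiki>[[deprecated("reason")]]</nowiki></code>
|| Indicates that the use of the marked symbol is allowed but discouraged/deprecated for the reason specified (if given).
|-
| <code><nowiki>[[fallthrough]]</nowiki></code>
|| Indicates that the fall through from the previous case label is intentional.
|-
| <code><nowiki>[[maybe_unused]]</nowiki></code>
|| Suppresses compiler warnings on an unused entity.
|-
| <code><nowiki>[[nodiscard]]</nowiki></code><br><code><nowiki>[[nodiscard("reason")]]</nowiki></code>
|| Issues a compiler warning if the return value of the marked symbol is discarded or ignored for the reason specified (if given).
|-
| <code><nowiki>[[unsequenced]]</nowiki></code>
|| Indicates that a function is stateless, effectless, idempotent and independent.
|-
| <code><nowiki>[[reproducible]]</nowiki></code>
|| Indicates that a function is effectless and idempotent.
|}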
Dynamic memory
See also
- Blocks (C language extension)
- C data types
- C standard library
- C++ syntax
- C# syntax
- Java syntax
- Rust syntax
- List of C-family programming languages
- Operators in C and C++
Notes
References
;General
- American National Standard for Information Systems - Programming Language - C - ANSI X3.159-1989
External links
- The syntax of C in Backus–Naur form
- Programming in C
- The comp.lang.c Frequently Asked Questions Page
- C reference
