Diffstat (limited to 'contrib/gcc/doc/cpp.texi')
-rw-r--r--contrib/gcc/doc/cpp.texi555
1 files changed, 311 insertions, 244 deletions
diff --git a/contrib/gcc/doc/cpp.texi b/contrib/gcc/doc/cpp.texi
index 070e31c..3697887 100644
--- a/contrib/gcc/doc/cpp.texi
+++ b/contrib/gcc/doc/cpp.texi
@@ -9,7 +9,7 @@
@copying
@c man begin COPYRIGHT
Copyright @copyright{} 1987, 1989, 1991, 1992, 1993, 1994, 1995, 1996,
-1997, 1998, 1999, 2000, 2001, 2002, 2003
+1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004
Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
@@ -104,6 +104,7 @@ useful on its own.
Overview
+* Character sets::
* Initial processing::
* Tokenization::
* The preprocessing language::
@@ -233,11 +234,62 @@ manual refer to GNU CPP.
@c man end
@menu
+* Character sets::
* Initial processing::
* Tokenization::
* The preprocessing language::
@end menu
+@node Character sets
+@section Character sets
+
+Source code character set processing in C and related languages is
+rather complicated. The C standard discusses two character sets, but
+there are really at least four.
+
+The files input to CPP might be in any character set at all. CPP's
+very first action, before it even looks for line boundaries, is to
+convert the file into the character set it uses for internal
+processing. That set is what the C standard calls the @dfn{source}
+character set. It must be isomorphic with ISO 10646, also known as
+Unicode. CPP uses the UTF-8 encoding of Unicode.
+
+At present, GNU CPP does not implement conversion from arbitrary file
+encodings to the source character set. Use of any encoding other than
+plain ASCII or UTF-8, except in comments, will cause errors. Use of
+encodings that are not strict supersets of ASCII, such as Shift JIS,
+may cause errors even if non-ASCII characters appear only in comments.
+We plan to fix this in the near future.
+
+All preprocessing work (the subject of the rest of this manual) is
+carried out in the source character set. If you request textual
+output from the preprocessor with the @option{-E} option, it will be
+in UTF-8.
+
+After preprocessing is complete, string and character constants are
+converted again, into the @dfn{execution} character set. This
+character set is under control of the user; the default is UTF-8,
+matching the source character set. Wide string and character
+constants have their own character set, which is not called out
+specifically in the standard. Again, it is under control of the user.
+The default is UTF-16 or UTF-32, whichever fits in the target's
+@code{wchar_t} type, in the target machine's byte
+order.@footnote{UTF-16 does not meet the requirements of the C
+standard for a wide character set, but the choice of 16-bit
+@code{wchar_t} is enshrined in some system ABIs so we cannot fix
+this.} Octal and hexadecimal escape sequences do not undergo
+conversion; @t{'\x12'} has the value 0x12 regardless of the currently
+selected execution character set. All other escapes are replaced by
+the character in the source character set that they represent, then
+converted to the execution character set, just like unescaped
+characters.
+
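The escape rules above can be checked with a minimal C sketch. It assumes GCC's default UTF-8 execution character set (no @option{-fexec-charset} option) and a C99-or-later mode that accepts universal character names. @code{check_escape_conversion} is a hypothetical name.

```c
#include <assert.h>

/* Sketch assuming the default UTF-8 execution character set.  */
static void
check_escape_conversion (void)
{
  /* Octal and hexadecimal escapes give the value directly; they are
     not converted to the execution character set.  */
  assert ('\x12' == 0x12);
  assert ('\101' == 0101);          /* octal 101 == decimal 65 */
  assert (L'\x12' == 0x12);

  /* A universal character name is converted like an unescaped
     character: in UTF-8, U+00E9 is the two bytes 0xC3 0xA9.  */
  assert (sizeof "\u00e9" == 3);    /* two bytes plus the NUL */
  assert ((unsigned char) "\u00e9"[0] == 0xC3);
  assert ((unsigned char) "\u00e9"[1] == 0xA9);
}
```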
+GCC does not permit the use of characters outside the ASCII range, nor
+@samp{\u} and @samp{\U} escapes, in identifiers. We hope this will
+change eventually, but there are problems with the standard semantics
+of such ``extended identifiers'' which must be resolved through the
+ISO C and C++ committees first.
+
@node Initial processing
@section Initial processing
@@ -251,28 +303,19 @@ standard.
@enumerate
@item
-@cindex character sets
@cindex line endings
The input file is read into memory and broken into lines.
-CPP expects its input to be a text file, that is, an unstructured
-stream of ASCII characters, with some characters indicating the end of a
-line of text. Extended ASCII character sets, such as ISO Latin-1 or
-Unicode encoded in UTF-8, are also acceptable. Character sets that are
-not strict supersets of seven-bit ASCII will not work. We plan to add
-complete support for international character sets in a future release.
-
Different systems use different conventions to indicate the end of a
line. GCC accepts the ASCII control sequences @kbd{LF}, @kbd{@w{CR
-LF}}, @kbd{CR}, and @kbd{@w{LF CR}} as end-of-line markers. The first
-three are the canonical sequences used by Unix, DOS and VMS, and the
-classic Mac OS (before OSX) respectively. You may therefore safely copy
-source code written on any of those systems to a different one and use
-it without conversion. (GCC may lose track of the current line number
-if a file doesn't consistently use one convention, as sometimes happens
-when it is edited on computers with different conventions that share a
-network file system.) @kbd{@w{LF CR}} is included because it has been
-reported as an end-of-line marker under exotic conditions.
+LF}} and @kbd{CR} as end-of-line markers. These are the canonical
+sequences used by Unix, DOS and VMS, and the classic Mac OS (before
+OSX) respectively. You may therefore safely copy source code written
+on any of those systems to a different one and use it without
+conversion. (GCC may lose track of the current line number if a file
+doesn't consistently use one convention, as sometimes happens when it
+is edited on computers with different conventions that share a network
+file system.)
If the last line of any input file lacks an end-of-line marker, the end
of the file is considered to implicitly supply one. The C standard says
@@ -293,23 +336,25 @@ obsolete systems that lack some of C's punctuation to use C@. For
example, @samp{??/} stands for @samp{\}, so @t{'??/n'} is a character
constant for a newline.
-Trigraphs are not popular and many compilers implement them incorrectly.
-Portable code should not rely on trigraphs being either converted or
-ignored. If you use the @option{-Wall} or @option{-Wtrigraphs} options,
-GCC will warn you when a trigraph would change the meaning of your
-program if it were converted.
+Trigraphs are not popular and many compilers implement them
+incorrectly. Portable code should not rely on trigraphs being either
+converted or ignored. With @option{-Wtrigraphs}, GCC will warn you
+when a trigraph would change the meaning of your program if it were
+converted. @xref{Wtrigraphs}.
-In a string constant, you can prevent a sequence of question marks from
-being confused with a trigraph by inserting a backslash between the
-question marks. @t{"(??\?)"} is the string @samp{(???)}, not
-@samp{(?]}. Traditional C compilers do not recognize this idiom.
+In a string constant, you can prevent a sequence of question marks
+from being confused with a trigraph by inserting a backslash between
+the question marks, or by separating the string literal at the
+trigraph and making use of string literal concatenation. @t{"(??\?)"}
+is the string @samp{(???)}, not @samp{(?]}. Traditional C compilers
+do not recognize these idioms.
The nine trigraphs and their replacements are
-@example
+@smallexample
Trigraph: ??( ??) ??< ??> ??= ??/ ??' ??! ??-
Replacement: [ ] @{ @} # \ ^ | ~
-@end example
+@end smallexample
@item
@cindex continued lines
@@ -340,23 +385,23 @@ There are two kinds of comments. @dfn{Block comments} begin with
@samp{/*} and continue until the next @samp{*/}. Block comments do not
nest:
-@example
+@smallexample
/* @r{this is} /* @r{one comment} */ @r{text outside comment}
-@end example
+@end smallexample
@dfn{Line comments} begin with @samp{//} and continue to the end of the
current line. Line comments do not nest either, but it does not matter,
because they would end in the same place anyway.
-@example
+@smallexample
// @r{this is} // @r{one comment}
@r{text outside comment}
-@end example
+@end smallexample
@end enumerate
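A short C sketch can check the two trigraph-avoidance idioms described in the list above. It behaves the same whether or not trigraph conversion is enabled (for example with @option{-trigraphs}). The expected string is spelled with character constants so that it contains no trigraph itself. @code{check_trigraph_idioms} is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

static void
check_trigraph_idioms (void)
{
  /* Three question marks inside parentheses, built without a trigraph.  */
  static const char want[] = { '(', '?', '?', '?', ')', '\0' };

  /* The "\?" escape is an ordinary way to write a question mark.  */
  assert (strcmp ("(??\?)", want) == 0);

  /* Adjacent string literals are joined after trigraph replacement
     has already run, so splitting the literal prevents recognition.  */
  assert (strcmp ("(??" "?)", want) == 0);
}
```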
It is safe to put line comments inside block comments, or vice versa.
-@example
+@smallexample
@group
/* @r{block comment}
// @r{contains line comment}
@@ -365,19 +410,19 @@ It is safe to put line comments inside block comments, or vice versa.
// @r{line comment} /* @r{contains block comment} */
@end group
-@end example
+@end smallexample
But beware of commenting out one end of a block comment with a line
comment.
-@example
+@smallexample
@group
// @r{l.c.} /* @r{block comment begins}
@r{oops! this isn't a comment anymore} */
@end group
-@end example
+@end smallexample
-Comments are not recognized within string literals.
+Comments are not recognized within string literals.
@t{@w{"/* blah */"}} is the string constant @samp{@w{/* blah */}}, not
an empty string.
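A minimal C sketch of these comment rules follows. It assumes a C99 or later mode, because @samp{//} comments are not part of C90. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

static int
comments_are_whitespace (void)
{
  /* block comment
     // contains line comment */
  int x = 1 /* a comment acts like a space */ + 1;
  // line comment /* contains a block-comment opener
  return x;
}

static void
check_comments (void)
{
  assert (comments_are_whitespace () == 2);
  /* Inside a string literal the comment delimiters are plain text.  */
  assert (strlen ("/* blah */") == 10);
}
```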
@@ -392,7 +437,7 @@ next line with backslash-newline. You can even split @samp{/*},
@samp{*/}, and @samp{//} onto multiple lines with backslash-newline.
For example:
-@example
+@smallexample
@group
/\
*
@@ -402,7 +447,7 @@ ne FO\
O 10\
20
@end group
-@end example
+@end smallexample
@noindent
is equivalent to @code{@w{#define FOO 1020}}. All these tricks are
@@ -437,7 +482,7 @@ Once the input file is broken into tokens, the token boundaries never
change, except when the @samp{##} preprocessing operator is used to paste
tokens together. @xref{Concatenation}. For example,
-@example
+@smallexample
@group
#define foo() bar
foo()baz
@@ -445,7 +490,7 @@ foo()baz
@emph{not}
@expansion{} barbaz
@end group
-@end example
+@end smallexample
The compiler does not re-tokenize the preprocessor's output. Each
preprocessing token becomes one compiler token.
@@ -545,10 +590,10 @@ punctuation in obsolete systems. It has no negative side effects,
unlike trigraphs, but does not cover as much ground. The digraphs and
their corresponding normal punctuators are:
-@example
+@smallexample
Digraph: <% %> <: :> %: %:%:
Punctuator: @{ @} [ ] # ##
-@end example
+@end smallexample
@cindex other tokens
Any other single character is considered ``other.'' It is passed on to
@@ -568,10 +613,10 @@ silently ignored, just as any other character would be. In running
text, NUL is considered white space. For example, these two directives
have the same meaning.
-@example
+@smallexample
#define X^@@1
#define X 1
-@end example
+@end smallexample
@noindent
(where @samp{^@@} is ASCII NUL)@. Within string or character constants,
@@ -746,15 +791,15 @@ file, followed by the output that comes from the text after the
@samp{#include} directive. For example, if you have a header file
@file{header.h} as follows,
-@example
+@smallexample
char *test (void);
-@end example
+@end smallexample
@noindent
and a main program called @file{program.c} that uses the header file,
like this,
-@example
+@smallexample
int x;
#include "header.h"
@@ -763,13 +808,13 @@ main (void)
@{
puts (test ());
@}
-@end example
+@end smallexample
@noindent
the compiler will see the same token stream as it would if
@file{program.c} read
-@example
+@smallexample
int x;
char *test (void);
@@ -778,7 +823,7 @@ main (void)
@{
puts (test ());
@}
-@end example
+@end smallexample
Included files are not limited to declarations and macro definitions;
those are merely the typical uses. Any fragment of a C program can be
@@ -805,12 +850,12 @@ GCC looks in several different places for headers. On a normal Unix
system, if you do not instruct it otherwise, it will look for headers
requested with @code{@w{#include <@var{file}>}} in:
-@example
+@smallexample
/usr/local/include
-/usr/lib/gcc-lib/@var{target}/@var{version}/include
+@var{libdir}/gcc/@var{target}/@var{version}/include
/usr/@var{target}/include
/usr/include
-@end example
+@end smallexample
For C++ programs, it will also look in @file{/usr/include/g++-v3},
first. In the above, @var{target} is the canonical name of the system
@@ -881,7 +926,7 @@ it will certainly waste time.
The standard way to prevent this is to enclose the entire real contents
of the file in a conditional, like this:
-@example
+@smallexample
@group
/* File foo. */
#ifndef FILE_FOO_SEEN
@@ -891,7 +936,7 @@ of the file in a conditional, like this:
#endif /* !FILE_FOO_SEEN */
@end group
-@end example
+@end smallexample
This construct is commonly known as a @dfn{wrapper #ifndef}.
When the header is included again, the conditional will be false,
@@ -926,7 +971,7 @@ files to be included into your program. They might specify
configuration parameters to be used on different sorts of operating
systems, for instance. You could do this with a series of conditionals,
-@example
+@smallexample
#if SYSTEM_1
# include "system_1.h"
#elif SYSTEM_2
@@ -934,18 +979,18 @@ systems, for instance. You could do this with a series of conditionals,
#elif SYSTEM_3
@dots{}
#endif
-@end example
+@end smallexample
That rapidly becomes tedious. Instead, the preprocessor offers the
ability to use a macro for the header name. This is called a
@dfn{computed include}. Instead of writing a header name as the direct
argument of @samp{#include}, you simply put a macro name there instead:
-@example
+@smallexample
#define SYSTEM_H "system_1.h"
@dots{}
#include SYSTEM_H
-@end example
+@end smallexample
@noindent
@code{SYSTEM_H} will be expanded, and the preprocessor will look for
@@ -970,10 +1015,10 @@ string constant are the file to be included. CPP does not re-examine the
string for embedded quotes, but neither does it process backslash
escapes in the string. Therefore
-@example
+@smallexample
#define HEADER "a\"b"
#include HEADER
-@end example
+@end smallexample
@noindent
looks for a file named @file{a\"b}. CPP searches for the file according
@@ -1018,9 +1063,9 @@ header is not protected from multiple inclusion (@pxref{Once-Only
Headers}), it will recurse infinitely and cause a fatal error.
You could include the old header with an absolute pathname:
-@example
+@smallexample
#include "/usr/include/old-header.h"
-@end example
+@end smallexample
@noindent
This works, but is not clean; should the system headers ever move, you
would have to edit the new headers to match.
@@ -1139,29 +1184,29 @@ followed by the name of the macro and then the token sequence it should
be an abbreviation for, which is variously referred to as the macro's
@dfn{body}, @dfn{expansion} or @dfn{replacement list}. For example,
-@example
+@smallexample
#define BUFFER_SIZE 1024
-@end example
+@end smallexample
@noindent
defines a macro named @code{BUFFER_SIZE} as an abbreviation for the
token @code{1024}. If somewhere after this @samp{#define} directive
there comes a C statement of the form
-@example
+@smallexample
foo = (char *) malloc (BUFFER_SIZE);
-@end example
+@end smallexample
@noindent
then the C preprocessor will recognize and @dfn{expand} the macro
@code{BUFFER_SIZE}. The C compiler will see the same tokens as it would
if you had written
-@example
+@smallexample
foo = (char *) malloc (1024);
-@end example
+@end smallexample
-By convention, macro names are written in upper case. Programs are
+By convention, macro names are written in uppercase. Programs are
easier to read when it is possible to tell at a glance which names are
macros.
@@ -1170,13 +1215,13 @@ continue the definition onto multiple lines, if necessary, using
backslash-newline. When the macro is expanded, however, it will all
come out on one line. For example,
-@example
+@smallexample
#define NUMBERS 1, \
2, \
3
int x[] = @{ NUMBERS @};
@expansion{} int x[] = @{ 1, 2, 3 @};
-@end example
+@end smallexample
@noindent
The most common visible consequence of this is surprising line numbers
@@ -1191,25 +1236,25 @@ The C preprocessor scans your program sequentially. Macro definitions
take effect at the place you write them. Therefore, the following input
to the C preprocessor
-@example
+@smallexample
foo = X;
#define X 4
bar = X;
-@end example
+@end smallexample
@noindent
produces
-@example
+@smallexample
foo = X;
bar = 4;
-@end example
+@end smallexample
When the preprocessor expands a macro name, the macro's expansion
replaces the macro invocation, then the expansion is examined for more
macros to expand. For example,
-@example
+@smallexample
@group
#define TABLESIZE BUFSIZE
#define BUFSIZE 1024
@@ -1217,7 +1262,7 @@ TABLESIZE
@expansion{} BUFSIZE
@expansion{} 1024
@end group
-@end example
+@end smallexample
@noindent
@code{TABLESIZE} is expanded first to produce @code{BUFSIZE}, then that
@@ -1235,12 +1280,12 @@ at some point in the source file. @code{TABLESIZE}, defined as shown,
will always expand using the definition of @code{BUFSIZE} that is
currently in effect:
-@example
+@smallexample
#define BUFSIZE 1020
#define TABLESIZE BUFSIZE
#undef BUFSIZE
#define BUFSIZE 37
-@end example
+@end smallexample
@noindent
Now @code{TABLESIZE} expands (in two stages) to @code{37}.
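The two-stage expansion can be confirmed by sizing an array with @code{TABLESIZE}. This is a minimal sketch, and @code{table} and @code{check_tablesize} are hypothetical names.

```c
#include <assert.h>

#define BUFSIZE 1020
#define TABLESIZE BUFSIZE
#undef BUFSIZE
#define BUFSIZE 37

/* TABLESIZE -> BUFSIZE -> 37, using the definition in effect here.  */
static int table[TABLESIZE];

static void
check_tablesize (void)
{
  assert (TABLESIZE == 37);
  assert (sizeof table / sizeof table[0] == 37);
}
```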
@@ -1259,24 +1304,24 @@ are called @dfn{function-like macros}. To define a function-like macro,
you use the same @samp{#define} directive, but you put a pair of
parentheses immediately after the macro name. For example,
-@example
+@smallexample
#define lang_init() c_init()
lang_init()
@expansion{} c_init()
-@end example
+@end smallexample
A function-like macro is only expanded if its name appears with a pair
of parentheses after it. If you write just the name, it is left alone.
This can be useful when you have a function and a macro of the same
name, and you wish to use the function sometimes.
-@example
+@smallexample
extern void foo(void);
#define foo() /* optimized inline version */
@dots{}
foo();
funcptr = foo;
-@end example
+@end smallexample
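The @code{foo} example can be made runnable with a sketch like the following. The counters and @code{check_foo} are hypothetical, and the macro body is only a stand-in for an optimized inline version. Note that the real function has to be defined before the macro, or its own definition would be macro-expanded.

```c
#include <assert.h>

static int real_calls;    /* hypothetical counters for the demo */
static int macro_calls;

/* Defined first: after the #define below, "foo (void)" would expand.  */
static void foo (void) { real_calls++; }

#define foo() ((void) macro_calls++)   /* stand-in "inline version" */

static void
check_foo (void)
{
  void (*funcptr) (void) = foo;  /* no "(" follows: not expanded */

  foo ();       /* name followed by "(": the macro is used */
  funcptr ();   /* calls the real function */
  (foo) ();     /* ")" follows the name, so this calls the function */

  assert (macro_calls == 1);
  assert (real_calls == 2);
}
```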
Here the call to @code{foo()} will use the macro, but the function
pointer will get the address of the real function. If the macro were to
@@ -1287,11 +1332,11 @@ macro definition, that does not define a function-like macro, it defines
an object-like macro whose expansion happens to begin with a pair of
parentheses.
-@example
+@smallexample
#define lang_init () c_init()
lang_init()
@expansion{} () c_init()()
-@end example
+@end smallexample
The first two pairs of parentheses in this expansion come from the
macro. The third is the pair that was originally after the macro
@@ -1323,12 +1368,12 @@ macro body.)
As an example, here is a macro that computes the minimum of two numeric
values, as it is defined in many C programs, and some uses.
-@example
+@smallexample
#define min(X, Y) ((X) < (Y) ? (X) : (Y))
x = min(a, b); @expansion{} x = ((a) < (b) ? (a) : (b));
y = min(1, 2); @expansion{} y = ((1) < (2) ? (1) : (2));
z = min(a + 28, *p); @expansion{} z = ((a + 28) < (*p) ? (a + 28) : (*p));
-@end example
+@end smallexample
@noindent
(In this small example you can already see several of the dangers of
@@ -1341,9 +1386,9 @@ such parentheses does not end the argument. However, there is no
requirement for square brackets or braces to balance, and they do not
prevent a comma from separating arguments. Thus,
-@example
+@smallexample
macro (array[x = y, x + 1])
-@end example
+@end smallexample
@noindent
passes two arguments to @code{macro}: @code{array[x = y} and @code{x +
@@ -1361,20 +1406,20 @@ Prescan}, for detailed discussion.
For example, @code{min (min (a, b), c)} is first expanded to
-@example
+@smallexample
min (((a) < (b) ? (a) : (b)), (c))
-@end example
+@end smallexample
@noindent
and then to
-@example
+@smallexample
@group
((((a) < (b) ? (a) : (b))) < (c)
? (((a) < (b) ? (a) : (b)))
: (c))
@end group
-@end example
+@end smallexample
@noindent
(Line breaks shown here for clarity would not actually be generated.)
@@ -1386,7 +1431,7 @@ You cannot leave out arguments entirely; if a macro takes two arguments,
there must be exactly one comma at the top level of its argument list.
Here are some silly examples using @code{min}:
-@example
+@smallexample
min(, b) @expansion{} (( ) < (b) ? ( ) : (b))
min(a, ) @expansion{} ((a ) < ( ) ? (a ) : ( ))
min(,) @expansion{} (( ) < ( ) ? ( ) : ( ))
@@ -1394,7 +1439,7 @@ min((,),) @expansion{} (((,)) < ( ) ? ((,)) : ( ))
min() @error{} macro "min" requires 2 arguments, but only 1 given
min(,,) @error{} macro "min" passed 3 arguments, but takes just 2
-@end example
+@end smallexample
Whitespace is not a preprocessing token, so if a macro @code{foo} takes
one argument, @code{@w{foo ()}} and @code{@w{foo ( )}} both supply it an
@@ -1406,10 +1451,10 @@ empty argument was required.
Macro parameters appearing inside string literals are not replaced by
their corresponding actual arguments.
-@example
+@smallexample
#define foo(x) x, "x"
foo(bar) @expansion{} bar, "x"
-@end example
+@end smallexample
@node Stringification
@section Stringification
@@ -1433,7 +1478,7 @@ long string.
Here is an example of a macro definition that uses stringification:
-@example
+@smallexample
@group
#define WARN_IF(EXP) \
do @{ if (EXP) \
@@ -1443,7 +1488,7 @@ WARN_IF (x == 0);
@expansion{} do @{ if (x == 0)
fprintf (stderr, "Warning: " "x == 0" "\n"); @} while (0);
@end group
-@end example
+@end smallexample
@noindent
The argument for @code{EXP} is substituted once, as-is, into the
@@ -1476,7 +1521,7 @@ There is no way to convert a macro argument into a character constant.
If you want to stringify the result of expansion of a macro argument,
you have to use two levels of macros.
-@example
+@smallexample
#define xstr(s) str(s)
#define str(s) #s
#define foo 4
@@ -1486,7 +1531,7 @@ xstr (foo)
@expansion{} xstr (4)
@expansion{} str (4)
@expansion{} "4"
-@end example
+@end smallexample
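The difference between @code{str} and @code{xstr} can be checked directly. This is a minimal sketch, and @code{check_stringify} is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

#define xstr(s) str(s)
#define str(s) #s
#define foo 4

static void
check_stringify (void)
{
  /* # sees the argument before it is macro-expanded.  */
  assert (strcmp (str (foo), "foo") == 0);
  /* xstr's argument is expanded first, then passed to str.  */
  assert (strcmp (xstr (foo), "4") == 0);
}
```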
@code{s} is stringified when it is used in @code{str}, so it is not
macro-expanded first. But @code{s} is an ordinary argument to
@@ -1543,7 +1588,7 @@ Consider a C program that interprets named commands. There probably
needs to be a table of commands, perhaps an array of structures declared
as follows:
-@example
+@smallexample
@group
struct command
@{
@@ -1560,7 +1605,7 @@ struct command commands[] =
@dots{}
@};
@end group
-@end example
+@end smallexample
It would be cleaner not to have to give each command name twice, once in
the string constant and once in the function name. A macro which takes the
@@ -1568,7 +1613,7 @@ name of a command as an argument can make this unnecessary. The string
constant can be created with stringification, and the function name by
concatenating the argument with @samp{_command}. Here is how it is done:
-@example
+@smallexample
#define COMMAND(NAME) @{ #NAME, NAME ## _command @}
struct command commands[] =
@@ -1577,7 +1622,7 @@ struct command commands[] =
COMMAND (help),
@dots{}
@};
-@end example
+@end smallexample
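A self-contained C sketch of the @code{COMMAND} table follows, with a lookup that dispatches by name. The two command functions, @code{last_command}, and @code{run_command} are hypothetical, added only so the table can be exercised.

```c
#include <assert.h>
#include <string.h>

struct command
{
  const char *name;
  void (*function) (void);
};

static int last_command;   /* hypothetical: records which command ran */

static void quit_command (void) { last_command = 1; }
static void help_command (void) { last_command = 2; }

/* #NAME makes the string; NAME ## _command makes the function name.  */
#define COMMAND(NAME)  { #NAME, NAME ## _command }

static struct command commands[] =
{
  COMMAND (quit),
  COMMAND (help),
};

/* Run the command called NAME; return 1 if found, else 0.  */
static int
run_command (const char *name)
{
  size_t i;

  assert (name != NULL);
  for (i = 0; i < sizeof commands / sizeof commands[0]; i++)
    if (strcmp (commands[i].name, name) == 0)
      {
        commands[i].function ();
        return 1;
      }
  return 0;
}
```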
@node Variadic Macros
@section Variadic Macros
@@ -1589,9 +1634,9 @@ A macro can be declared to accept a variable number of arguments much as
a function can. The syntax for defining the macro is similar to that of
a function. Here is an example:
-@example
+@smallexample
#define eprintf(@dots{}) fprintf (stderr, __VA_ARGS__)
-@end example
+@end smallexample
This kind of macro is called @dfn{variadic}. When the macro is invoked,
all the tokens in its argument list after the last named argument (this
@@ -1600,10 +1645,10 @@ argument}. This sequence of tokens replaces the identifier
@code{@w{__VA_ARGS__}} in the macro body wherever it appears. Thus, we
have this expansion:
-@example
+@smallexample
eprintf ("%s:%d: ", input_file, lineno)
@expansion{} fprintf (stderr, "%s:%d: ", input_file, lineno)
-@end example
+@end smallexample
The variable argument is completely macro-expanded before it is inserted
into the macro expansion, just like an ordinary argument. You may use
@@ -1617,9 +1662,9 @@ this, as an extension. You may write an argument name immediately
before the @samp{@dots{}}; that name is used for the variable argument.
The @code{eprintf} macro above could be written
-@example
+@smallexample
#define eprintf(args@dots{}) fprintf (stderr, args)
-@end example
+@end smallexample
@noindent
using this extension. You cannot use @code{@w{__VA_ARGS__}} and this
@@ -1628,9 +1673,9 @@ extension in the same macro.
You can have named arguments as well as variable arguments in a variadic
macro. We could define @code{eprintf} like this, instead:
-@example
+@smallexample
#define eprintf(format, @dots{}) fprintf (stderr, format, __VA_ARGS__)
-@end example
+@end smallexample
@noindent
This formulation looks more descriptive, but unfortunately it is less
@@ -1640,26 +1685,26 @@ argument from the variable arguments. Furthermore, if you leave the
variable argument empty, you will get a syntax error, because
there will be an extra comma after the format string.
-@example
+@smallexample
eprintf("success!\n", );
@expansion{} fprintf(stderr, "success!\n", );
-@end example
+@end smallexample
GNU CPP has a pair of extensions which deal with this problem. First,
you are allowed to leave the variable argument out entirely:
-@example
+@smallexample
eprintf ("success!\n")
@expansion{} fprintf(stderr, "success!\n", );
-@end example
+@end smallexample
@noindent
Second, the @samp{##} token paste operator has a special meaning when
placed between a comma and a variable argument. If you write
-@example
+@smallexample
#define eprintf(format, @dots{}) fprintf (stderr, format, ##__VA_ARGS__)
-@end example
+@end smallexample
@noindent
and the variable argument is left out when the @code{eprintf} macro is
@@ -1667,10 +1712,10 @@ used, then the comma before the @samp{##} will be deleted. This does
@emph{not} happen if you pass an empty argument, nor does it happen if
the token preceding @samp{##} is anything other than a comma.
-@example
+@smallexample
eprintf ("success!\n")
@expansion{} fprintf(stderr, "success!\n");
-@end example
+@end smallexample
@noindent
The above explanation is ambiguous about the case where the only macro
@@ -1703,9 +1748,9 @@ previous versions of GCC, the token preceding the special @samp{##} must
be a comma, and there must be white space between that comma and
whatever comes immediately before it:
-@example
+@smallexample
#define eprintf(format, args@dots{}) fprintf (stderr, format , ##args)
-@end example
+@end smallexample
@noindent
@xref{Differences from previous versions}, for the gory details.
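The comma-deleting @samp{##} extension can be tried with a sketch that formats into an array instead of writing to @code{stderr}, so the result can be inspected. It assumes GCC, since @samp{, ##__VA_ARGS__} is a GNU extension. @code{bprintf} and @code{check_bprintf} are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical eprintf-style macro; BUF must be an array.  With the
   GNU ", ##__VA_ARGS__" form, the comma is deleted when the variable
   argument is left out.  */
#define bprintf(buf, format, ...) \
  snprintf ((buf), sizeof (buf), format, ##__VA_ARGS__)

static void
check_bprintf (void)
{
  char out[64];

  bprintf (out, "success!\n");               /* variable argument omitted */
  assert (strcmp (out, "success!\n") == 0);

  bprintf (out, "%s:%d: ", "input.c", 42);   /* two variable arguments */
  assert (strcmp (out, "input.c:42: ") == 0);
}
```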
@@ -1758,12 +1803,12 @@ message to report an inconsistency detected by the program; the message
can state the source line at which the inconsistency was detected. For
example,
-@example
+@smallexample
fprintf (stderr, "Internal error: "
"negative string length "
"%d at %s, line %d.",
length, __FILE__, __LINE__);
-@end example
+@end smallexample
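The following sketch wraps the message above in a macro and checks that @code{__FILE__} and @code{__LINE__} expand at the point where the macro is invoked. @code{FORMAT_LENGTH_ERROR} and @code{check_file_and_line} are hypothetical names. The @code{__LINE__} capture and the macro call are kept on one line deliberately, so that both see the same line number.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper; BUF must be an array so that sizeof works.  */
#define FORMAT_LENGTH_ERROR(buf, length)                  \
  snprintf ((buf), sizeof (buf),                          \
            "Internal error: "                            \
            "negative string length "                     \
            "%d at %s, line %d.",                         \
            (length), __FILE__, __LINE__)

static void
check_file_and_line (void)
{
  char msg[1024], expected[1024];

  /* Same physical line, so both see the same __LINE__.  */
  int line = __LINE__; FORMAT_LENGTH_ERROR (msg, -3);

  snprintf (expected, sizeof expected,
            "Internal error: negative string length %d at %s, line %d.",
            -3, __FILE__, line);
  assert (strcmp (msg, expected) == 0);
}
```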
An @samp{#include} directive changes the expansions of @code{__FILE__}
and @code{__LINE__} to correspond to the included file. At the end of
@@ -1849,7 +1894,8 @@ or a C++ compiler. This macro is similar to @code{__STDC_VERSION__}, in
that it expands to a version number. A fully conforming implementation
of the 1998 C++ standard will define this macro to @code{199711L}. The
GNU C++ compiler is not yet fully conforming, so it uses @code{1}
-instead. We hope to complete our implementation in the near future.
+instead. It is hoped to complete the implementation of standard C++
+in the near future.
@item __OBJC__
This macro is defined, with value 1, when the Objective-C compiler is in
@@ -1897,26 +1943,26 @@ minor version and patch level are reset. If you wish to use the
predefined macros directly in the conditional, you will need to write it
like this:
-@example
+@smallexample
/* @r{Test for GCC > 3.2.0} */
#if __GNUC__ > 3 || \
(__GNUC__ == 3 && (__GNUC_MINOR__ > 2 || \
(__GNUC_MINOR__ == 2 && \
__GNUC_PATCHLEVEL__ > 0))
-@end example
+@end smallexample
@noindent
Another approach is to use the predefined macros to
calculate a single number, then compare that against a threshold:
-@example
+@smallexample
#define GCC_VERSION (__GNUC__ * 10000 \
+ __GNUC_MINOR__ * 100 \
+ __GNUC_PATCHLEVEL__)
@dots{}
/* @r{Test for GCC > 3.2.0} */
#if GCC_VERSION > 30200
-@end example
+@end smallexample
@noindent
Many people find this form easier to understand.
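A sketch that checks the two version tests against each other follows. It assumes the compiler defines @code{__GNUC__} (GCC does, and some other compilers imitate it). The fallback definition of 0 is an assumption of this sketch. The helper names are hypothetical.

```c
#include <assert.h>

#ifdef __GNUC__
# define GCC_VERSION (__GNUC__ * 10000 \
                      + __GNUC_MINOR__ * 100 \
                      + __GNUC_PATCHLEVEL__)
#else
# define GCC_VERSION 0  /* assumption: not a GNU-compatible compiler */
#endif

/* Nonzero if the compiler reports a version newer than GCC 3.2.0.  */
static int
newer_than_gcc_3_2_0 (void)
{
#if GCC_VERSION > 30200
  return 1;
#else
  return 0;
#endif
}

static void
check_gcc_version (void)
{
#ifdef __GNUC__
  /* The single-number test agrees with the nested comparison.  */
  assert ((GCC_VERSION > 30200)
          == (__GNUC__ > 3
              || (__GNUC__ == 3 && (__GNUC_MINOR__ > 2
                                    || (__GNUC_MINOR__ == 2
                                        && __GNUC_PATCHLEVEL__ > 0)))));
#endif
}
```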
@@ -2022,7 +2068,7 @@ this macro directly; instead, include the appropriate headers.
@itemx __INT_MAX__
@itemx __LONG_MAX__
@itemx __LONG_LONG_MAX__
-Defined to the maximum value of the @code{signed char}, @code{wchar_t},
+Defined to the maximum value of the @code{signed char}, @code{wchar_t},
@code{signed short},
@code{signed int}, @code{signed long}, and @code{signed long long} types
respectively. They exist to make the standard header given numerical limits
@@ -2041,7 +2087,7 @@ runtime is used, this macro is not defined, so that you can use this
macro to determine which runtime (NeXT or GNU) is being used.
@item __LP64__
-@item _LP64
+@itemx _LP64
These macros are defined, with value 1, if (and only if) the compilation
is for a target where @code{long int} and pointers both use 64 bits and
@code{int} uses 32 bits.
@@ -2131,12 +2177,12 @@ macro is function-like. It is an error if anything appears on the line
after the macro name. @samp{#undef} has no effect if the name is not a
macro.
-@example
+@smallexample
#define FOO 4
x = FOO; @expansion{} x = 4;
#undef FOO
x = FOO; @expansion{} x = FOO;
-@end example
+@end smallexample
Once a macro has been undefined, that identifier may be @dfn{redefined}
as a macro by a subsequent @samp{#define} directive. The new definition
@@ -2156,19 +2202,19 @@ count as whitespace.
@noindent
These definitions are effectively the same:
-@example
+@smallexample
#define FOUR (2 + 2)
#define FOUR (2 + 2)
#define FOUR (2 /* two */ + 2)
-@end example
+@end smallexample
@noindent
but these are not:
-@example
+@smallexample
#define FOUR (2 + 2)
#define FOUR ( 2+2 )
#define FOUR (2 * 2)
#define FOUR(score,and,seven,years,ago) (2 + 2)
-@end example
+@end smallexample
If a macro is redefined with a definition that is not effectively the
same as the old one, the preprocessor issues a warning and changes the
@@ -2249,25 +2295,25 @@ the input file, for more macro calls. It is possible to piece together
a macro call coming partially from the macro body and partially from the
arguments. For example,
-@example
+@smallexample
#define twice(x) (2*(x))
#define call_with_1(x) x(1)
call_with_1 (twice)
@expansion{} twice(1)
@expansion{} (2*(1))
-@end example
+@end smallexample
Macro definitions do not have to have balanced parentheses. By writing
an unbalanced open parenthesis in a macro body, it is possible to create
a macro call that begins inside the macro body but ends outside of it.
For example,
-@example
+@smallexample
#define strange(file) fprintf (file, "%s %d",
@dots{}
strange(stderr) p, 35)
@expansion{} fprintf (stderr, "%s %d", p, 35)
-@end example
+@end smallexample
The ability to piece together a macro call can be useful, but the use of
unbalanced open parentheses in a macro body is just confusing, and
@@ -2285,41 +2331,41 @@ way.
Suppose you define a macro as follows,
-@example
+@smallexample
#define ceil_div(x, y) (x + y - 1) / y
-@end example
+@end smallexample
@noindent
whose purpose is to divide, rounding up. (One use for this operation is
to compute how many @code{int} objects are needed to hold a certain
number of @code{char} objects.) Then suppose it is used as follows:
-@example
+@smallexample
a = ceil_div (b & c, sizeof (int));
@expansion{} a = (b & c + sizeof (int) - 1) / sizeof (int);
-@end example
+@end smallexample
@noindent
This does not do what is intended. The operator-precedence rules of
C make it equivalent to this:
-@example
+@smallexample
a = (b & (c + sizeof (int) - 1)) / sizeof (int);
-@end example
+@end smallexample
@noindent
What we want is this:
-@example
+@smallexample
a = ((b & c) + sizeof (int) - 1) / sizeof (int);
-@end example
+@end smallexample
@noindent
Defining the macro as
-@example
+@smallexample
#define ceil_div(x, y) ((x) + (y) - 1) / (y)
-@end example
+@end smallexample
@noindent
provides the desired result.
@@ -2329,9 +2375,9 @@ ceil_div(1, 2)}. That has the appearance of a C expression that would
compute the size of the type of @code{ceil_div (1, 2)}, but in fact it
means something very different. Here is what it expands to:
-@example
+@smallexample
sizeof ((1) + (2) - 1) / (2)
-@end example
+@end smallexample
@noindent
This would take the size of an integer and divide it by two. The
@@ -2341,9 +2387,9 @@ was intended to be inside.
Parentheses around the entire macro definition prevent such problems.
Here, then, is the recommended way to define @code{ceil_div}:
-@example
+@smallexample
#define ceil_div(x, y) (((x) + (y) - 1) / (y))
-@end example
+@end smallexample
@node Swallowing the Semicolon
@subsection Swallowing the Semicolon
@@ -2354,13 +2400,13 @@ statement. Consider, for example, the following macro, that advances a
pointer (the argument @code{p} says where to find it) across whitespace
characters:
-@example
+@smallexample
#define SKIP_SPACES(p, limit) \
@{ char *lim = (limit); \
while (p < lim) @{ \
if (*p++ != ' ') @{ \
p--; break; @}@}@}
-@end example
+@end smallexample
@noindent
Here backslash-newline is used to split the macro definition, which must
@@ -2377,11 +2423,11 @@ like a function call, writing a semicolon afterward, as in
This can cause trouble before @code{else} statements, because the
semicolon is actually a null statement. Suppose you write
-@example
+@smallexample
if (*p != 0)
SKIP_SPACES (p, lim);
else @dots{}
-@end example
+@end smallexample
@noindent
The presence of two statements---the compound statement and a null
@@ -2391,20 +2437,20 @@ makes invalid C code.
The definition of the macro @code{SKIP_SPACES} can be altered to solve
this problem, using a @code{do @dots{} while} statement. Here is how:
-@example
+@smallexample
#define SKIP_SPACES(p, limit) \
do @{ char *lim = (limit); \
while (p < lim) @{ \
if (*p++ != ' ') @{ \
p--; break; @}@}@} \
while (0)
-@end example
+@end smallexample
Now @code{SKIP_SPACES (p, lim);} expands into
-@example
+@smallexample
do @{@dots{}@} while (0);
-@end example
+@end smallexample
@noindent
which is one statement. The loop executes exactly once; most compilers
@@ -2417,23 +2463,23 @@ generate no extra code for it.
@cindex unsafe macros
Many C programs define a macro @code{min}, for ``minimum'', like this:
-@example
+@smallexample
#define min(X, Y) ((X) < (Y) ? (X) : (Y))
-@end example
+@end smallexample
When you use this macro with an argument containing a side effect,
as shown here,
-@example
+@smallexample
next = min (x + y, foo (z));
-@end example
+@end smallexample
@noindent
it expands as follows:
-@example
+@smallexample
next = ((x + y) < (foo (z)) ? (x + y) : (foo (z)));
-@end example
+@end smallexample
@noindent
where @code{x + y} has been substituted for @code{X} and @code{foo (z)}
@@ -2451,12 +2497,12 @@ computes the value of @code{foo (z)} only once. The C language offers
no standard way to do this, but it can be done with GNU extensions as
follows:
-@example
+@smallexample
#define min(X, Y) \
(@{ typeof (X) x_ = (X); \
typeof (Y) y_ = (Y); \
(x_ < y_) ? x_ : y_; @})
-@end example
+@end smallexample
The @samp{(@{ @dots{} @})} notation produces a compound statement that
acts as an expression. Its value is the value of its last statement.
@@ -2470,7 +2516,7 @@ careful when @emph{using} the macro @code{min}. For example, you can
calculate the value of @code{foo (z)}, save it in a variable, and use
that variable in @code{min}:
-@example
+@smallexample
@group
#define min(X, Y) ((X) < (Y) ? (X) : (Y))
@dots{}
@@ -2479,7 +2525,7 @@ that variable in @code{min}:
next = min (x + y, tem);
@}
@end group
-@end example
+@end smallexample
@noindent
(where we assume that @code{foo} returns type @code{int}).
@@ -2493,11 +2539,11 @@ definition. Recall that all macro definitions are rescanned for more
macros to replace. If the self-reference were considered a use of the
macro, it would produce an infinitely large expansion. To prevent this,
the self-reference is not considered a macro call. It is passed into
-the preprocessor output unchanged. Let's consider an example:
+the preprocessor output unchanged. Consider an example:
-@example
+@smallexample
#define foo (4 + foo)
-@end example
+@end smallexample
@noindent
where @code{foo} is also a variable in your program.
@@ -2520,9 +2566,9 @@ of the variable @code{foo}, whereas in fact the value is four greater.
One common, useful use of self-reference is to create a macro which
expands to itself. If you write
-@example
+@smallexample
#define EPERM EPERM
-@end example
+@end smallexample
@noindent
then the macro @code{EPERM} expands to @code{EPERM}. Effectively, it is
@@ -2536,15 +2582,15 @@ If a macro @code{x} expands to use a macro @code{y}, and the expansion of
self-reference} of @code{x}. @code{x} is not expanded in this case
either. Thus, if we have
-@example
+@smallexample
#define x (4 + y)
#define y (2 * x)
-@end example
+@end smallexample
@noindent
then @code{x} and @code{y} expand as follows:
-@example
+@smallexample
@group
x @expansion{} (4 + y)
@expansion{} (4 + (2 * x))
@@ -2552,7 +2598,7 @@ x @expansion{} (4 + y)
y @expansion{} (2 * x)
@expansion{} (2 * (4 + y))
@end group
-@end example
+@end smallexample
@noindent
Each macro is expanded when it appears in the definition of the other
@@ -2613,12 +2659,12 @@ concatenate its expansion, you can do that by causing one macro to call
another macro that does the stringification or concatenation. For
instance, if you have
-@example
+@smallexample
#define AFTERX(x) X_ ## x
#define XAFTERX(x) AFTERX(x)
#define TABLESIZE 1024
#define BUFSIZE TABLESIZE
-@end example
+@end smallexample
then @code{AFTERX(BUFSIZE)} expands to @code{X_BUFSIZE}, and
@code{XAFTERX(BUFSIZE)} expands to @code{X_1024}. (Not to
@@ -2630,11 +2676,11 @@ Macros used in arguments, whose expansions contain unshielded commas.
This can cause a macro expanded on the second scan to be called with the
wrong number of arguments. Here is an example:
-@example
+@smallexample
#define foo a,b
#define bar(x) lose(x)
#define lose(x) (1 + (x))
-@end example
+@end smallexample
We would like @code{bar(foo)} to turn into @code{(1 + (foo))}, which
would then turn into @code{(1 + (a,b))}. Instead, @code{bar(foo)}
@@ -2643,11 +2689,11 @@ requires a single argument. In this case, the problem is easily solved
by the same parentheses that ought to be used to prevent misnesting of
arithmetic operations:
-@example
+@smallexample
#define foo (a,b)
@exdent or
#define bar(x) lose((x))
-@end example
+@end smallexample
The extra pair of parentheses prevents the comma in @code{foo}'s
definition from being interpreted as an argument separator.
@@ -2666,13 +2712,13 @@ different to the line containing the argument causing the problem.
Here is an example illustrating this:
-@example
+@smallexample
#define ignore_second_arg(a,b,c) a; c
ignore_second_arg (foo (),
ignored (),
syntax error);
-@end example
+@end smallexample
@noindent
The syntax error triggered by the tokens @code{syntax error} results in
@@ -2773,7 +2819,7 @@ directive}: @samp{#if}, @samp{#ifdef} or @samp{#ifndef}.
The simplest sort of conditional is
-@example
+@smallexample
@group
#ifdef @var{MACRO}
@@ -2781,7 +2827,7 @@ The simplest sort of conditional is
#endif /* @var{MACRO} */
@end group
-@end example
+@end smallexample
@cindex conditional group
This block is called a @dfn{conditional group}. @var{controlled text}
@@ -2854,7 +2900,7 @@ automated by a tool such as @command{autoconf}, or done by hand.
The @samp{#if} directive allows you to test the value of an arithmetic
expression, rather than the mere existence of one macro. Its syntax is
-@example
+@smallexample
@group
#if @var{expression}
@@ -2862,7 +2908,7 @@ expression, rather than the mere existence of one macro. Its syntax is
#endif /* @var{expression} */
@end group
-@end example
+@end smallexample
@var{expression} is a C expression of integer type, subject to stringent
restrictions. It may contain
@@ -2915,9 +2961,6 @@ expression, and may give different results in some cases. If the value
comes out to be nonzero, the @samp{#if} succeeds and the @var{controlled
text} is included; otherwise it is skipped.
-If @var{expression} is not correctly formed, GCC issues an error and
-treats the conditional as having failed.
-
@node Defined
@subsection Defined
@@ -2932,9 +2975,9 @@ defined MACRO}} is precisely equivalent to @code{@w{#ifdef MACRO}}.
@code{defined} is useful when you wish to test more than one macro for
existence at once. For example,
-@example
+@smallexample
#if defined (__vax__) || defined (__ns16000__)
-@end example
+@end smallexample
@noindent
would succeed if either of the names @code{__vax__} or
@@ -2942,9 +2985,9 @@ would succeed if either of the names @code{__vax__} or
Conditionals written like this:
-@example
+@smallexample
#if defined BUFSIZE && BUFSIZE >= 1024
-@end example
+@end smallexample
@noindent
can generally be simplified to just @code{@w{#if BUFSIZE >= 1024}},
@@ -2965,7 +3008,7 @@ The @samp{#else} directive can be added to a conditional to provide
alternative text to be used if the condition fails. This is what it
looks like:
-@example
+@smallexample
@group
#if @var{expression}
@var{text-if-true}
@@ -2973,7 +3016,7 @@ looks like:
@var{text-if-false}
#endif /* Not @var{expression} */
@end group
-@end example
+@end smallexample
@noindent
If @var{expression} is nonzero, the @var{text-if-true} is included and
@@ -2989,7 +3032,7 @@ You can use @samp{#else} with @samp{#ifdef} and @samp{#ifndef}, too.
One common case of nested conditionals is used to check for more than two
possible alternatives. For example, you might have
-@example
+@smallexample
#if X == 1
@dots{}
#else /* X != 1 */
@@ -2999,12 +3042,12 @@ possible alternatives. For example, you might have
@dots{}
#endif /* X != 2 */
#endif /* X != 1 */
-@end example
+@end smallexample
Another conditional directive, @samp{#elif}, allows this to be
abbreviated as follows:
-@example
+@smallexample
#if X == 1
@dots{}
#elif X == 2
@@ -3012,7 +3055,7 @@ abbreviated as follows:
#else /* X != 2 and X != 1*/
@dots{}
#endif /* X != 2 and X != 1*/
-@end example
+@end smallexample
@samp{#elif} stands for ``else if''. Like @samp{#else}, it goes in the
middle of a conditional group and subdivides it; it does not require a
@@ -3072,23 +3115,23 @@ combination of parameters which you know the program does not properly
support. For example, if you know that the program will not run
properly on a VAX, you might write
-@example
+@smallexample
@group
#ifdef __vax__
#error "Won't work on VAXen. See comments at get_last_object."
#endif
@end group
-@end example
+@end smallexample
If you have several configuration parameters that must be set up by
the installation in a consistent way, you can use conditionals to detect
an inconsistency and report it with @samp{#error}. For example,
-@example
+@smallexample
#if !defined(UNALIGNED_INT_ASM_OP) && defined(DWARF2_DEBUGGING_INFO)
#error "DWARF2_DEBUGGING_INFO requires UNALIGNED_INT_ASM_OP."
#endif
-@end example
+@end smallexample
@findex #warning
The directive @samp{#warning} is like @samp{#error}, but causes the
@@ -3222,18 +3265,18 @@ literal. It is destringized, by replacing all @samp{\\} with a single
processed as if it had appeared as the right hand side of a
@samp{#pragma} directive. For example,
-@example
+@smallexample
_Pragma ("GCC dependency \"parse.y\"")
-@end example
+@end smallexample
@noindent
has the same effect as @code{#pragma GCC dependency "parse.y"}. The
same effect could be achieved using macros, for example
-@example
+@smallexample
#define DO_PRAGMA(x) _Pragma (#x)
DO_PRAGMA (GCC dependency "parse.y")
-@end example
+@end smallexample
The standard is unclear on where a @code{_Pragma} operator can appear.
The preprocessor does not accept it within a preprocessing conditional
@@ -3255,10 +3298,10 @@ other file is searched for using the normal include search path.
Optional trailing text can be used to give more information in the
warning message.
-@example
+@smallexample
#pragma GCC dependency "parse.y"
#pragma GCC dependency "/usr/include/time.h" rerun fixincludes
-@end example
+@end smallexample
@item #pragma GCC poison
Sometimes, there is an identifier that you want to remove completely
@@ -3268,10 +3311,10 @@ enforce this, you can @dfn{poison} the identifier with this pragma.
poison. If any of those identifiers appears anywhere in the source
after the directive, it is a hard error. For example,
-@example
+@smallexample
#pragma GCC poison printf sprintf fprintf
sprintf(some_string, "hello");
-@end example
+@end smallexample
@noindent
will produce an error.
@@ -3283,11 +3326,11 @@ about system headers defining macros that use it.
For example,
-@example
+@smallexample
#define strrchr rindex
#pragma GCC poison rindex
strrchr(some_string, 'h');
-@end example
+@end smallexample
@noindent
will not produce an error.
@@ -3356,9 +3399,9 @@ necessary to prevent an accidental token paste.
Source file name and line number information is conveyed by lines
of the form
-@example
+@smallexample
# @var{linenum} @var{filename} @var{flags}
-@end example
+@end smallexample
@noindent
These are called @dfn{linemarkers}. They are inserted as needed into
@@ -3706,8 +3749,30 @@ and stick to it.
@item The mapping of physical source file multi-byte characters to the
execution character set.
-Currently, GNU cpp only supports character sets that are strict supersets
-of ASCII, and performs no translation of characters.
+Currently, CPP requires its input to be ASCII or UTF-8. The execution
+character set may be controlled by the user, with the
+@code{-ftarget-charset} and @code{-ftarget-wide-charset} options.
+
+@item Identifier characters.
+@anchor{Identifier characters}
+
+The C and C++ standards allow identifiers to be composed of @samp{_}
+and the alphanumeric characters. C++ and C99 also allow universal
+character names (not implemented in GCC), and C99 further permits
+implementation-defined characters.
+
+GCC allows the @samp{$} character in identifiers as an extension for
+most targets. This is true regardless of the @option{-std=} switch,
+since this extension cannot conflict with standards-conforming
+programs. When preprocessing assembler, however, dollars are not
+identifier characters by default.
+
+Currently the targets that by default do not permit @samp{$} are AVR,
+IP2K, MMIX, MIPS Irix 3, ARM aout, and PowerPC targets for the AIX and
+BeOS operating systems.
+
+You can override the default with @option{-fdollars-in-identifiers} or
+@option{-fno-dollars-in-identifiers}. @xref{fdollars-in-identifiers}.
@item Non-empty sequences of whitespace characters.
@@ -3765,10 +3830,10 @@ pragmas.
CPP has a small number of internal limits. This section lists the
limits which the C standard requires to be no lower than some minimum,
-and all the others we are aware of. We intend there to be as few limits
+and all other known limits. The intent is to have as few limits
as possible. If you encounter an undocumented or inconvenient limit,
-please report that to us as a bug. (See the section on reporting bugs in
-the GCC manual.)
+please report that as a bug. @xref{Bugs, , Reporting Bugs, gcc, Using
+the GNU Compiler Collection (GCC)}.
Where we say something is limited @dfn{only by available memory}, that
means that internal data structures impose no intrinsic limit, and space
@@ -3857,9 +3922,9 @@ all.
@cindex predicates
An assertion looks like this:
-@example
+@smallexample
#@var{predicate} (@var{answer})
-@end example
+@end smallexample
@noindent
@var{predicate} must be a single identifier. @var{answer} can be any
@@ -3875,26 +3940,26 @@ To test an assertion, you write it in an @samp{#if}. For example, this
conditional succeeds if either @code{vax} or @code{ns16000} has been
asserted as an answer for @code{machine}.
-@example
+@smallexample
#if #machine (vax) || #machine (ns16000)
-@end example
+@end smallexample
@noindent
You can test whether @emph{any} answer is asserted for a predicate by
omitting the answer in the conditional:
-@example
+@smallexample
#if #machine
-@end example
+@end smallexample
@findex #assert
Assertions are made with the @samp{#assert} directive. Its sole
argument is the assertion to make, without the leading @samp{#} that
identifies assertions in conditionals.
-@example
+@smallexample
#assert @var{predicate} (@var{answer})
-@end example
+@end smallexample
@noindent
You may make several assertions with the same predicate and different
@@ -3910,9 +3975,9 @@ answer which was specified on the @samp{#unassert} line; other answers
for that predicate remain true. You can cancel an entire predicate by
leaving out the answer:
-@example
+@smallexample
#unassert @var{predicate}
-@end example
+@end smallexample
@noindent
In either form, if no such assertion has been made, @samp{#unassert} has
@@ -4066,7 +4131,9 @@ without notice.
cpp [@option{-D}@var{macro}[=@var{defn}]@dots{}] [@option{-U}@var{macro}]
[@option{-I}@var{dir}@dots{}] [@option{-W}@var{warn}@dots{}]
[@option{-M}|@option{-MM}] [@option{-MG}] [@option{-MF} @var{filename}]
- [@option{-MP}] [@option{-MQ} @var{target}@dots{}] [@option{-MT} @var{target}@dots{}]
+ [@option{-MP}] [@option{-MQ} @var{target}@dots{}]
+ [@option{-MT} @var{target}@dots{}]
+ [@option{-P}] [@option{-fno-working-directory}]
[@option{-x} @var{language}] [@option{-std=}@var{standard}]
@var{infile} @var{outfile}
@@ -4119,7 +4186,7 @@ Note that you can also specify places to search using options such as
@option{-M} (@pxref{Invocation}). These take precedence over
environment variables, which in turn take precedence over the
configuration of GCC@.
-
+
@include cppenv.texi
@c man end
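The `ceil_div` hunks in this diff show three definitions of the macro. The following small C translation unit checks the effect of each level of parenthesization. It is a minimal sketch and is not part of the GCC manual. The names `ceil_div_unsafe`, `ceil_div_no_outer`, `masked_unsafe`, `masked_safe`, `sizeof_no_outer` and `sizeof_safe` are hypothetical and exist only so the manual's definitions can sit side by side. The sketch assumes a C compiler such as GCC. With `-Wall`, GCC may warn about the unparenthesized `&`, which is exactly the bug being shown.

```c
#include <stddef.h>

/* First definition from the manual: nothing is parenthesized. */
#define ceil_div_unsafe(x, y) (x + y - 1) / y

/* Arguments parenthesized, but not the expansion as a whole. */
#define ceil_div_no_outer(x, y) ((x) + (y) - 1) / (y)

/* Recommended definition: arguments and whole expansion parenthesized. */
#define ceil_div(x, y) (((x) + (y) - 1) / (y))

/* Expands to (b & c + y - 1) / y, which C parses as
   (b & (c + y - 1)) / y because + binds tighter than &. */
unsigned masked_unsafe (unsigned b, unsigned c, unsigned y)
{
  return ceil_div_unsafe (b & c, y);
}

/* Expands to (((b & c) + (y) - 1) / (y)), so b & c is computed first. */
unsigned masked_safe (unsigned b, unsigned c, unsigned y)
{
  return ceil_div (b & c, y);
}

/* Expands to sizeof ((1) + (2) - 1) / (2): sizeof applies only to the
   first parenthesized group, giving sizeof (int) / 2. */
size_t sizeof_no_outer (void)
{
  return sizeof ceil_div_no_outer (1, 2);
}

/* Expands to sizeof (((1) + (2) - 1) / (2)), i.e. sizeof (int). */
size_t sizeof_safe (void)
{
  return sizeof ceil_div (1, 2);
}
```

For example, with b = 12, c = 3 and y = 4, `masked_unsafe` computes `(12 & 6) / 4`, which is 1. `masked_safe` computes `(0 + 4 - 1) / 4`, which is 0. The correct result is 0, because `12 & 3` is 0.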
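This sketch shows the double evaluation described for `min` by counting calls to the argument. It assumes GCC or Clang, because the second macro uses the GNU statement-expression extension. It spells `typeof` as `__typeof__` so that it also builds in strict ISO modes. The names `foo`, `foo_calls`, `safe_min`, `min_calls` and `safe_min_calls` are hypothetical. `safe_min` is the manual's GNU-extension `min` renamed so that both definitions fit in one file.

```c
/* Plain macro from the manual: an argument may be evaluated twice. */
#define min(X, Y) ((X) < (Y) ? (X) : (Y))

/* The manual's GNU C version (renamed here): each argument is evaluated
   exactly once into a temporary. ({ ... }) is a GNU extension. */
#define safe_min(X, Y)                 \
  ({ __typeof__ (X) x_ = (X);          \
     __typeof__ (Y) y_ = (Y);          \
     (x_ < y_) ? x_ : y_; })

/* Hypothetical stand-in for foo () that records how often it runs. */
static int foo_calls;

static int foo (int z)
{
  foo_calls++;
  return z;
}

/* Stores min (x + y, foo (z)) in *result and returns the call count. */
int min_calls (int x, int y, int z, int *result)
{
  foo_calls = 0;
  *result = min (x + y, foo (z));
  return foo_calls;
}

/* Same computation through safe_min; foo runs only once. */
int safe_min_calls (int x, int y, int z, int *result)
{
  foo_calls = 0;
  *result = safe_min (x + y, foo (z));
  return foo_calls;
}
```

With x = 5, y = 5 and z = 3, `foo (z)` is the smaller operand. In that case `min_calls` reports two calls, one in the comparison and one to produce the result, while `safe_min_calls` reports one. When `x + y` is smaller, as with x = 1 and y = 1, the plain `min` also calls `foo` only once. This is why the bug can go unnoticed.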