@c Copyright (C) 1999 Free Software Foundation, Inc.
@c This is part of the G77 manual.
@c For copying conditions, see the file g77.texi.

@node Front End
@chapter Front End
@cindex GNU Fortran Front End (FFE)
@cindex FFE
@cindex @code{g77}, front end
@cindex front end, @code{g77}

This chapter describes some aspects of the design and implementation
of the @code{g77} front end.
Much of the information below applies not to current
releases of @code{g77},
but to the 0.6 rewrite being designed and implemented
as of late May, 1999.

To find out about things that are ``To Be Determined'' or ``To Be Done'',
search for the string TBD.
If you want to help by working on one or more of these items,
email me at @email{@value{email-burley}}.
If you're planning to do more than just research issues and offer comments,
see @uref{http://www.gnu.org/software/contribute.html} for steps you might
need to take first.

@menu
* Overview of Sources::
* Overview of Translation Process::
* Philosophy of Code Generation::
* Two-pass Design::
* Challenges Posed::
* Transforming Statements::
* Transforming Expressions::
* Internal Naming Conventions::
@end menu

@node Overview of Sources
@section Overview of Sources

The current directory layout includes the following:

@table @file
@item @value{srcdir}/gcc/
Non-g77 files in gcc

@item @value{srcdir}/gcc/f/
GNU Fortran front end sources

@item @value{srcdir}/libf2c/
@code{libg2c} configuration and @code{g2c.h} file generation

@item @value{srcdir}/libf2c/libF77/
General support and math portion of @code{libg2c}

@item @value{srcdir}/libf2c/libI77/
I/O portion of @code{libg2c}

@item @value{srcdir}/libf2c/libU77/
Additional interfaces to Unix @code{libc} for @code{libg2c}
@end table

Components of note in @code{g77} are described below.

@file{f/} as a whole contains the source for @code{g77},
while @file{libf2c/} contains a portion of the separate program
@code{f2c}.
Note that the @code{libf2c} code is not part of the program @code{g77},
just distributed with it.

@file{f/} contains text files that document the Fortran compiler, source
files for the GNU Fortran Front End (FFE), and some other stuff.
The @code{g77} compiler code is placed in @file{f/} because that directory,
along with its contents,
is designed to be a subdirectory of a @code{gcc} source directory,
@file{gcc/},
which is structured so that language-specific front ends can be ``dropped
in'' as subdirectories.
The C++ front end (@code{g++}) is an example of this---it resides in
the @file{cp/} subdirectory.
Note that the C front end (also referred to as @code{gcc})
is an exception to this, as its source files reside
in the @file{gcc/} directory itself.

@file{libf2c/} contains the run-time libraries for the @code{f2c} program,
also used by @code{g77}.
These libraries are normally referred to collectively as @code{libf2c}.
When built as part of @code{g77},
@code{libf2c} is installed under the name @code{libg2c} to avoid
conflict with any existing version of @code{libf2c},
and thus is often referred to as @code{libg2c} when the
@code{g77} version is specifically being referred to.

The @code{netlib} version of @code{libf2c/}
contains two distinct libraries,
@code{libF77} and @code{libI77},
each in its own subdirectory.
In @code{g77}, this distinction is not made,
beyond maintaining the subdirectory structure in the source-code tree.

@file{libf2c/} is not part of the program @code{g77},
just distributed with it.
It contains files not present
in the official (@code{netlib}) version of @code{libf2c},
and also contains some minor changes relative to that version,
to fix some bugs,
and to facilitate automatic configuration, building, and installation of
@code{libf2c} (as @code{libg2c}) for use by @code{g77} users.
See @file{libf2c/README} for more information,
including licensing conditions
governing distribution of programs containing code from @code{libg2c}.

@code{libg2c}, @code{g77}'s version of @code{libf2c},
adds Dave Love's implementation of @code{libU77},
in the @file{libf2c/libU77/} directory.
This library is distributed under the
GNU Library General Public License (LGPL)---see the
file @file{libf2c/libU77/COPYING.LIB}
for more information,
as this license
governs distribution conditions for programs containing code
from this portion of the library.

Files of note in @file{f/} and @file{libf2c/} are described below:

@table @file
@item f/BUGS
Lists some important bugs known to be in @code{g77}.
Or use Info (or GNU Emacs Info mode) to read
the ``Actual Bugs'' node of the @code{g77} documentation:

@smallexample
info -f f/g77.info -n "Actual Bugs"
@end smallexample

@item f/ChangeLog
Lists recent changes to @code{g77} internals.

@item libf2c/ChangeLog
Lists recent changes to @code{libg2c} internals.

@item f/NEWS
Contains the per-release changes.
These include the user-visible
changes described in the node ``Changes''
in the @code{g77} documentation, plus internal
changes of import.
Or use:

@smallexample
info -f f/g77.info -n News
@end smallexample

@item f/g77.info*
The @code{g77} documentation, in Info format,
produced by building @code{g77}.

All users of @code{g77} (not just installers) should read this,
using the @code{more} command if neither the @code{info} command
nor GNU Emacs (with its Info mode) is available, or if users
aren't yet accustomed to using these tools.
All of these files are readable as ``plain text'' files,
though they're easier to navigate using Info readers
such as @code{info} and GNU Emacs Info mode.
@end table

If you want to explore the FFE code, which lives entirely in @file{f/},
here are a few clues.
The file @file{g77spec.c} contains the @code{g77}-specific source code
for the @code{g77} command only---this just forms a variant of the
@code{gcc} command, so,
just as the @code{gcc} command itself does not contain the C front end,
the @code{g77} command does not contain the Fortran front end (FFE).
The FFE code ends up in an executable named @file{f771},
which does the actual compiling,
so it contains the FFE plus the @code{gcc} back end (GBE),
the latter to do most of the optimization, and the code generation.

The file @file{parse.c} is the source file for @code{yyparse()},
which is invoked by the GBE to start the compilation process,
for @file{f771}.

The file @file{top.c} contains the top-level FFE function @code{ffe_file}
and, along with @file{top.h}, defines all @samp{ffe_[a-z].*}, @samp{ffe[A-Z].*},
and @samp{FFE_[A-Za-z].*} symbols.

The file @file{fini.c} is a @code{main()} program that is used when building
the FFE to generate C header and source files for recognizing keywords.
The files @file{malloc.c} and @file{malloc.h} comprise a memory manager
that defines all @samp{malloc_[a-z].*}, @samp{malloc[A-Z].*}, and
@samp{MALLOC_[A-Za-z].*} symbols.

All other modules named @var{xyz}
consist of all files named @samp{@var{xyz}*.@var{ext}}
and define all @samp{ffe@var{xyz}_[a-z].*}, @samp{ffe@var{xyz}[A-Z].*},
and @samp{FFE@var{XYZ}_[A-Za-z].*} symbols.
If you understand all this, congratulations---it's easier for me to remember
how it works than to type in these regular expressions.
But it does make it easy to find where a symbol is defined.
For example, the symbol @samp{ffexyz_set_something} would be defined
in @file{xyz.h} and implemented there (if it's a macro) or in @file{xyz.c}.

The ``porting'' files of note currently are:

@table @file
@item proj.c
@itemx proj.h
These define the ``language'' used by all the other source files,
the language being Standard C plus some useful things
like @code{ARRAY_SIZE} and such.

@item target.c
@itemx target.h
These describe the target machine
in terms of what data types are supported,
how they are denoted
(to what C type does an @code{INTEGER*8} map, for example),
how to convert between them,
and so on.
Over time, versions of @code{g77} rely less on this file
and more on run-time configuration based on GBE info
in @file{com.c}.

@item com.c
@itemx com.h
These are the primary interface to the GBE.

@item ste.c
@itemx ste.h
These contain code for implementing recognized executable statements
in the GBE.

@item src.c
@itemx src.h
These contain information on the format(s) of source files
(such as whether they are never to be processed as case-insensitive
with regard to Fortran keywords).
@end table

If you want to debug the @file{f771} executable,
for example if it crashes,
note that the global variables @code{lineno} and @code{input_filename}
are usually set to reflect the current line being read by the lexer
during the first-pass analysis of a program unit and to reflect
the current line being processed during the second-pass compilation
of a program unit.

If an invocation of the function @code{ffestd_exec_end} is on the stack,
the compiler is in the second pass; otherwise, it is in the first.

(This information might help you reduce a test case and/or work around
a bug in @code{g77} until a fix is available.)

@node Overview of Translation Process
@section Overview of Translation Process

The order of phases translating source code to the form accepted
by the GBE is:

@enumerate
@item
Stripping punched-card sources (@file{g77stripcard.c})

@item
Lexing (@file{lex.c})

@item
Stand-alone statement identification (@file{sta.c})

@item
Parsing (@file{stb.c} and @file{expr.c})

@item
Constructing (@file{stc.c})

@item
Collecting (@file{std.c})

@item
Expanding (@file{ste.c})
@end enumerate

To get a rough idea of how a particularly twisted Fortran statement
gets treated by the passes, consider:

@smallexample
      FORMAT(I2 4H)=(J/
     & I3)
@end smallexample

The job of @file{lex.c} is to know enough about Fortran syntax rules
to break the statement up into distinct lexemes without requiring
any feedback from subsequent phases:

@smallexample
`FORMAT'
`('
`I24H'
`)'
`='
`('
`J'
`/'
`I3'
`)'
@end smallexample

The job of @file{sta.c} is to figure out the kind of statement,
or, at least, statement form, that sequence of lexemes represents.
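
The first two phases can be sketched in Python as they apply to this example. This is not how @file{lex.c} or @file{sta.c} are written, and the function names are hypothetical. The sketch makes several simplifying assumptions:

@itemize @bullet
@item
Continuation lines have already been joined (columns 7 onward of each line).

@item
Names are not distinguished from numbers or constants.

@item
@code{arrays} stands in for whatever context says which names are arrays.
@end itemize

The lexer gathers letters and digits across the insignificant blanks and makes every other character a one-character lexeme. The classifier applies the rule worked through in the following paragraphs. An @samp{=} outside all parentheses means assignment form. A leading name followed by @samp{(} then means a statement-function definition, unless that name is an array.

```python
def lex_fixed_form(stmt):
    """Split a joined fixed-form statement into lexemes (simplified)."""
    lexemes = []
    i = 0
    while i < len(stmt):
        ch = stmt[i]
        if ch == " ":
            i += 1  # blanks are insignificant in fixed form
        elif ch.isalnum():
            chars = []
            # Gather letters and digits, skipping any blanks between them.
            while i < len(stmt) and (stmt[i].isalnum() or stmt[i] == " "):
                if stmt[i] != " ":
                    chars.append(stmt[i])
                i += 1
            lexemes.append("".join(chars))
        else:
            lexemes.append(ch)  # anything else is a one-character lexeme
            i += 1
    return lexemes


def classify(lexemes, arrays):
    """Identify the statement form from its lexemes and the known array names."""
    depth = 0
    for lx in lexemes:
        if lx == "(":
            depth += 1
        elif lx == ")":
            depth -= 1
        elif lx == "=" and depth == 0:
            break  # found an '=' at level zero
    else:
        return "not assignment-form"
    if len(lexemes) > 1 and lexemes[1] == "(" and lexemes[0] not in arrays:
        return "statement-function definition"
    return "assignment"


# Columns 7 onward of the two example lines, joined end to end.
tokens = lex_fixed_form("FORMAT(I2 4H)=(J/" + " I3)")
print(tokens)
print(classify(tokens, frozenset()))
print(classify(tokens, frozenset(["FORMAT"])))
```

With no arrays declared, the example defines a statement function named @code{FORMAT} whose dummy argument is @code{I24H}. Had @code{FORMAT} been declared as an array, the same lexemes would instead be an assignment to one of its elements.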

The sooner @file{sta.c} can do this (in terms of using the smallest number of
lexemes, starting with the first for each statement), the better,
because that leaves diagnostics for problems beyond the recognition
of the statement form to subsequent phases,
which can usually better describe the nature of the problem.

In this case, the @samp{=} at ``level zero''
(not nested within parentheses)
tells @file{sta.c} that this is an @emph{assignment-form},
not @code{FORMAT}, statement.

An assignment-form statement might be a statement-function
definition or an executable assignment statement.

To make that determination,
@file{sta.c} looks at the first two lexemes.

Since the second lexeme is @samp{(},
the first must represent an array for this to be an assignment statement,
else it's a statement function.

Either way, @file{sta.c} hands off the statement to @file{stb.c}
(either its statement-function parser or its assignment-statement parser).

@file{stb.c} forms a
statement-specific record containing the pertinent information.
That information includes a source expression and,
for an assignment statement, a destination expression.
Expressions are parsed by @file{expr.c}.

This record is passed to @file{stc.c},
which copes with the implications of the statement
within the context established by previous statements.

For example, if it's the first statement in the file
or after an @code{END} statement,
@file{stc.c} recognizes that, first of all,
a main program unit is now being lexed
(and tells that to @file{std.c}
before telling it about the current statement).

@file{stc.c} attaches whatever information it can,
usually derived from the context established by the preceding statements,
and passes the information to @file{std.c}.

@file{std.c} saves this information away,
since the GBE cannot cope with information
that might be incomplete at this stage.

For example, @samp{I3} might later be determined
to be an argument to an alternate @code{ENTRY} point.

When @file{std.c} is told about the end of an external (top-level)
program unit,
it passes all the information it has saved away
on statements in that program unit
to @file{ste.c}.

@file{ste.c} ``expands'' each statement, in sequence, by
constructing the appropriate GBE information and calling
the appropriate GBE routines.

Details on the transformational phases follow.
Keep in mind that Fortran numbering is used,
so the first character on a line is column 1,
decimal numbering is used, and so on.

@menu
* g77stripcard::
* lex.c::
* sta.c::
* stb.c::
* expr.c::
* stc.c::
* std.c::
* ste.c::

* Gotchas (Transforming)::
* TBD (Transforming)::
@end menu

@node g77stripcard
@subsection g77stripcard

The @code{g77stripcard} program handles removing content beyond
column 72 (adjustable via a command-line option),
optionally warning about that content being something other
than trailing whitespace or Fortran commentary.

This program is needed because @file{lex.c} doesn't pay attention
to maximum line lengths at all, which makes it easier to maintain,
as well as faster (for sources that don't depend on the maximum
column length vis-a-vis trailing non-blank non-commentary content).

Just how this program will be run---whether automatically for
old source (perhaps as the default for @file{.f} files?)---is not
yet determined.

In the meantime, it might as well be implemented as a typical UNIX pipe.

It should accept a @samp{-fline-length-@var{n}} option,
with the default line length set to 72.

When the text it strips off the end of a line is not blank
(that is, when it contains something other than spaces and tabs),
it should insert an additional comment line
(beginning with @samp{!},
so it works for both fixed-form and free-form files)
containing the text,
following the stripped line.
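
A minimal Python sketch of the filter described above follows. All names in it are hypothetical. The marker @code{STRIP_PREFIX} in particular is only a stand-in for the comment prefix, which is still TBD. Like the proposed program, the sketch does no tab expansion, so columns are counted in raw characters. It also leaves out the optional warning.

```python
import io

# Hypothetical stand-in for the TBD prefix marking stripped text.
STRIP_PREFIX = "!STRIPPED "


def parse_line_length(args, default=72):
    """Honor a -fline-length-N option; the last one given wins."""
    length = default
    for arg in args:
        if arg.startswith("-fline-length-"):
            length = int(arg[len("-fline-length-"):])
    return length


def strip_line(line, line_length=72):
    """Return the output lines for one input line (without its newline)."""
    kept, excess = line[:line_length], line[line_length:]
    out = [kept]
    # Only text that is not just spaces and tabs is preserved,
    # as a '!' comment line following the stripped line.
    if excess.strip(" \t"):
        out.append(STRIP_PREFIX + excess)
    return out


def filter_stream(infile, outfile, line_length=72):
    """Copy infile to outfile line by line, stripping each line."""
    for raw in infile:
        for out in strip_line(raw.rstrip("\n"), line_length):
            outfile.write(out + "\n")


src = io.StringIO("      X = 1".ljust(72) + "ABC00010\n" + "      END\n")
dst = io.StringIO()
filter_stream(src, dst, parse_line_length(["-fline-length-72"]))
print(dst.getvalue(), end="")
```

Used as a pipe, the same function would be called as @code{filter_stream(sys.stdin, sys.stdout, parse_line_length(sys.argv[1:]))}. In the example, the sequence number @samp{ABC00010} in columns 73 through 80 is cut off and reappears on a comment line after the statement.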

The inserted comment should have a prefix of some kind,
TBD, that distinguishes the comment as representing stripped text.
Users could use that to @code{sed} out such lines, if they wished---it
seems silly to provide a command-line option to delete information
when it can be so easily filtered out by another program.

(This inserted comment should be designed to ``fit in'' well
with whatever the Fortran community is using these days for
preprocessor, translator, and other such products, like OpenMP.
What that's all about, and how @code{g77} can elegantly fit its
special comment conventions into it all, is TBD as well.
We don't want to reinvent the wheel here, but if there turn out
to be too many conflicting conventions, we might have to invent
one that looks nothing like the others, but which offers their
host products a better infrastructure in which to fit and coexist
peacefully.)

@code{g77stripcard} probably shouldn't do any tab expansion or other
fancy stuff.
People can use @code{expand} or other pre-filtering if they like.
The idea here is to keep each stage quite simple, while providing
excellent performance for ``normal'' code.

(Code with junk beyond column 72 is not really ``normal'',
as it comes from a card-punch heritage,
and will be increasingly hard for tomorrow's Fortran programmers to read.)

@node lex.c
@subsection lex.c

To help make the lexer simple, fast, and easy to maintain,
while also having @code{g77} generally encourage Fortran programmers
to write simple, maintainable, portable code by maximizing the
performance of compiling that kind of code:

@itemize @bullet
@item
There'll be just one lexer, for both fixed-form and free-form source.

@item
It'll care about the form only when handling the first 7 columns of
text, stuff like spaces between strings of alphanumerics, and
how lines are continued.

Some other distinctions will be handled by subsequent phases,
so at least one of them will have to know which form is involved.

For example, @samp{I = 2 . 4} is acceptable in fixed form,
and works in free form as well given the implementation @code{g77}
presently uses.
But the standard requires a diagnostic for it in free form,
so the parser has to be able to recognize that
the lexemes aren't contiguous
(information the lexer @emph{does} have to provide)
and that free-form source is being parsed,
so it can provide the diagnostic.

The @code{g77} lexer doesn't try to gather @samp{2 . 4} into a single lexeme.
Otherwise, it'd have to know a whole lot more about how to parse Fortran,
or subsequent phases (mainly parsing) would have two paths through
lots of critical code---one to handle the lexemes @samp{2}, @samp{.},
and @samp{4} in sequence, another to handle the lexeme @samp{2.4}.

@item
It won't worry about line lengths
(beyond the first 7 columns for fixed-form source).

That is, once it starts parsing the ``statement'' part of a line
(column 7 for fixed-form, column 1 for free-form),
it'll keep going until it finds a newline,
rather than ignoring everything past a particular column
(72 or 132).

The implication here is that there shouldn't @emph{be}
anything past that last column, other than whitespace or
commentary, because users using typical editors
(or viewing output as typically printed)
won't necessarily know just where the last column is.

Code that has ``garbage'' beyond the last column
(almost certainly only fixed-form code with a punched-card legacy,
such as code using columns 73-80 for ``sequence numbers'')
will have to be run through @code{g77stripcard} first.

Also, keeping track of the maximum column position while also watching out
for the end of a line @emph{and} while reading from a file
just makes things slower.
Since a file must be read, and watching for the end of the line
is necessary (unless the typical input file was preprocessed to
include the necessary number of trailing spaces),
dropping the tracking of the maximum column position
is the only way to reduce the complexity of the pertinent code
while maintaining high performance.

@item
ASCII encoding is assumed for the input file.

Code written in other character sets will have to be converted first.

@item
Tabs (ASCII code 9)
will be converted to spaces via the straightforward
approach.

Specifically, a tab is converted to between one and eight spaces
as necessary to reach the next column @var{n}
for which dividing @samp{(@var{n} - 1)} by eight
results in a remainder of zero.

@item
Linefeeds (ASCII code 10)
mark the ends of lines.

@item
A carriage return (ASCII code 13)
is accepted if it immediately precedes a linefeed,
in which case it is ignored.

Otherwise, it is rejected (with a diagnostic).

@item
Any characters other than the above
that are not part of the GNU Fortran Character Set
(@pxref{Character Set})
are rejected with a diagnostic.

This includes backspaces, form feeds, and the like.

(It might make sense to allow a form feed in column 1
as long as that's the only character on a line.
It certainly wouldn't seem to cost much in terms of performance.)

@item
The end of the input stream (EOF)
ends the current line.

@item
The distinction between uppercase and lowercase letters
will be preserved.

It will be up to subsequent phases to decide whether to fold case.

Current plans are to permit any casing for Fortran (reserved) keywords
while preserving casing for user-defined names.
(This might not be made the default for @file{.f} files, though.)

Preserving case seems necessary to provide more direct access
to facilities outside of @code{g77}, such as to C or Pascal code.

Names of intrinsics will probably be matchable in any case.
However, there probably won't be any option to require
a particular mixed-case appearance of intrinsics
(as there was for @code{g77} prior to version 0.6),
because that's painful to maintain,
and probably nobody uses it.

(How @samp{external SiN; r = sin(x)} would be handled is TBD.
I think old @code{g77} might already handle that pretty elegantly,
but whether we can cope with allowing the same fragment to reference
a @emph{different} procedure, even with the same interface,
via @samp{s = SiN(r)}, needs to be determined.
If we can't, we need to make sure that when code introduces
a user-defined name, any intrinsic matching that name
using a case-insensitive comparison
is ``turned off''.)

@item
Backslashes in @code{CHARACTER} and Hollerith constants
are not allowed.

This avoids the confusion introduced by some Fortran compiler vendors
providing C-like interpretation of backslashes,
while others provide straight-through interpretation.

Some kind of lexical construct (TBD) will be provided to allow
flagging of a @code{CHARACTER}
(but probably not a Hollerith)
constant that permits backslashes.
It'll necessarily be a prefix, such as:

@smallexample
PRINT *, C'This line has a backspace \b here.'
PRINT *, F'This line has a straight backslash \ here.'
@end smallexample

Further, command-line options might be provided to specify that
one prefix or the other is to be assumed as the default
for @code{CHARACTER} constants.

However, it seems more helpful for @code{g77} to provide a program
that prefixes all constants
(or just those containing backslashes)
with the desired designation,
so printouts of code can be read
without knowing the compile-time options used when compiling it.

If such a program is provided
(let's name it @code{g77slash} for now),
then a command-line option to @code{g77} should not be provided.
(Though, given that it'll be easy to implement, it might be hard
to resist user requests for it ``to compile faster than if we
have to invoke another filter''.)

This program would take a command-line option to specify the
default interpretation of backslashes,
affecting which prefix it uses for constants.

@code{g77slash} probably should automatically convert Hollerith
constants that contain backslashes
to the appropriate @code{CHARACTER} constants.
Then @code{g77} wouldn't have to define a prefix syntax for Hollerith
constants specifying whether they want C-style or straight-through
backslashes.
@end itemize

The above implements nearly exactly what is specified by
@ref{Character Set},
and
@ref{Lines},
except it also provides automatic conversion of tabs
and ignoring of newline-related carriage returns.

It also effects the ``pure visual'' model,
by which is meant that a user viewing his code
in a typical text editor
(assuming it's not preprocessed via @code{g77stripcard} or similar)
doesn't need any special knowledge
of whether spaces on the screen are really tabs,
whether lines end immediately after the last visible non-space character
or after a number of spaces and tabs that follow it,
or whether the last line in the file is ended by a newline.

Most editors don't make these distinctions,
the ANSI FORTRAN 77 standard doesn't require them to,
and it permits a standard-conforming compiler
to define a method for transforming source code to
``standard form'' however it wants.

So, GNU Fortran defines it such that users have the best chance
of having the code be interpreted the way it looks on the screen
of the typical editor.

(Fancy editors should @emph{never} be required to correctly read code
written in classic two-dimensional-plaintext form.
By correct reading I mean the ability to read it, book-like, without
mistaking text ignored by the compiler for program code and vice versa,
and without having to count beyond the first several columns.
The vague meaning of ASCII TAB, among other things, complicates
this somewhat, but as long as ``everyone'', including the editor,
other tools, and printer, agrees about the every-eighth-column convention,
the GNU Fortran ``pure visual'' model meets these requirements.
Any language or user-visible source form
requiring special tagging of tabs,
the ends of lines after spaces/tabs,
and so on, is broken by this definition.
Fortunately, Fortran @emph{itself} is not broken,
even if most vendor-supplied defaults for their Fortran compilers @emph{are}
in this regard.)

Further, this model provides a clean interface
to whatever preprocessors or code-generators are used
to produce input to this phase of @code{g77}.
Mainly, they need not worry about long lines.

@node sta.c
@subsection sta.c

@node stb.c
@subsection stb.c

@node expr.c
@subsection expr.c

@node stc.c
@subsection stc.c

@node std.c
@subsection std.c

@node ste.c
@subsection ste.c

@node Gotchas (Transforming)
@subsection Gotchas (Transforming)

This section is not about transforming ``gotchas'' into something else.
It is about the weirder aspects of transforming Fortran,
however that's defined,
into a more modern, canonical form.

@subsubsection Multi-character Lexemes

Each lexeme carries with it a pointer to where it appears in the source.

To provide the ability for diagnostics to point to column numbers,
in addition to line numbers and names,
lexemes that represent more than one (significant) character
in the source code need, generally,
to provide pointers to where each @emph{character} appears in the source.
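
The following Python sketch shows one way a lexeme could carry a location for every character. The class @code{Lexeme} and the function @code{lex_line} are hypothetical. The sketch handles a single line and treats the label and continuation fields as ordinary blanks. It computes columns using the tab rule given for @file{lex.c}: a tab advances to the next column @var{n} such that (@var{n} - 1) is a multiple of eight.

```python
class Lexeme:
    """Lexeme text plus the (line, column) of each of its characters."""

    def __init__(self, text, locations):
        self.text = text
        self.locations = locations

    def location_of(self, index):
        return self.locations[index]


def next_tab_column(column):
    """Column reached by a tab found at the given 1-based column."""
    return column + 8 - (column - 1) % 8


def lex_line(text, line_no):
    """Gather alphanumerics across blanks, recording each character's location."""
    lexemes = []
    chars, locs = [], []

    def flush():
        if chars:
            lexemes.append(Lexeme("".join(chars), list(locs)))
            chars.clear()
            locs.clear()

    column = 1
    for ch in text:
        if ch == "\t":
            column = next_tab_column(column)
            continue
        if ch.isalnum():
            chars.append(ch)
            locs.append((line_no, column))
        elif ch != " ":
            flush()
            # Every other character is a one-character lexeme.
            lexemes.append(Lexeme(ch, [(line_no, column)]))
        column += 1
    flush()
    return lexemes


sub = lex_line("      SUBROUTINE X", 1)[0]
print(sub.text, sub.location_of(len(sub.text) - 1))
```

This prints @samp{SUBROUTINEX (1, 18)}. The @samp{X} of the fused lexeme keeps its own column rather than inheriting the column of the leading @samp{S}.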

Per-character pointers make it possible to pinpoint the problem
in code like

@smallexample
SUBROUTINE X
END
BLOCK DATA X
END
@end smallexample

which, in fixed-form source, would result in single lexemes
consisting of the strings @samp{SUBROUTINEX} and @samp{BLOCKDATAX}.
(The problem is that @samp{X} is defined twice,
so a pointer to the @samp{X} in the second definition,
as well as a follow-up pointer to the corresponding @samp{X} in the first,
would be preferable to pointing to the beginnings of the statements.)

This need also arises when parsing (and diagnosing) @code{FORMAT}
statements.

Further, it arises when diagnosing
@code{FMT=} specifiers that contain constants
(or partial constants, or even propagated constants!)
in I/O statements, as in:

@smallexample
PRINT '(I2, 3HAB)', J
@end smallexample

(A pointer to the beginning of the prematurely-terminated Hollerith
constant, and/or to the close parenthesis, is preferable to a pointer
to the open parenthesis or the apostrophe that precedes it.)

Multi-character lexemes, which would seem to naturally include
at least digit strings, alphanumeric strings, @code{CHARACTER}
constants, and Hollerith constants, therefore need to provide
location information on each character.
(Maybe Hollerith constants don't, but it's unnecessary to except them.)

The question then arises, what about @emph{other} multi-character lexemes,
such as @samp{**} and @samp{//},
and Fortran 90's @samp{(/}, @samp{/)}, @samp{::}, and so on?

It turns out there's a need to identify the location of the second character
of these two-character lexemes.
For example, in @samp{I(/J) = K}, the slash needs to be diagnosed
as the problem, not the open parenthesis.
Similarly, it is preferable to diagnose the second slash in
@samp{I = J // K} rather than the first, given the implicit typing
rules, which would result in the compiler disallowing the attempted
concatenation of two integers.
(Though, since that's more of a semantic issue,
it's not @emph{that} much preferable.)

Even sequences that could be parsed as digit strings could use location info,
for example, to diagnose the @samp{9} in the octal constant @samp{O'129'}.
(This probably will be parsed as a character string,
to be consistent with the parsing of @samp{Z'129A'}.)

To avoid the hassle of recording the location of the second character,
while also preserving the general rule that each significant character
is distinctly pointed to by the lexeme that contains it,
it's best to simply not have any fixed-size lexemes
larger than one character.

This new design is expected to make checking for two
@samp{*} lexemes in a row much easier than the old design,
so this is not much of a sacrifice.
It probably makes the lexer much easier to implement
than it makes the parser harder.

@subsubsection Space-padding Lexemes

Certain lexemes need to be padded with virtual spaces when the
end of the line (or file) is encountered.

This is necessary in fixed form, to handle lines that don't
extend to column 72, assuming that's the line length in effect.

@subsubsection Bizarre Free-form Hollerith Constants

Last I checked, the Fortran 90 standard actually required the compiler
to silently accept something like

@smallexample
FORMAT ( 1 2 Htwelve chars )
@end smallexample

as a valid @code{FORMAT} statement specifying a twelve-character
Hollerith constant.

The implication here is that, since the new lexer is a zero-feedback one,
it won't know that the special case of a @code{FORMAT} statement being parsed
requires apparently distinct lexemes @samp{1} and @samp{2} to be treated as
a single lexeme.

(This is a horrible misfeature of the Fortran 90 language.
It's one of many such misfeatures that almost make me want
to not support them, and forge ahead with designing a new
``GNU Fortran'' language that has the features,
but not the misfeatures, of Fortran 90,
and provide utility programs to do the conversion automatically.)

So, the lexer must gather distinct chunks of decimal strings into
a single lexeme in contexts where a single decimal lexeme might
start a Hollerith constant.

(Which probably means it might as well do that all the time
for all multi-character lexemes, even in free-form mode,
leaving it to subsequent phases to pull them apart as they see fit.)

Compare the treatment of this to how

@smallexample
CHARACTER * 4 5 HEY
@end smallexample

and

@smallexample
CHARACTER * 12 HEY
@end smallexample

must be treated---the former must be diagnosed, due to the separation
between lexemes; the latter must be accepted as a proper declaration.

@subsubsection Hollerith Constants

Recognizing a Hollerith constant---specifically,
that an @samp{H} or @samp{h} after a digit string begins
such a constant---requires some knowledge of context.

Hollerith constants (such as @samp{2HAB}) can appear after:

@itemize @bullet
@item
@samp{(}

@item
@samp{,}

@item
@samp{=}

@item
@samp{+}, @samp{-}, @samp{/}

@item
@samp{*}, except as noted below
@end itemize

Hollerith constants don't appear after:

@itemize @bullet
@item
@samp{CHARACTER*},
which can be treated generally as
any @samp{*} that is the second lexeme of a statement
@end itemize

@subsubsection Confusing Function Keyword

While

@smallexample
REAL FUNCTION FOO ()
@end smallexample

must be a @code{FUNCTION} statement and

@smallexample
REAL FUNCTION FOO (5)
@end smallexample

must be a type-definition statement,

@smallexample
REAL FUNCTION FOO (@var{names})
@end smallexample

where @var{names} is a comma-separated list of names,
can be one or the other.
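
The three cases can be sketched in Python as below. The function name and its arguments are hypothetical. @code{paren_lexemes} holds the lexemes between the parentheses that follow @samp{REAL FUNCTION FOO}. @code{inside_program_unit} stands in for the context rule explained next: it is true when the statement appears within an already-started program unit, but not at the outer level of a @code{CONTAINS} block.

```python
def is_name(lexeme):
    """A Fortran name here: a letter followed by letters and digits."""
    return lexeme[:1].isalpha() and lexeme.isalnum()


def classify_real_function(paren_lexemes, inside_program_unit):
    """Classify REAL FUNCTION FOO (...) from the lexemes inside the parentheses."""
    if not paren_lexemes:
        return "FUNCTION statement"        # REAL FUNCTION FOO ()
    items = [lx for lx in paren_lexemes if lx != ","]
    if not all(is_name(lx) for lx in items):
        return "type declaration"          # e.g. REAL FUNCTION FOO (5)
    # Only names: the syntax allows either reading, so context decides.
    if inside_program_unit:
        return "type declaration"
    return "FUNCTION statement"


print(classify_real_function(["A", ",", "B"], inside_program_unit=False))
print(classify_real_function(["A", ",", "B"], inside_program_unit=True))
```

In fixed form, the type-declaration reading declares an array named @code{FUNCTIONFOO}, because the blanks between @samp{FUNCTION} and @samp{FOO} are insignificant.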

The only way to disambiguate @samp{REAL FUNCTION FOO (@var{names})}
(short of mandating free-form source or a short maximum
length for names of external procedures)
is based on the context of the statement.

In particular, if the statement is known to be within an
already-started program unit
(but not at the outer level of the @code{CONTAINS} block),
it is a type-declaration statement.

Otherwise, the statement is a @code{FUNCTION} statement,
in that it begins a function program unit
(external, or, within @code{CONTAINS}, nested).

@subsubsection Weird READ

The statement

@smallexample
READ (N)
@end smallexample

is equivalent to either

@smallexample
READ (UNIT=(N))
@end smallexample

or

@smallexample
READ (FMT=(N))
@end smallexample

depending on which would be valid in context.

Specifically, if @samp{N} is type @code{INTEGER},
@samp{READ (FMT=(N))} would not be valid,
because parentheses may not be used around @samp{N},
whereas they may around it in @samp{READ (UNIT=(N))}.

Further, if @samp{N} is type @code{CHARACTER},
the opposite is true---@samp{READ (UNIT=(N))} is not valid,
but @samp{READ (FMT=(N))} is.

Strictly speaking, if anything follows

@smallexample
READ (N)
@end smallexample

in the statement, whether the first lexeme after the close
parenthesis is a comma could be used to disambiguate the two cases,
without looking at the type of @samp{N},
because the comma is required for the @samp{READ (FMT=(N))}
interpretation and disallowed for the @samp{READ (UNIT=(N))}
interpretation.

However, in practice, many Fortran compilers allow
the comma for the @samp{READ (UNIT=(N))}
interpretation anyway
(in that they generally allow a leading comma before
an I/O list in an I/O statement),
and much code takes advantage of this allowance.
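
A Python sketch of the decision follows. The function @code{interpret_read} and its arguments are hypothetical. @code{n_type} is the type of @samp{N} as known from context. @code{next_lexeme} is the first lexeme after the close parenthesis, or @code{None} if the statement ends there. The @code{strict} flag applies the comma rule that only standard-conforming code would allow. By default, the sketch relies on the type of @samp{N} instead, because of the leading-comma allowance just described.

```python
def interpret_read(n_type, next_lexeme=None, strict=False):
    """Return "UNIT" or "FMT", the specifier that READ (N) stands for."""
    if strict and next_lexeme is not None:
        # Standard syntax: a comma is required after the FMT= reading and
        # disallowed after the UNIT= reading, so N's type is not needed.
        return "FMT" if next_lexeme == "," else "UNIT"
    # Otherwise a comma proves nothing (many compilers accept it before
    # an I/O list), so fall back on the type of N.
    if n_type == "INTEGER":
        return "UNIT"   # READ (UNIT=(N))
    if n_type == "CHARACTER":
        return "FMT"    # READ (FMT=(N))
    raise ValueError("cannot interpret READ (N) with N of type " + n_type)


print(interpret_read("INTEGER", ","))
print(interpret_read("CHARACTER"))
```

The first call shows the case where the comma rule breaks down in practice. With an @code{INTEGER} @samp{N}, a following comma is still taken as @samp{READ (UNIT=(N))}.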
+ +(This is quite a reasonable allowance, since the +juxtaposition of a comma-separated list immediately +after an I/O control-specification list, which is also comma-separated, +without an intervening comma, +looks sufficiently ``wrong'' to programmers +that they can't resist the itch to insert the comma. +@samp{READ (I, J), K, L} simply looks cleaner than +@samp{READ (I, J) K, L}.) + +So, type-based disambiguation is needed unless strict adherence +to the standard is always assumed, and we're not going to assume that. + +@node TBD (Transforming) +@subsection TBD (Transforming) + +Continue researching gotchas, designing the transformational process, +and implementing it. + +Specific issues to resolve: + +@itemize @bullet +@item +Just where should @code{INCLUDE} processing take place? + +Clearly before (or part of) statement identification (@file{sta.c}), +since determining whether @samp{I(J)=K} is a statement-function +definition or an assignment statement requires knowing the context, +which in turn requires having processed @code{INCLUDE} files. + +@item +Just where should (if it were implemented) @code{USE} processing take place? + +This gets into the whole issue of how @code{g77} should handle the concept +of modules. +I think GNAT already takes on this issue, but I don't know more than that. +Jim Giles has written extensively on @code{comp.lang.fortran} +about his opinions on module handling, as have others. +Jim's views should be taken into account. + +Actually, Richard M. Stallman (RMS) also has written up +some guidelines for implementing such things, +but I'm not sure where I read them. +Perhaps the old @email{gcc2@@cygnus.com} list. + +If someone could dig up references to these and get them to me, +that would be much appreciated! +Even though modules are not on the short-term list for implementation, +it'd be helpful to know @emph{now} how to avoid making them harder to +implement @emph{later}.
+ +@item +Should the @code{g77} command become just a script that invokes +all the various preprocessing that might be needed, +thus making it seem slower than necessary for legacy code +that people are unwilling to convert, +or should we provide a separate script for that, +thus encouraging people to convert their code once and for all? + +At least, a separate script to behave as old @code{g77} did, +perhaps named @code{g77old}, might ease the transition, +as might a corresponding one, named @code{g77oldnew}, +that converts source code. + +These scripts would take all the pertinent options @code{g77} used +to take and run the appropriate filters, +passing the results to @code{g77} or just making new sources out of them +(in a subdirectory, leaving the user to do the dirty deed of +moving or copying them over the old sources). + +@item +Do other Fortran compilers provide a prefix syntax +to govern the treatment of backslashes in @code{CHARACTER} +(or Hollerith) constants? + +Knowing what other compilers provide would help. + +@item +Is it okay to drop support for the @samp{-fintrin-case-initcap}, +@samp{-fmatch-case-initcap}, @samp{-fsymbol-case-initcap}, +and @samp{-fcase-initcap} options? + +I've asked @email{info-gnu-fortran@@gnu.org} for input on this. +Not having to support these makes it easier to write the new front end, +and might also avoid complicating its design. +@end itemize + +@node Philosophy of Code Generation +@section Philosophy of Code Generation + +Don't poke the bear. + +The @code{g77} front end generates code +via the @code{gcc} back end. + +@cindex GNU Back End (GBE) +@cindex GBE +@cindex @code{gcc}, back end +@cindex back end, gcc +@cindex code generator +The @code{gcc} back end (GBE) is a large, complex +labyrinth of intricate code +written in a combination of the C language +and specialized languages internal to @code{gcc}.
+ +While the @emph{code} that implements the GBE +is written in a combination of languages, +the GBE itself is, +to the front end for a language like Fortran, +best viewed as a @emph{compiler} +that compiles its own, unique, language. + +The GBE's ``source'', then, is written in this language, +which consists primarily of +a combination of calls to GBE functions +and @dfn{tree} nodes +(which are, themselves, created +by calling GBE functions). + +So, @code{g77} generates code by, in effect, +translating the Fortran code it reads +into a form ``written'' in the ``language'' +of the @code{gcc} back end. + +@cindex GBEL +@cindex GNU Back End Language (GBEL) +This language will henceforth be referred to as @dfn{GBEL}, +for GNU Back End Language. + +GBEL is an evolving language, +not fully specified in any published form +as of this writing. +It offers many facilities, +but its ``core'' facilities +are those that correspond most directly +to those needed to support @code{gcc} +(compiling code written in GNU C). + +The @code{g77} Fortran Front End (FFE) +is designed and implemented +to navigate the currents and eddies +of ongoing GBEL and @code{gcc} development +while also delivering on the potential +of an integrated FFE +(as compared to using a converter like @code{f2c} +and feeding the output into @code{gcc}). + +Goals of the FFE's code-generation strategy include: + +@itemize @bullet +@item +High likelihood of generation of correct code, +or, failing that, producing a fatal diagnostic or crashing. + +@item +Generation of highly optimized code, +as directed by the user +via GBE-specific (versus @code{g77}-specific) constructs, +such as command-line options. + +@item +Fast overall (FFE plus GBE) compilation. + +@item +Preservation of source-level debugging information.
+@end itemize + +The strategies historically, and currently, used by the FFE +to achieve these goals include: + +@itemize @bullet +@item +Use of GBEL constructs that most faithfully encapsulate +the semantics of Fortran. + +@item +Avoidance of GBEL constructs that are so rarely used, +or limited to use in specialized situations not related to Fortran, +that their reliability and performance have not yet been established +as sufficient for use by the FFE. + +@item +Flexible design, to readily accommodate changes to specific +code-generation strategies, perhaps governed by command-line options. +@end itemize + +@cindex Bear-poking +@cindex Poking the bear +``Don't poke the bear'' somewhat summarizes the above strategies. +The GBE is the bear. +The FFE is designed and implemented to avoid poking it +in ways that are likely to just annoy it. +The FFE usually either tackles it head-on, +or avoids treating it in ways dissimilar to how +the @code{gcc} front end treats it. + +For example, the FFE uses the native array facility in the back end +instead of the lower-level pointer-arithmetic facility +(used by @code{gcc} when compiling @code{f2c} output). +Theoretically, this presents more opportunities for optimization, +faster compile times, +and the production of more faithful debugging information. +These benefits were not, however, immediately realized, +mainly because @code{gcc} itself makes little or no use +of the native array facility. + +Complex arithmetic is a case study of the evolution of this strategy. +When originally implemented, +the GBEL had just evolved its own native complex-arithmetic facility, +so the FFE took advantage of that. + +When porting @code{g77} to 64-bit systems, +it was discovered that the GBE didn't really +implement its native complex-arithmetic facility properly.
+ +The short-term solution was to rewrite the FFE +to instead use the lower-level facilities +that'd be used by @code{gcc}-compiled code +(assuming that code, itself, didn't use the native complex type +provided, as an extension, by @code{gcc}), +since these were known to work, +and, in any case, if shown to not work, +would likely be rapidly fixed +(since they'd likely not work for vanilla C code in similar circumstances). + +However, the rewrite accommodated the original, native approach as well +by offering a command-line option to select it over the emulated approach. +This allowed users, and especially GBE maintainers, to try out +fixes to complex-arithmetic support in the GBE +while @code{g77} continued to default to compiling more code correctly, +albeit producing (typically) slower executables. + +As of April 1999, it appeared that the last few bugs +in the GBE's support of its native complex-arithmetic facility +had been worked out. +The FFE was changed back to default to using that native facility, +leaving emulation as an option. + +Other Fortran constructs---arrays, character strings, +complex division, @code{COMMON} and @code{EQUIVALENCE} aggregates, +and so on---involve issues similar to those pertaining to complex arithmetic. + +So, it is possible that the history +of how the FFE handled complex arithmetic +will be repeated, probably in modified form +(and hopefully over shorter timeframes), +for some of these other facilities. + +@node Two-pass Design +@section Two-pass Design + +The FFE does not tell the GBE anything about a program unit +until after the last statement in that unit has been parsed. +(A program unit is a Fortran concept that corresponds, in the C world, +most closely to function definitions in ISO C. +That is, a program unit in Fortran is like a top-level function in C. +Nested functions, found among the extensions offered by GNU C, +correspond roughly to Fortran's statement functions.)
+ +So, while parsing the code in a program unit, +the FFE saves up all the information +on statements, expressions, names, and so on, +until it has seen the last statement. + +At that point, the FFE revisits the saved information +(in what amounts to a second @dfn{pass} over the program unit) +to perform the actual translation of the program unit into GBEL, +culminating in the generation of assembly code for it. + +Some lookahead is performed during this second pass, +so the FFE could be viewed as a ``two-plus-pass'' design. + +@menu +* Two-pass Code:: +* Why Two Passes:: +@end menu + +@node Two-pass Code +@subsection Two-pass Code + +Most of the code that turns the first pass (parsing) +into a second pass for code generation +is in @file{@value{path-g77}/std.c}. + +It has external functions, +called mainly by siblings in @file{@value{path-g77}/stc.c}, +that record the information on statements and expressions +in the order they are seen in the source code. +These functions save that information. + +It also has an external function that revisits that information, +calling the siblings in @file{@value{path-g77}/ste.c}, +which handle the actual code generation +(by generating GBEL code, +that is, by calling GBE routines +to represent and specify expressions, statements, and so on). + +@node Why Two Passes +@subsection Why Two Passes + +The need for two passes was not immediately evident +during the design and implementation of the code in the FFE +that was to produce GBEL. +Only after a few kludges, +to handle things like incorrectly-guessed @code{ASSIGN} label nature, +had been implemented, +did enough evidence pile up to make it clear +that @file{std.c} had to be introduced to intercept, +save, then revisit as part of a second pass, +the digested contents of a program unit. + +Other such missteps have occurred during the evolution of the FFE, +because of the different goals of the FFE and the GBE.
+ +Because the GBE's original, and still primary, goal +was to directly support the GNU C language, +the GBEL, and the GBE itself, +require more complexity +on the part of most front ends +than they require of @code{gcc}'s. + +For example, +the GBEL offers an interface that permits the @code{gcc} front end +to implement most, or all, of the language features it supports, +without the front end having to +make use of non-user-defined variables. +(It's almost certainly the case that all of K&R C, +and probably ANSI C as well, +is handled by the @code{gcc} front end +without declaring such variables.) + +The FFE, on the other hand, must resort to a variety of ``tricks'' +to achieve its goals. + +Consider the following C code: + +@smallexample +int +foo (int a, int b) +@{ + int c = 0; + + if ((c = bar (c)) == 0) + goto done; + + quux (c << 1); + +done: + return c; +@} +@end smallexample + +Note what kinds of objects are declared, or defined, before their use, +and before any actual code generation involving them +would normally take place: + +@itemize @bullet +@item +Return type of function + +@item +Entry point(s) of function + +@item +Dummy arguments + +@item +Variables + +@item +Initial values for variables +@end itemize + +Whereas the following items can, and do, +suddenly appear ``out of the blue'' in C: + +@itemize @bullet +@item +Label references + +@item +Function references +@end itemize + +Not surprisingly, the GBE faithfully permits the latter set of items +to be ``discovered'' partway through GBEL ``programs'', +just as they are permitted to in C. + +Yet, the GBE has tended, at least in the past, +to be reluctant to fully support similar ``late'' discovery +of items in the former set. + +This makes Fortran a poor fit for the ``safe'' subset of GBEL.
+Consider: + +@smallexample +      FUNCTION X (A, ARRAY, ID1) +      CHARACTER*(*) A +      DOUBLE PRECISION X, Y, Z, TMP, EE, PI +      REAL ARRAY(ID1*ID2) +      COMMON ID2 +      EXTERNAL FRED + +      ASSIGN 100 TO J +      CALL FOO (I) +      IF (I .EQ. 0) PRINT *, A(0) +      GOTO 200 + +      ENTRY Y (Z) +      ASSIGN 101 TO J +200   PRINT *, A(1) +      READ *, TMP +      GOTO J +100   X = TMP * EE +      RETURN +101   Y = TMP * PI +      CALL FRED +      DATA EE, PI /2.71D0, 3.14D0/ +      END +@end smallexample + +Here are some observations about the above code, +which, while somewhat contrived, +conforms to the FORTRAN 77 and Fortran 90 standards: + +@itemize @bullet +@item +The return type of function @samp{X} is not known +until the @samp{DOUBLE PRECISION} line has been parsed. + +@item +Whether @samp{A} is a function or a variable +is not known until the @samp{PRINT *, A(0)} statement +has been parsed. + +@item +The bounds of the array of argument @samp{ARRAY} +depend on a computation involving +the subsequent argument @samp{ID1} +and the blank-common member @samp{ID2}. + +@item +Whether @samp{Y} and @samp{Z} are local variables, +additional function entry points, +or dummy arguments to additional entry points +is not known +until the @code{ENTRY} statement is parsed. + +@item +Similarly, whether @samp{TMP} is a local variable is not known +until the @samp{READ *, TMP} statement is parsed. + +@item +The initial values for @samp{EE} and @samp{PI} +are not known until after the @code{DATA} statement is parsed. + +@item +Whether @samp{FRED} is a function returning type @code{REAL} +or a subroutine +(which can be thought of as returning type @code{void} +@emph{or}, to support alternate returns in a simple way, +type @code{int}) +is not known +until the @samp{CALL FRED} statement is parsed. + +@item +Whether @samp{100} is a @code{FORMAT} label +or the label of an executable statement +is not known +until the @samp{X =} statement is parsed. +(These two types of labels get @emph{very} different treatment, +especially when @code{ASSIGN}'ed.)
+ +@item +That @samp{J} is a local variable is not known +until the first @code{ASSIGN} statement is parsed. +(This happens @emph{after} executable code has been seen.) +@end itemize + +Very few of these ``discoveries'' +can be accommodated by the GBE as it has evolved over the years. +The GBEL doesn't support several of them, +and those it might appear to support +don't always work properly, +especially in combination with other GBEL and GBE features, +as implemented in the GBE. + +(Had the GBE and its GBEL originally evolved to support @code{g77}, +the shoe would be on the other foot, so to speak---most, if not all, +of the above would be directly supported by the GBEL, +and a few C constructs would probably not be supported, +as they are in reality. +Either GBE, this mythical one or today's real one, caters to its GBEL +by, sometimes, scrambling around, cleaning up after itself---after +discovering that assumptions it made earlier during code generation +are incorrect.) + +So, the FFE handles these discrepancies---between the order in which +it discovers facts about the code it is compiling, +and the order in which the GBEL and GBE support such discoveries---by +performing what amounts to two +passes over each program unit. + +(A few ambiguities can remain at that point, +such as whether, given @samp{EXTERNAL BAZ} +and no other reference to @samp{BAZ} in the program unit, +it is a subroutine, a function, or a block-data---which, in C-speak, +governs its declared return type. +Fortunately, these distinctions are easily finessed +for the procedure, library, and object-file interfaces +supported by @code{g77}.)
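The record-then-revisit structure can be illustrated with a deliberately tiny C model. It assumes only three kinds of statements, a single program unit, and that every recorded statement carries a label. It is a hypothetical sketch, not the code in @file{std.c}: @code{record} stands in for the first pass, noting each label's nature as it is discovered, and @code{replay_assigns} stands in for the second pass, which can resolve an @code{ASSIGN} whose target label had not been seen yet when the @code{ASSIGN} was parsed (as with @samp{ASSIGN 100 TO J} in the earlier example).

```c
#include <assert.h>

/* Hypothetical, much-simplified model of the two passes over one
   program unit: pass 1 saves statements and learns what each label
   is; pass 2 revisits the saved statements with that knowledge.
   Unlabeled statements are left out of this model. */

enum label_nature { LABEL_UNKNOWN, LABEL_FORMAT, LABEL_EXEC };
enum stmt_kind { STMT_ASSIGN, STMT_FORMAT, STMT_EXEC };

struct stmt
{
  enum stmt_kind kind;
  int label;  /* ASSIGN: the label assigned; otherwise: the label defined */
};

#define MAX_LABEL 99999  /* Fortran labels have at most five digits */
#define MAX_STMTS 1000

static enum label_nature nature[MAX_LABEL + 1];
static struct stmt saved[MAX_STMTS];
static int nsaved;

/* Pass 1: save the statement and record any label it defines.
   When an ASSIGN is recorded, its target may still be LABEL_UNKNOWN. */
static void
record (enum stmt_kind kind, int label)
{
  assert (nsaved < MAX_STMTS);
  assert (label > 0 && label <= MAX_LABEL);
  saved[nsaved].kind = kind;
  saved[nsaved].label = label;
  nsaved++;
  if (kind == STMT_FORMAT)
    nature[label] = LABEL_FORMAT;
  else if (kind == STMT_EXEC)
    nature[label] = LABEL_EXEC;
}

/* Pass 2: walk the saved statements in source order.  Every defined
   label's nature is now known, so each ASSIGN could be translated
   appropriately.  Stores the resolved nature of each ASSIGN's target
   in OUT (at most MAX entries) and returns how many were stored. */
static int
replay_assigns (enum label_nature *out, int max)
{
  int n = 0;
  for (int i = 0; i < nsaved && n < max; i++)
    if (saved[i].kind == STMT_ASSIGN)
      out[n++] = nature[saved[i].label];
  return n;
}
```

In a real second pass, each resolved nature would select between quite different code sequences; the sketch only reports it.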
+ +@node Challenges Posed +@section Challenges Posed + +Consider the following Fortran code, which uses various extensions +(including some to Fortran 90): + +@smallexample +SUBROUTINE X(A) +CHARACTER*(*) A +COMPLEX CFUNC +INTEGER*2 CLOCKS(200) +INTEGER IFUNC + +CALL SYSTEM_CLOCK (CLOCKS (IFUNC (CFUNC ('('//A//')')))) +END +@end smallexample + +The above poses the following challenges to any Fortran compiler +that uses run-time interfaces, and a run-time library, roughly similar +to those used by @code{g77}: + +@itemize @bullet +@item +Assuming the library routine that supports @code{SYSTEM_CLOCK} +expects to set an @code{INTEGER*4} variable via its @code{COUNT} argument, +the compiler must make available to it a temporary variable of that type. + +@item +Further, after the @code{SYSTEM_CLOCK} library routine returns, +the compiler must ensure that the temporary variable it wrote +is copied into the appropriate element of the @samp{CLOCKS} array. +(This assumes the compiler doesn't just reject the code, +which it should if it is compiling under some kind of a ``strict'' option.) + +@item +To determine the correct index into the @samp{CLOCKS} array +(putting aside the fact that the index, in this particular case, +need not be computed until after +the @code{SYSTEM_CLOCK} library routine returns), +the compiler must ensure that the @code{IFUNC} function is called. + +That requires evaluating its argument, +which requires, for @code{g77} +(assuming @code{-ff2c} is in force), +reserving a temporary variable of type @code{COMPLEX} +for use as a repository for the return value +being computed by @samp{CFUNC}. + +@item +Before invoking @samp{CFUNC}, +its argument must be evaluated, +which requires allocating, at run time, +a temporary large enough to hold the result of the concatenation, +as well as actually performing the concatenation.
+ +@item +The large temporary needed during invocation of @code{CFUNC} +should, ideally, be deallocated +(or, at least, left to the GBE to dispose of, as it sees fit) +as soon as @code{CFUNC} returns, +which means before @code{IFUNC} is called +(as it might need a lot of dynamically allocated memory). +@end itemize + +@code{g77} currently doesn't support all of the above, +but, so that it might someday, it has evolved to handle +at least some of the above requirements. + +Meeting the above requirements is made more challenging +by conforming to the requirements of the GBEL/GBE combination. + +@node Transforming Statements +@section Transforming Statements + +Most Fortran statements are given their own block, +and, for temporary variables they might need, their own scope. +(A block is what distinguishes @samp{@{ foo (); @}} +from just @samp{foo ();} in C. +A scope is included with every such block, +providing a distinct name space for local variables.) + +Label definitions for the statement precede this block, +so @samp{10 PRINT *, I} is handled more like +@samp{fl10: @{ @dots{} @}} than @samp{@{ fl10: @dots{} @}} +(where @samp{fl10} is just a notation meaning ``Fortran Label 10'' +for the purposes of this document). + +@menu +* Statements Needing Temporaries:: +* Transforming DO WHILE:: +* Transforming Iterative DO:: +* Transforming Block IF:: +* Transforming SELECT CASE:: +@end menu + +@node Statements Needing Temporaries +@subsection Statements Needing Temporaries + +Any temporaries needed during, but not beyond, +execution of a Fortran statement, +are made local to the scope of that statement's block. + +This allows the GBE to share storage for these temporaries +among the various statements without the FFE +having to manage that itself. + +(The GBE could, of course, decide to optimize +management of these temporaries. +For example, it could, theoretically, +schedule some of the computations involving these temporaries +to occur in parallel. 
+More practically, it might leave the storage for some temporaries +``live'' beyond their scopes, to reduce the number of +manipulations of the stack pointer at run time.) + +Temporaries needed across distinct statement boundaries usually +are associated with Fortran blocks (such as @code{DO}/@code{END DO}). +(Also, there might be temporaries not associated with blocks at all---these +would be in the scope of the entire program unit.) + +Each Fortran block @emph{should} get its own block/scope in the GBE. +This is best, because it allows temporaries to be more naturally handled. +However, it might pose problems when handling labels +(in particular, when they're the targets of @code{GOTO}s outside the Fortran +block), and generally involve the hassle of replicating +parts of the @code{gcc} front end +(because the FFE needs to support +an arbitrary number of nested back-end blocks +if each Fortran block gets one). + +So, there might still be a need for top-level temporaries, whose +``owning'' scope is that of the containing procedure. + +Also, there seem to be problems declaring new variables after +generating code (within a block) in the back end, leading to, e.g., +@samp{label not defined before binding contour} or similar messages, +when compiling with @samp{-fstack-check} or +when compiling for certain targets. + +Because of that, and because sometimes these temporaries are not +discovered until in the middle of generating code for an expression +statement (as in the case of the optimization for @samp{X**I}), +it seems best to always +pre-scan all the expressions that'll be expanded for a block +before generating any of the code for that block. + +This pre-scan then handles discovering and declaring, to the back end, +the temporaries needed for that block. + +It's also important to treat distinct items in an I/O list as distinct +statements deserving their own blocks.
+That's because there's a requirement +that each I/O item be fully processed before the next one, +which matters in cases like @samp{READ (*,*), I, A(I)}---the +element of @samp{A} read in the second item +@emph{must} be determined from the value +of @samp{I} read in the first item. + +@node Transforming DO WHILE +@subsection Transforming DO WHILE + +@samp{DO WHILE(expr)} @emph{must} be implemented +so that temporaries needed to evaluate @samp{expr} +are generated just for the test, each time. + +Consider how @samp{DO WHILE (A//B .NE. 'END'); @dots{}; END DO} is transformed: + +@smallexample +for (;;) + @{ + int temp0; + + @{ + char temp1[large]; + + libg77_catenate (temp1, a, b); + temp0 = libg77_ne (temp1, 'END'); + @} + + if (! temp0) + break; + + @dots{} + @} +@end smallexample + +In this case, it seems like a time/space tradeoff +between allocating and deallocating @samp{temp1} for each iteration +and allocating it just once for the entire loop. + +However, if @samp{temp1} is allocated just once for the entire loop, +it could be the wrong size for subsequent iterations of that loop +in cases like @samp{DO WHILE (A(I:J)//B .NE. 'END')}, +because the body of the loop might modify @samp{I} or @samp{J}. + +So, the above implementation is used, +though a more optimal one can be used +in specific circumstances. + +@node Transforming Iterative DO +@subsection Transforming Iterative DO + +An iterative @code{DO} loop +(one that specifies an iteration variable) +is required by the Fortran standards +to be implemented as though an iteration count +is computed before entering the loop body, +and that iteration count used to determine +the number of times the loop body is to be performed +(assuming the loop isn't cut short via @code{GOTO} or @code{EXIT}). + +The FFE handles this by allocating a temporary variable +to contain the computed number of iterations. 
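As a minimal C sketch, assuming integer @code{DO} parameters, the iteration count that FORTRAN 77 prescribes for @samp{DO I = M1, M2, M3} is @samp{MAX (INT ((M2 - M1 + M3) / M3), 0)}, computed once before the loop starts. Both function names below are hypothetical, and overflow is ignored.

```c
#include <assert.h>

/* Iteration count for DO I = M1, M2, M3 with integer parameters:
   MAX (INT ((M2 - M1 + M3) / M3), 0).  C99 integer division truncates
   toward zero, as Fortran integer division does.  Overflow of
   M2 - M1 + M3 is ignored in this sketch. */
static long
do_iteration_count (long m1, long m2, long m3)
{
  assert (m3 != 0);  /* a zero step is not allowed */
  long count = (m2 - m1 + m3) / m3;
  return count > 0 ? count : 0;
}

/* Hypothetical translation of  DO I = M1, M2, M3 / SUM = SUM + I / END DO.
   The count is a temporary declared in a block enclosing the whole
   loop, so it is computed exactly once, before the first iteration. */
static long
sum_do_loop (long m1, long m2, long m3)
{
  long i;        /* the Fortran DO variable */
  long sum = 0;
  {
    long count = do_iteration_count (m1, m2, m3);
    for (i = m1; count > 0; count--, i += m3)
      sum += i;
  }
  return sum;
}
```

For example, @samp{DO I = 10, 1, -2} runs five times (10, 8, 6, 4, 2), and @samp{DO I = 10, 1} runs zero times.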
+Since this variable must be in a scope that includes the entire loop, +a GBEL block is created for that loop, +and the variable declared as belonging to the scope of that block. + +@node Transforming Block IF +@subsection Transforming Block IF + +Consider: + +@smallexample +SUBROUTINE X(A,B,C) +CHARACTER*(*) A, B, C +LOGICAL LFUNC + +IF (LFUNC (A//B)) THEN + CALL SUBR1 +ELSE IF (LFUNC (A//C)) THEN + CALL SUBR2 +ELSE + CALL SUBR3 +END IF +END +@end smallexample + +The arguments to the two calls to @samp{LFUNC} +require dynamic allocation (at run time), +but are not required during execution of the @code{CALL} statements. + +So, the scopes of those temporaries must be within blocks inside +the block corresponding to the Fortran @code{IF} block. + +This cannot be represented ``naturally'' +in vanilla C, nor in GBEL. +The @code{if}, @code{elseif}, @code{else}, +and @code{endif} constructs +provided by both languages must, +for a given @code{if} block, +share the same C/GBE block. + +Therefore, any temporaries needed during evaluation of @samp{expr} +while executing @samp{ELSE IF(expr)} +must either have been predeclared +at the top of the corresponding @code{IF} block, +or declared within a new block for that @code{ELSE IF}---a block that, +since it cannot contain the @code{else} or @code{else if} itself +(due to the above requirement), +actually implements the rest of the @code{IF} block's +@code{ELSE IF} and @code{ELSE} statements +within an inner block. + +The FFE takes the latter approach. + +@node Transforming SELECT CASE +@subsection Transforming SELECT CASE + +@code{SELECT CASE} poses a few interesting problems for code generation, +if efficiency and frugal stack management are important. + +Consider @samp{SELECT CASE (I('PREFIX'//A))}, +where @samp{A} is @code{CHARACTER*(*)}.
+In a case like this---basically, +in any case where largish temporaries are needed +to evaluate the expression---those temporaries should +not be ``live'' during execution of any of the @code{CASE} blocks. + +So, evaluation of the expression is best done within its own block, +which in turn is within the @code{SELECT CASE} block itself +(which contains the code for the @code{CASE} blocks as well, +though each within its own block). + +Otherwise, we'd have the rough equivalent of this pseudo-code: + +@smallexample +@{ + char temp[large]; + + libg77_catenate (temp, 'prefix', a); + + switch (i (temp)) + @{ + case 0: + @dots{} + @} +@} +@end smallexample + +And that would leave @samp{temp} in scope during the @code{CASE} blocks +(although a clever back end @emph{could} see that it isn't referenced +in them, and thus free that temp before executing the blocks). + +So this approach is used instead: + +@smallexample +@{ + int temp0; + + @{ + char temp1[large]; + + libg77_catenate (temp1, 'prefix', a); + temp0 = i (temp1); + @} + + switch (temp0) + @{ + case 0: + @dots{} + @} +@} +@end smallexample + +Note how @samp{temp1} goes out of scope before starting the switch, +thus making it easy for a back end to free it. + +The problem @emph{that} solution has, however, +is with @samp{SELECT CASE('prefix'//A)} +(which is currently not supported). + +Unless the GBEL is extended to support arbitrarily long character strings +in its @code{case} facility, +the FFE has to implement @code{SELECT CASE} on @code{CHARACTER} +(probably excepting @code{CHARACTER*1}) +using a cascade of +@code{if}, @code{elseif}, @code{else}, and @code{endif} constructs +in GBEL.
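A sketch of such a cascade follows, assuming Fortran's rule that @code{CHARACTER} values are compared as if the shorter one were padded on the right with blanks. It combines the cascade with the scoping shown earlier: the concatenation temporary is freed as soon as the cascade has reduced the selector to a small integer ordinal, and the @code{switch} then dispatches on that ordinal. The @code{CASE} values @samp{'prefixON'} and @samp{'prefixOFF'} and all function names are hypothetical, and @code{malloc} merely stands in for whatever run-time allocation the generated code would actually use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fortran compares CHARACTER values as if the shorter operand were
   padded on the right with blanks to the length of the longer one. */
static int
fortran_char_eq (const char *a, size_t alen, const char *b, size_t blen)
{
  size_t n = alen > blen ? alen : blen;
  for (size_t i = 0; i < n; i++)
    {
      char ca = i < alen ? a[i] : ' ';
      char cb = i < blen ? b[i] : ' ';
      if (ca != cb)
        return 0;
    }
  return 1;
}

/* The if/else-if cascade reduces the selector to an ordinal:
   1 for CASE ('prefixON'), 2 for CASE ('prefixOFF'), 0 for CASE DEFAULT.
   These CASE values are hypothetical. */
static int
case_ordinal (const char *sel, size_t len)
{
  if (fortran_char_eq (sel, len, "prefixON", 8))
    return 1;
  else if (fortran_char_eq (sel, len, "prefixOFF", 9))
    return 2;
  else
    return 0;
}

/* SELECT CASE ('prefix'//A): the concatenation temporary lives only in
   an inner block; the CASE blocks dispatch on the integer ordinal.
   Returns NULL if the temporary cannot be allocated. */
static const char *
select_case_prefix (const char *a, size_t alen)
{
  int temp0;
  {
    size_t len = 6 + alen;           /* LEN ('prefix') + LEN (A) */
    char *temp1 = malloc (len);
    if (temp1 == NULL)
      return NULL;
    memcpy (temp1, "prefix", 6);
    memcpy (temp1 + 6, a, alen);
    temp0 = case_ordinal (temp1, len);
    free (temp1);                    /* released before any CASE block */
  }

  switch (temp0)
    {
    case 1:
      return "on";
    case 2:
      return "off";
    default:
      return "default";
    }
}
```

CASE ranges such as @samp{CASE ('A':'M')} would need ordered comparisons in the cascade; the sketch handles single values only.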
+ +To prevent the (potentially large) temporary, +needed to hold the selected expression itself (@samp{'prefix'//A}), +from being in scope during execution of the @code{CASE} blocks, +two approaches are available: + +@itemize @bullet +@item +Pre-evaluate all the @code{CASE} tests, +producing an integer ordinal that is used, +a la @samp{temp0} in the earlier example, +as if @samp{SELECT CASE(temp0)} had been written. + +Each corresponding @code{CASE} is replaced with @samp{CASE(@var{i})}, +where @var{i} is the ordinal for that case, +determined while, or before, +generating the cascade of @code{if}-related constructs +to cope with @code{CHARACTER} selection. + +@item +Make @samp{temp0} above just +large enough to hold the longest @code{CASE} string +that'll actually be compared against the expression +(in this case, @samp{'prefix'//A}). + +Since that length must be constant +(because @code{CASE} expressions are all constant), +it won't be so large, +and, further, @samp{temp1} need not be dynamically allocated, +since normal @code{CHARACTER} assignment can be used +into the fixed-length @samp{temp0}. +@end itemize + +Both of these solutions require @code{SELECT CASE} implementation +to be changed so all the corresponding @code{CASE} statements +are seen during the actual code generation for @code{SELECT CASE}. + +@node Transforming Expressions +@section Transforming Expressions + +The interactions between statements, expressions, and subexpressions +at program run time can be viewed as: + +@smallexample +@var{action}(@var{expr}) +@end smallexample + +Here, @var{action} is the series of steps +performed to effect the statement, +and @var{expr} is the expression +whose value is used by @var{action}. 
+ +Expanding the above shows a typical order of events at run time: + +@smallexample +Evaluate @var{expr} +Perform @var{action}, using result of evaluation of @var{expr} +Clean up after evaluating @var{expr} +@end smallexample + +So, if evaluating @var{expr} requires allocating memory, +that memory can be freed before performing @var{action} +only if it is not needed to hold the result of evaluating @var{expr}. +Otherwise, it must be freed no sooner than +after @var{action} has been performed. + +The above are recursive definitions, +in the sense that they apply to subexpressions of @var{expr}. + +That is, evaluating @var{expr} involves +evaluating all of its subexpressions, +performing the @var{action} that computes the +result value of @var{expr}, +then cleaning up after evaluating those subexpressions. + +The recursive nature of this evaluation is implemented +via recursive-descent transformation of the top-level statements, +their expressions, @emph{their} subexpressions, and so on. + +However, that recursive-descent transformation is, +due to the nature of the GBEL, +focused primarily on generating a @emph{single} stream of code +to be executed at run time. + +Yet, from the above, it's clear that multiple streams of code +must effectively be simultaneously generated +during the recursive-descent analysis of statements. + +The primary stream implements the primary @var{action} items, +while at least two other streams implement +the evaluation and clean-up items. + +Requirements imposed by expressions include: + +@itemize @bullet +@item +Whether the caller needs to have a temporary ready +to hold the value of the expression. + +@item +Other stuff??? +@end itemize + +@node Internal Naming Conventions +@section Internal Naming Conventions + +Names exported by FFE modules have the following (regular-expression) forms. 
+Note that all names beginning with @code{ffe@var{mod}} or @code{FFE@var{mod}}, +where @var{mod} is lowercase or uppercase alphanumerics, respectively, +are exported by the module @code{ffe@var{mod}}, +with the source code doing the exporting in @file{@var{mod}.h}. +(Usually, the source code for the implementation is in @file{@var{mod}.c}.) + +Identifiers that don't fit the following forms +are not considered exported, +even if they are according to the C language. +(For example, they might be made available to other modules +solely for use within expansions of exported macros, +not for use within any source code in those other modules.) + +@table @code +@item ffe@var{mod} +The single typedef exported by the module. + +@item FFE@var{umod}_[A-Z][A-Z0-9_]* +(Where @var{umod} is the uppercase form of @var{mod}.) + +A @code{#define} or @code{enum} constant of the type @code{ffe@var{mod}}. + +@item ffe@var{mod}[A-Z][A-Z][a-z0-9]* +A typedef exported by the module. + +The portion of the identifier after @code{ffe@var{mod}} is +referred to as @var{ctype}, a capitalized (mixed-case) form +of @var{type}. + +@item FFE@var{umod}_@var{type}[A-Z][A-Z0-9_]*[A-Z0-9]? +(Where @var{umod} is the uppercase form of @var{mod}.) + +A @code{#define} or @code{enum} constant of the type +@code{ffe@var{mod}@var{type}}, +where @var{type} is the lowercase form of @var{ctype} +in an exported typedef. + +@item ffe@var{mod}_@var{value} +A function that does or returns something, +as described by @var{value} (see below). + +@item ffe@var{mod}_@var{value}_@var{input} +A function that does or returns something based +primarily on the thing described by @var{input} (see below). +@end table + +Below are names used for @var{value} and @var{input}, +along with their definitions. + +@table @code +@item col +A column number within a line (first column is number 1). + +@item file +An encapsulation of a file's name.
+ +@item find +Looks up an instance of some type that matches specified criteria, +and returns that, even if it has to create a new instance or +crash trying to find it (as appropriate). + +@item initialize +Initializes, usually a module. No type. + +@item int +A generic integer of type @code{int}. + +@item is +A generic integer that contains a true (non-zero) or false (zero) value. + +@item len +A generic integer that contains the length of something. + +@item line +A line number within a source file, +or a global line number. + +@item lookup +Looks up an instance of some type that matches specified criteria, +and returns that, or returns nil. + +@item name +A @code{text} that points to a name of something. + +@item new +Makes a new instance of the indicated type. +Might return an existing one if appropriate---if so, +similar to @code{find} without crashing. + +@item pt +Pointer to a particular character (line, column pairs) +in the input file (source code being compiled). + +@item run +Performs some herculean task. No type. + +@item terminate +Terminates, usually a module. No type. + +@item text +A @code{char *} that points to generic text. +@end table |
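To illustrate how these conventions fit together, here is a made-up module @code{ffedemo} (a hypothetical name chosen for illustration only) with its single exported typedef and functions named after the @var{value} entries above. It is a minimal sketch showing the intended difference between @code{lookup}, @code{find}, and @code{new}, not code taken from the FFE; in particular, this @code{ffedemo_new} always creates a fresh instance, although the table allows @code{new} to return an existing one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ffedemo: the single typedef exported by this hypothetical module. */
typedef struct ffedemo_ *ffedemo;
struct ffedemo_
{
  ffedemo next;
  char *name;
};

static ffedemo demo_head;  /* module-private list of instances */

/* initialize: sets up the module; no type. */
void
ffedemo_initialize (void)
{
  demo_head = NULL;
}

/* new: makes a new instance named TEXT; returns NULL if out of memory. */
ffedemo
ffedemo_new (const char *text)
{
  size_t len = strlen (text);
  ffedemo d = malloc (sizeof *d);
  if (d == NULL)
    return NULL;
  d->name = malloc (len + 1);
  if (d->name == NULL)
    {
      free (d);
      return NULL;
    }
  memcpy (d->name, text, len + 1);
  d->next = demo_head;
  demo_head = d;
  return d;
}

/* lookup: returns the matching instance, or NULL (nil) if none exists. */
ffedemo
ffedemo_lookup (const char *text)
{
  for (ffedemo d = demo_head; d != NULL; d = d->next)
    if (strcmp (d->name, text) == 0)
      return d;
  return NULL;
}

/* find: like lookup, but creates the instance if it does not exist,
   and "crashes" (aborts) if even that fails. */
ffedemo
ffedemo_find (const char *text)
{
  ffedemo d = ffedemo_lookup (text);
  if (d == NULL)
    d = ffedemo_new (text);
  if (d == NULL)
    abort ();
  return d;
}

/* name: returns the text naming instance D. */
const char *
ffedemo_name (ffedemo d)
{
  return d->name;
}

/* terminate: releases everything the module owns; no type. */
void
ffedemo_terminate (void)
{
  while (demo_head != NULL)
    {
      ffedemo next = demo_head->next;
      free (demo_head->name);
      free (demo_head);
      demo_head = next;
    }
}
```

A caller that merely wants to know whether something exists uses @code{ffedemo_lookup}; one that needs the instance regardless uses @code{ffedemo_find}.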