1<!--===- docs/ControlFlowGraph.md
2
3   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4   See https://llvm.org/LICENSE.txt for license information.
5   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6
7-->
8
9## Concept
10After a Fortran subprogram has been parsed, its names resolved, and all its
11semantic constraints successfully checked, the parse tree of its
12executable part is translated into another abstract representation,
13namely the _control flow graph_ described in this note.
14
15This second representation of the subprogram's executable part is
16suitable for analysis and incremental modification as the subprogram
17is readied for code generation.
18Many high-level Fortran features are implemented by rewriting portions
19of a subprogram's control flow graph in place.
20
21### Control Flow Graph
22A _control flow graph_ is a collection of simple (_i.e.,_ "non-extended")
23basic _blocks_ that comprise straight-line sequences of _actions_ with a
24single entry point and a single exit point, and a collection of
25directed flow _edges_ (or _arcs_) denoting all possible transitions of
26control flow that may take place during execution from the end of
27one basic block to the beginning of another (or itself).
28
29A block that has multiple distinct successors in the flow of control
30must end with an action that selects its successor.
31
32The sequence of actions that constitutes a basic block may
33include references to user and library procedures.
34Subprogram calls with implicit control flow afterwards, namely
35alternate returns and `END=`/`ERR=` labels on input/output,
36will be lowered in translation to a representation that materializes
37that control flow into something similar to a computed `GO TO` or
38C language `switch` statement.
39
40For convenience in optimization and to simplify the implementation of
41data flow confluence functions, we may choose to maintain the
42property that each flow arc is the sole outbound arc emanating from
43its originating block, the sole inbound arc arriving at its destination,
44or both.
45Empty blocks would inserted to "split" arcs when necessary to maintain this
46invariant property.
47
48Fortran subprograms (other than internal subprograms) can have multiple
49entry points by using the obsolescent `ENTRY` statement.
50We will implement such subprograms by constructing a union
51of their dummy argument lists and using it as part of the definition
52of a new subroutine or function that can be called by each of
53the entry points, which are then all converted into wrapper routines that
54pass a selector value as an additional argument to drive a `switch` on entry
55to the new subprogram.
56
57This transformation ensures that every subprogram's control
58flow graph has a well-defined `START` node.
59
60Statement labels can be used in Fortran on any statement, but only
61the labels that decorate legal destinations of `GO TO` statements
62need to be implemented in the control flow graph.
63Specifically, non-executable statements like `DATA`, `NAMELIST`, and
64`FORMAT` statements will be extracted into data initialization
65records before or during the construction of the control flow
66graph, and will survive only as synonyms for `CONTINUE`.
67
68Nests of multiple labeled `DO` loops that terminate on the same
69label will be have that label rewritten so that `GO TO` within
70the loop nest will arrive at the copy that most closely nests
71the context.
72The Fortran standard does not require us to do this, but XLF
73(at least) works this way.
74
75### Expressions and Statements (Operations and Actions)
76Expressions are trees, not DAGs, of intrinsic operations,
77resolved function references, constant literals, and
78data designators.
79
80Expression nodes are represented in the compiler in a type-safe manner.
81There is a distinct class or class template for every category of
82intrinsic type, templatized over its supported kind type parameter values.
83
84Operands are storage-owning indirections to other instances
85of `Expression`, instances of constant values, and to representations
86of data and function references.
87These indirections are not nullable apart from the situation in which
88the operands of an expression are being removed for use elsewhere before
89the expression is destructed.
90
91The ranks and the extents of the shapes of the results of expressions
92are explicit for constant arrays and recoverable by analysis otherwise.
93
94Parenthesized subexpressions are scrupulously preserved in accordance with
95the Fortran standard.
96
97The expression tree is meant to be a representation that is
98as equally well suited for use in the symbol table (e.g., for
99a bound of an explicit shape array) as it is for an action
100in a basic block of the control flow graph (e.g., the right
101hand side of an assignment statement).
102
103Each basic block comprises a linear sequence of _actions_.
104These are represented as a doubly-linked list so that insertion
105and deletion can be done in constant time.
106
107Only the last action in a basic block can represent a change
108to the flow of control.
109
110### Scope Transitions
111Some of the various scopes of the symbol table are visible in the control flow
112graph as `SCOPE ENTRY` and `SCOPE EXIT` actions.
113`SCOPE ENTRY` actions are unique for their corresponding scopes,
114while `SCOPE EXIT` actions need not be so.
115It must be the case that
116any flow of control within the subprogram will enter only scopes that are
117not yet active, and exit only the most recently entered scope that has not
118yet been deactivated; i.e., when modeled by a push-down stack that is
119pushed by each traversal of a `SCOPE ENTRY` action,
120the entries of the stack are always distinct, only the scope at
121the top of the stack is ever popped by `SCOPE EXIT`, and the stack is empty
122when the subprogram terminates.
123Further, any references to resolved symbols must be to symbols whose scopes
124are active.
125
126The `DEALLOCATE` actions and calls to `FINAL` procedures implied by scoped
127lifetimes will be explicit in the sequence of actions in the control flow
128graph.
129
130Parallel regions might be partially represented by scopes, or by explicit
131operations similar to the scope entry and exit operations.
132
133### Data Flow Representation
134The subprogram text will be in static single assignment form by the time the
135subprogram arrives at the bridge to the LLVM IR builder.
136Merge points are actions at the heads of basic blocks whose operands
137are definition points; definition points are actions at the ends of
138basic blocks whose operands are expression trees (which may refer to
139merge points).
140
141### Rewriting Transformations
142
143#### I/O
144#### Dynamic allocation
145#### Array constructors
146
147#### Derived type initialization, deallocation, and finalization
148The machinery behind the complicated semantics of Fortran's derived types
149and `ALLOCATABLE` objects will be implemented in large part by the run time
150support library.
151
152#### Actual argument temporaries
153#### Array assignments, `WHERE`, and `FORALL`
154
155Array operations have shape.
156
157`WHERE` masks have shape.
158Their effects on array operations are by means of explicit `MASK` operands that
159are part of array assignment operations.
160
161#### Intrinsic function and subroutine calls
162