1932aae77SSourabh Singh Tomar<!--===- docs/ArrayComposition.md
2932aae77SSourabh Singh Tomar
3932aae77SSourabh Singh Tomar   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4932aae77SSourabh Singh Tomar   See https://llvm.org/LICENSE.txt for license information.
5932aae77SSourabh Singh Tomar   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6932aae77SSourabh Singh Tomar
7932aae77SSourabh Singh Tomar-->
8932aae77SSourabh Singh Tomar
9*271a7bb1SRichard Barton# Array Composition
10*271a7bb1SRichard Barton
11*271a7bb1SRichard Barton```eval_rst
12*271a7bb1SRichard Barton.. contents::
13*271a7bb1SRichard Barton   :local:
14*271a7bb1SRichard Barton```
15*271a7bb1SRichard Barton
16eaff2004Ssameeran joshiThis note attempts to describe the motivation for and design of an
17eaff2004Ssameeran joshiimplementation of Fortran 90 (and later) array expression evaluation that
18eaff2004Ssameeran joshiminimizes the use of dynamically allocated temporary storage for
19eaff2004Ssameeran joshithe results of calls to transformational intrinsic functions, and
20eaff2004Ssameeran joshimaking them more amenable to acceleration.
21eaff2004Ssameeran joshi
22eaff2004Ssameeran joshiThe transformational intrinsic functions of Fortran of interest to
23eaff2004Ssameeran joshius here include:
24eaff2004Ssameeran joshi
25eaff2004Ssameeran joshi* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`,
26eaff2004Ssameeran joshi  `DOT_PRODUCT`,
27eaff2004Ssameeran joshi  `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`)
28eaff2004Ssameeran joshi* Axial reductions (`SUM(X,DIM=)`, &c.)
29eaff2004Ssameeran joshi* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`)
30eaff2004Ssameeran joshi* Axial location reductions (`MAXLOC(DIM=`, &c.)
31eaff2004Ssameeran joshi* `TRANSPOSE(M)` matrix transposition
32eaff2004Ssameeran joshi* `RESHAPE` without `ORDER=`
33eaff2004Ssameeran joshi* `RESHAPE` with `ORDER=`
34eaff2004Ssameeran joshi* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=`
35eaff2004Ssameeran joshi* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=`
36eaff2004Ssameeran joshi* `PACK` and `UNPACK`
37eaff2004Ssameeran joshi* `MATMUL`
38eaff2004Ssameeran joshi* `SPREAD`
39eaff2004Ssameeran joshi
40eaff2004Ssameeran joshiOther Fortran intrinsic functions are technically transformational (e.g.,
41eaff2004Ssameeran joshi`COMMAND_ARGUMENT_COUNT`) but not of interest for this note.
42eaff2004Ssameeran joshiThe generic `REDUCE` is also not considered here.
43eaff2004Ssameeran joshi
44*271a7bb1SRichard Barton## Arrays as functions
45*271a7bb1SRichard Barton
46eaff2004Ssameeran joshiA whole array can be viewed as a function that maps its indices to the values
47eaff2004Ssameeran joshiof its elements.
48eaff2004Ssameeran joshiSpecifically, it is a map from a tuple of integers to its element type.
49eaff2004Ssameeran joshiThe rank of the array is the number of elements in that tuple,
50eaff2004Ssameeran joshiand the shape of the array delimits the domain of the map.
51eaff2004Ssameeran joshi
52eaff2004Ssameeran joshi`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers
53eaff2004Ssameeran joshi`(J,K)` with `1<=J<=N` and `1<=J<=M` to real values.
54eaff2004Ssameeran joshi
55*271a7bb1SRichard Barton## Array expressions as functions
56*271a7bb1SRichard Barton
57eaff2004Ssameeran joshiThe same perspective can be taken of an array expression comprising
58eaff2004Ssameeran joshiintrinsic operators and elemental functions.
59eaff2004Ssameeran joshiFortran doesn't allow one to apply subscripts directly to an expression,
60eaff2004Ssameeran joshibut expressions have rank and shape, and one can view array expressions
61eaff2004Ssameeran joshias functions over index tuples by applying those indices to the arrays
62eaff2004Ssameeran joshiand subexpressions in the expression.
63eaff2004Ssameeran joshi
64eaff2004Ssameeran joshiConsider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`).
65eaff2004Ssameeran joshiThe right-hand side of that assignment could be evaluated into a
66eaff2004Ssameeran joshitemporary array `T` and then subscripted as it is copied into `B`.
67eaff2004Ssameeran joshi```
68eaff2004Ssameeran joshiREAL, ALLOCATABLE :: T(:,:)
69eaff2004Ssameeran joshiALLOCATE(T(N,M))
70eaff2004Ssameeran joshiDO CONCURRENT(J=1:N,K=1:M)
71eaff2004Ssameeran joshi  T(J,K)=A(J,K) + 1.0
72eaff2004Ssameeran joshiEND DO
73eaff2004Ssameeran joshiDO CONCURRENT(J=1:N,K=1:M)
74eaff2004Ssameeran joshi  B(J,K)=T(J,K)
75eaff2004Ssameeran joshiEND DO
76eaff2004Ssameeran joshiDEALLOCATE(T)
77eaff2004Ssameeran joshi```
78eaff2004Ssameeran joshiBut we can avoid the allocation, population, and deallocation of
79eaff2004Ssameeran joshithe temporary by treating the right-hand side expression as if it
80eaff2004Ssameeran joshiwere a statement function `F(J,K)=A(J,K)+1.0` and evaluating
81eaff2004Ssameeran joshi```
82eaff2004Ssameeran joshiDO CONCURRENT(J=1:N,K=1:M)
83eaff2004Ssameeran joshi  A(J,K)=F(J,K)
84eaff2004Ssameeran joshiEND DO
85eaff2004Ssameeran joshi```
86eaff2004Ssameeran joshi
87eaff2004Ssameeran joshiIn general, when a Fortran array assignment to a non-allocatable array
88eaff2004Ssameeran joshidoes not include the left-hand
89eaff2004Ssameeran joshiside variable as an operand of the right-hand side expression, and any
90eaff2004Ssameeran joshifunction calls on the right-hand side are elemental or scalar-valued,
91eaff2004Ssameeran joshiwe can avoid the use of a temporary.
92eaff2004Ssameeran joshi
93*271a7bb1SRichard Barton## Transformational intrinsic functions as function composition
94*271a7bb1SRichard Barton
95eaff2004Ssameeran joshiMany of the transformational intrinsic functions listed above
96eaff2004Ssameeran joshican, when their array arguments are viewed as functions over their
97eaff2004Ssameeran joshiindex tuples, be seen as compositions of those functions with
98eaff2004Ssameeran joshifunctions of the "incoming" indices -- yielding a function for
99eaff2004Ssameeran joshian entire right-hand side of an array assignment statement.
100eaff2004Ssameeran joshi
101eaff2004Ssameeran joshiFor example, the application of `TRANSPOSE(A + 1.0)` to the index
102eaff2004Ssameeran joshituple `(J,K)` becomes `A(K,J) + 1.0`.
103eaff2004Ssameeran joshi
104eaff2004Ssameeran joshiPartial (axial) reductions can be similarly composed.
105eaff2004Ssameeran joshiThe application of `SUM(A,DIM=2)` to the index `J` is the
106eaff2004Ssameeran joshicomplete reduction `SUM(A(J,:))`.
107eaff2004Ssameeran joshi
108eaff2004Ssameeran joshiMore completely:
109eaff2004Ssameeran joshi* Reductions to scalars (`SUM(X)` without `DIM=`) become
110eaff2004Ssameeran joshi  runtime calls; the result needs no dynamic allocation,
111eaff2004Ssameeran joshi  being a scalar.
112eaff2004Ssameeran joshi* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)`
113eaff2004Ssameeran joshi  become scalar values like `SUM(X(J,K,:))` if `d=3`.
114eaff2004Ssameeran joshi* Location reductions to indices (`MAXLOC(X)` without `DIM=`)
115eaff2004Ssameeran joshi  do not require dynamic allocation, since their results are
116eaff2004Ssameeran joshi  either scalar or small vectors of length `RANK(X)`.
117eaff2004Ssameeran joshi* Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
118eaff2004Ssameeran joshi  are handled like other axial reductions like `SUM(DIM=)`.
119eaff2004Ssameeran joshi* `TRANSPOSE(M)` exchanges the two components of the index tuple.
120eaff2004Ssameeran joshi* `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape
121eaff2004Ssameeran joshi  vector `S`, and then use it to linearize indices into offsets
122eaff2004Ssameeran joshi  in the storage order of `A` (whose shape must also be captured).
123eaff2004Ssameeran joshi  These conversions can involve division and/or modulus, which
124eaff2004Ssameeran joshi  can be optimized into a fixed-point multiplication using the
125eaff2004Ssameeran joshi  usual technique.
126eaff2004Ssameeran joshi* `RESHAPE` with `ORDER=` is similar, but must permute the
127eaff2004Ssameeran joshi  components of the index tuple; it generalizes `TRANSPOSE`.
128eaff2004Ssameeran joshi* `CSHIFT` applies addition and modulus.
129eaff2004Ssameeran joshi* `EOSHIFT` applies addition and a conditional move (`MERGE`).
130eaff2004Ssameeran joshi* `PACK` and `UNPACK` are likely to require a runtime call.
131eaff2004Ssameeran joshi* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but
132eaff2004Ssameeran joshi  might benefit from calling a highly optimized runtime
133eaff2004Ssameeran joshi  routine.
134eaff2004Ssameeran joshi* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply
135eaff2004Ssameeran joshi  applies `A` to a reduced index tuple.
136eaff2004Ssameeran joshi
137*271a7bb1SRichard Barton## Determination of rank and shape
138*271a7bb1SRichard Barton
139eaff2004Ssameeran joshiAn important part of evaluating array expressions without the use of
140eaff2004Ssameeran joshitemporary storage is determining the shape of the result prior to,
141eaff2004Ssameeran joshior without, evaluating the elements of the result.
142eaff2004Ssameeran joshi
143eaff2004Ssameeran joshiThe shapes of array objects, results of elemental intrinsic functions,
144eaff2004Ssameeran joshiand results of intrinsic operations are obvious.
145eaff2004Ssameeran joshiBut it is possible to determine the shapes of the results of many
146eaff2004Ssameeran joshitransformational intrinsic function calls as well.
147eaff2004Ssameeran joshi
148eaff2004Ssameeran joshi* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed:
149eaff2004Ssameeran joshi  `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general.
150eaff2004Ssameeran joshi  (The `DIM=` argument is commonly a compile-time constant.)
151eaff2004Ssameeran joshi* `SHAPE(MAXLOC(X))` is `[RANK(X)]`.
152eaff2004Ssameeran joshi* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed.
153eaff2004Ssameeran joshi* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`.
154eaff2004Ssameeran joshi* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`.
155eaff2004Ssameeran joshi* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`.
156eaff2004Ssameeran joshi* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)`
157eaff2004Ssameeran joshi* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`.
158eaff2004Ssameeran joshi* `RANK(PACK(...))` is always 1.
159eaff2004Ssameeran joshi* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`.
160eaff2004Ssameeran joshi* `SHAPE(MATMUL(A,B))` drops one value from `SHAPE(A)` and another from `SHAPE(B)`.
161eaff2004Ssameeran joshi* `SHAPE(SHAPE(X))` is `[RANK(X)]`.
162eaff2004Ssameeran joshi* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at
163eaff2004Ssameeran joshi  dimension `d`.
164eaff2004Ssameeran joshi
165eaff2004Ssameeran joshiThis is useful because expression evaluations that *do* require temporaries
166eaff2004Ssameeran joshito hold their results (due to the context in which the evaluation occurs)
167eaff2004Ssameeran joshican be implemented with a separation of the allocation
168eaff2004Ssameeran joshiof the temporary array and the population of the array.
169eaff2004Ssameeran joshiThe code that evaluates the expression, or that implements a transformational
170eaff2004Ssameeran joshiintrinsic in the runtime library, can be designed with an API that includes
171eaff2004Ssameeran joshia pointer to the destination array as an argument.
172eaff2004Ssameeran joshi
173eaff2004Ssameeran joshiStatements like `ALLOCATE(A,SOURCE=expression)` should thus be capable
174eaff2004Ssameeran joshiof evaluating their array expressions directly into the newly-allocated
175eaff2004Ssameeran joshistorage for the allocatable array.
176eaff2004Ssameeran joshiThe implementation would generate code to calculate the shape, use it
177eaff2004Ssameeran joshito allocate the memory and populate the descriptor, and then drive a
178eaff2004Ssameeran joshiloop nest around the expression to populate the array.
179eaff2004Ssameeran joshiIn cases where the analyzed shape is known at compile time, we should
180eaff2004Ssameeran joshibe able to have the opportunity to avoid heap allocation in favor of
181eaff2004Ssameeran joshistack storage, if the scope of the variable is local.
182eaff2004Ssameeran joshi
183*271a7bb1SRichard Barton## Automatic reallocation of allocatables
184*271a7bb1SRichard Barton
185eaff2004Ssameeran joshiFortran 2003 introduced the ability to assign non-conforming array expressions
186eaff2004Ssameeran joshito ALLOCATABLE arrays with the implied semantics of reallocation to the
187eaff2004Ssameeran joshinew shape.
188eaff2004Ssameeran joshiThe implementation of this feature also becomes more straightforward if
189eaff2004Ssameeran joshiour implementation of array expressions has decoupled calculation of shapes
190eaff2004Ssameeran joshifrom the evaluation of the elements of the result.
191eaff2004Ssameeran joshi
192*271a7bb1SRichard Barton## Rewriting rules
193*271a7bb1SRichard Barton
194eaff2004Ssameeran joshiLet `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into
195eaff2004Ssameeran joshithe result of an array expression or subexpression.
196eaff2004Ssameeran joshi
197eaff2004Ssameeran joshi* Array constructors always yield vectors; higher-rank arrays that appear as
198eaff2004Ssameeran joshi  constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)})`.
199eaff2004Ssameeran joshi* Array constructors with multiple constituents are concatenations of
200eaff2004Ssameeran joshi  their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},J>SIZE(X))`.
201eaff2004Ssameeran joshi* Array constructors with implied DO loops are difficult when nested
202eaff2004Ssameeran joshi  triangularly.
203eaff2004Ssameeran joshi* Whole array references can have lower bounds other than 1, so
204eaff2004Ssameeran joshi  `A => A(LBOUND(A,1):UBOUND(A,1),...)`.
205eaff2004Ssameeran joshi* Array sections simply apply indices: `A(i:...:n){j} => A(i1+n*(j-1))`.
206eaff2004Ssameeran joshi* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`.
207eaff2004Ssameeran joshi* Scalar operands ignore indices: `X{j,k} => X`.
208eaff2004Ssameeran joshi  Further, they are evaluated at most once.
209eaff2004Ssameeran joshi* Elemental operators and functions apply indices to their arguments:
210eaff2004Ssameeran joshi  `(A(:,:) + B(:,:)){j,k}` => A(:,:){j,k} + B(:,:){j,k}`.
211eaff2004Ssameeran joshi* `TRANSPOSE(X){j,k} => X{k,j}`.
212eaff2004Ssameeran joshi* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated.
213eaff2004Ssameeran joshi  If X is sufficiently expensive to compute elementally, it might be evaluated
214eaff2004Ssameeran joshi  into a temporary.
215eaff2004Ssameeran joshi
216eaff2004Ssameeran joshi(more...)
217