<!--===- docs/ArrayComposition.md

  Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  See https://llvm.org/LICENSE.txt for license information.
  SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

This note attempts to describe the motivation for and design of an
implementation of Fortran 90 (and later) array expression evaluation that
minimizes the use of dynamically allocated temporary storage for
the results of calls to transformational intrinsic functions, and
makes them more amenable to acceleration.

The transformational intrinsic functions of Fortran of interest to
us here include:

* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`,
  `DOT_PRODUCT`,
  `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`)
* Axial reductions (`SUM(X,DIM=)`, &c.)
* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`)
* Axial location reductions (`MAXLOC(DIM=`, &c.)
* `TRANSPOSE(M)` matrix transposition
* `RESHAPE` without `ORDER=`
* `RESHAPE` with `ORDER=`
* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=`
* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=`
* `PACK` and `UNPACK`
* `MATMUL`
* `SPREAD`

Other Fortran intrinsic functions are technically transformational (e.g.,
`COMMAND_ARGUMENT_COUNT`) but are not of interest for this note.
The generic `REDUCE` is also not considered here.

Arrays as functions
===================
A whole array can be viewed as a function that maps its indices to the values
of its elements.
Specifically, it is a map from a tuple of integers to its element type.
The rank of the array is the number of elements in that tuple,
and the shape of the array delimits the domain of the map.

`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers
`(J,K)` with `1<=J<=N` and `1<=K<=M` to real values.
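To make this view concrete, here is a small Python sketch (an illustration only, not part of the proposed implementation) of a rank-2 array wrapped as a function from 1-based index pairs to element values; the column-major storage order mirrors Fortran's:

```python
# Sketch: view a rank-2 array as a function from 1-based index pairs
# (j, k) to element values, as described for REAL :: A(N,M).

def as_function(storage, n, m):
    """Wrap column-major storage of an N x M array as a map (j, k) -> value."""
    def f(j, k):
        assert 1 <= j <= n and 1 <= k <= m  # the shape delimits the domain
        return storage[(j - 1) + n * (k - 1)]  # Fortran column-major order
    return f

# A 2 x 3 array whose element A(j,k) = 10*j + k, stored column-major.
a = as_function([11.0, 21.0, 12.0, 22.0, 13.0, 23.0], 2, 3)
print(a(1, 3))  # element A(1,3) -> 13.0
```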
Array expressions as functions
==============================
The same perspective can be taken of an array expression comprising
intrinsic operators and elemental functions.
Fortran doesn't allow one to apply subscripts directly to an expression,
but expressions have rank and shape, and one can view array expressions
as functions over index tuples by applying those indices to the arrays
and subexpressions in the expression.

Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`).
The right-hand side of that assignment could be evaluated into a
temporary array `T` and then subscripted as it is copied into `B`.
```
REAL, ALLOCATABLE :: T(:,:)
ALLOCATE(T(N,M))
DO CONCURRENT(J=1:N,K=1:M)
  T(J,K)=A(J,K) + 1.0
END DO
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=T(J,K)
END DO
DEALLOCATE(T)
```
But we can avoid the allocation, population, and deallocation of
the temporary by treating the right-hand side expression as if it
were a statement function `F(J,K)=A(J,K)+1.0` and evaluating
```
DO CONCURRENT(J=1:N,K=1:M)
  B(J,K)=F(J,K)
END DO
```

In general, when a Fortran array assignment to a non-allocatable array
does not include the left-hand
side variable as an operand of the right-hand side expression, and any
function calls on the right-hand side are elemental or scalar-valued,
we can avoid the use of a temporary.

Transformational intrinsic functions as function composition
============================================================
Many of the transformational intrinsic functions listed above
can, when their array arguments are viewed as functions over their
index tuples, be seen as compositions of those functions with
functions of the "incoming" indices -- yielding a function for
an entire right-hand side of an array assignment statement.

For example, the application of `TRANSPOSE(A + 1.0)` to the index
tuple `(J,K)` becomes `A(K,J) + 1.0`.
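This composition can be sketched in Python, with closures standing in for indexed subexpressions (the helper names here are hypothetical; the real implementation would operate on compiler IR, not closures):

```python
# Sketch: TRANSPOSE and an elemental operation as compositions of
# index-tuple functions, mirroring TRANSPOSE(A + 1.0){j,k} = A(k,j) + 1.0.

def elementwise_add(f, scalar):
    # (A + 1.0){j,k} => A{j,k} + 1.0: apply the incoming indices to A
    return lambda j, k: f(j, k) + scalar

def transpose(f):
    # TRANSPOSE(X){j,k} => X{k,j}: exchange the index tuple's components
    return lambda j, k: f(k, j)

# A(j,k) = 10*j + k over a 2x3 array, expressed directly as a function.
a = lambda j, k: 10 * j + k

rhs = transpose(elementwise_add(a, 1.0))  # TRANSPOSE(A + 1.0)
print(rhs(3, 2))  # == A(2,3) + 1.0 == 24.0
```

No temporary holding `TRANSPOSE(A + 1.0)` ever exists; each element of the result is computed on demand from the incoming index tuple.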
Partial (axial) reductions can be similarly composed.
The application of `SUM(A,DIM=2)` to the index `J` is the
complete reduction `SUM(A(J,:))`.

More completely:
* Reductions to scalars (`SUM(X)` without `DIM=`) become
  runtime calls; the result needs no dynamic allocation,
  being a scalar.
* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)`
  become scalar values like `SUM(X(J,K,:))` if `d=3`.
* Location reductions to indices (`MAXLOC(X)` without `DIM=`)
  do not require dynamic allocation, since their results are
  either scalar or small vectors of length `RANK(X)`.
* Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
  are handled like other axial reductions such as `SUM(DIM=)`.
* `TRANSPOSE(M)` exchanges the two components of the index tuple.
* `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape
  vector `S`, and then use it to linearize indices into offsets
  in the storage order of `A` (whose shape must also be captured).
  These conversions can involve division and/or modulus, which
  can be optimized into a fixed-point multiplication using the
  usual technique.
* `RESHAPE` with `ORDER=` is similar, but must permute the
  components of the index tuple; it generalizes `TRANSPOSE`.
* `CSHIFT` applies addition and modulus.
* `EOSHIFT` applies addition and a conditional move (`MERGE`).
* `PACK` and `UNPACK` are likely to require a runtime call.
* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but
  might benefit from calling a highly optimized runtime
  routine.
* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply
  applies `A` to a reduced index tuple.

Determination of rank and shape
===============================
An important part of evaluating array expressions without the use of
temporary storage is determining the shape of the result prior to,
or without, evaluating the elements of the result.
The shapes of array objects, results of elemental intrinsic functions,
and results of intrinsic operations are obvious.
But it is possible to determine the shapes of the results of many
transformational intrinsic function calls as well.

* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed:
  `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general.
  (The `DIM=` argument is commonly a compile-time constant.)
* `SHAPE(MAXLOC(X))` is `[RANK(X)]`.
* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed.
* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`.
* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`.
* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`.
* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)`.
* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`.
* `RANK(PACK(...))` is always 1.
* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`.
* `SHAPE(MATMUL(A,B))` drops one value from `SHAPE(A)` and another from `SHAPE(B)`.
* `SHAPE(SHAPE(X))` is `[RANK(X)]`.
* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at
  dimension `d`.

This is useful because expression evaluations that *do* require temporaries
to hold their results (due to the context in which the evaluation occurs)
can be implemented with a separation of the allocation
of the temporary array and the population of the array.
The code that evaluates the expression, or that implements a transformational
intrinsic in the runtime library, can be designed with an API that includes
a pointer to the destination array as an argument.

Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable
of evaluating their array expressions directly into the newly-allocated
storage for the allocatable array.
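A few of the shape rules above can be sketched in Python (illustrative only; `DIM=` is taken to be a known constant, which the note suggests is the common case), computing result shapes without evaluating any elements:

```python
# Sketch: computing result shapes of transformational intrinsics
# from argument shapes alone, without touching any element values.

def shape_sum_dim(shape_x, d):
    # SHAPE(SUM(X,DIM=d)) is SHAPE(X) with dimension d removed.
    return [extent for i, extent in enumerate(shape_x, 1) if i != d]

def shape_transpose(shape_m):
    # SHAPE(TRANSPOSE(M)) is a reversal of SHAPE(M).
    return list(reversed(shape_m))

def shape_spread(shape_a, d, ncopies):
    # SHAPE(SPREAD(A,DIM=d,NCOPIES=n)) is SHAPE(A) with n inserted at d.
    return shape_a[:d - 1] + [ncopies] + shape_a[d - 1:]

print(shape_sum_dim([4, 5, 6], 2))  # [4, 6]
print(shape_transpose([4, 5]))      # [5, 4]
print(shape_spread([4, 6], 2, 3))   # [4, 3, 6]
```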
The implementation would generate code to calculate the shape, use it
to allocate the memory and populate the descriptor, and then drive a
loop nest around the expression to populate the array.
In cases where the analyzed shape is known at compile time, we should
be able to avoid heap allocation in favor of
stack storage, if the scope of the variable is local.

Automatic reallocation of allocatables
======================================
Fortran 2003 introduced the ability to assign non-conforming array expressions
to ALLOCATABLE arrays with the implied semantics of reallocation to the
new shape.
The implementation of this feature also becomes more straightforward if
our implementation of array expressions has decoupled calculation of shapes
from the evaluation of the elements of the result.

Rewriting rules
===============
Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into
the result of an array expression or subexpression.

* Array constructors always yield vectors; higher-rank arrays that appear as
  constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)])`.
* Array constructors with multiple constituents are concatenations of
  their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},j>SIZE(X))`.
* Array constructors with implied DO loops are difficult when nested
  triangularly.
* Whole array references can have lower bounds other than 1, so
  `A => A(LBOUND(A,1):UBOUND(A,1),...)`.
* Array sections simply apply indices: `A(i:...:n){j} => A(i+n*(j-1))`.
* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`.
* Scalar operands ignore indices: `X{j,k} => X`.
  Further, they are evaluated at most once.
* Elemental operators and functions apply indices to their arguments:
  `(A(:,:) + B(:,:)){j,k} => A(:,:){j,k} + B(:,:){j,k}`.
* `TRANSPOSE(X){j,k} => X{k,j}`.
* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated.
  If `X` is sufficiently expensive to compute elementally, it might be evaluated
  into a temporary.

(more...)
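The concatenation rule for array constructors can be sketched in Python (closures standing in for indexed subexpressions, as in the earlier examples; note that this sketch *selects* one operand per index rather than evaluating both, which is a valid implementation of the `MERGE` rewrite even though Fortran's `MERGE` intrinsic may evaluate both arguments):

```python
# Sketch: the array constructor rule
#   [X,Y]{j} => MERGE(Y{j-SIZE(X)}, X{j}, j > SIZE(X))
# with vectors represented as functions over 1-based indices.

def concat(x, size_x, y):
    # Select Y{j-SIZE(X)} where j > SIZE(X), else X{j}.
    return lambda j: y(j - size_x) if j > size_x else x(j)

x = lambda j: [10, 20, 30][j - 1]    # X = [10,20,30], SIZE(X) == 3
y = lambda j: [40, 50][j - 1]        # Y = [40,50]

xy = concat(x, 3, y)                 # [X,Y]
print([xy(j) for j in range(1, 6)])  # [10, 20, 30, 40, 50]
```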