1This note attempts to describe the motivation for and design of an 2implementation of Fortran 90 (and later) array expression evaluation that 3minimizes the use of dynamically allocated temporary storage for 4the results of calls to transformational intrinsic functions, and 5making them more amenable to acceleration. 6 7The transformational intrinsic functions of Fortran of interest to 8us here include: 9 10* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`, 11 `DOT_PRODUCT`, 12 `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`) 13* Axial reductions (`SUM(X,DIM=)`, &c.) 14* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`) 15* Axial location reductions (`MAXLOC(DIM=`, &c.) 16* `TRANSPOSE(M)` matrix transposition 17* `RESHAPE` without `ORDER=` 18* `RESHAPE` with `ORDER=` 19* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=` 20* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=` 21* `PACK` and `UNPACK` 22* `MATMUL` 23* `SPREAD` 24 25Other Fortran intrinsic functions are technically transformational (e.g., 26`COMMAND_ARGUMENT_COUNT`) but not of interest for this note. 27The generic `REDUCE` is also not considered here. 28 29Arrays as functions 30=================== 31A whole array can be viewed as a function that maps its indices to the values 32of its elements. 33Specifically, it is a map from a tuple of integers to its element type. 34The rank of the array is the number of elements in that tuple, 35and the shape of the array delimits the domain of the map. 36 37`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers 38`(J,K)` with `1<=J<=N` and `1<=J<=M` to real values. 39 40Array expressions as functions 41============================== 42The same perspective can be taken of an array expression comprising 43intrinsic operators and elemental functions. 44Fortran doesn't allow one to apply subscripts directly to an expression, 45but expressions have rank and shape, and one can view array expressions 46as functions over index tuples by applying those indices to the arrays 47and subexpressions in the expression. 48 49Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`). 50The right-hand side of that assignment could be evaluated into a 51temporary array `T` and then subscripted as it is copied into `B`. 52``` 53REAL, ALLOCATABLE :: T(:,:) 54ALLOCATE(T(N,M)) 55DO CONCURRENT(J=1:N,K=1:M) 56 T(J,K)=A(J,K) + 1.0 57END DO 58DO CONCURRENT(J=1:N,K=1:M) 59 B(J,K)=T(J,K) 60END DO 61DEALLOCATE(T) 62``` 63But we can avoid the allocation, population, and deallocation of 64the temporary by treating the right-hand side expression as if it 65were a statement function `F(J,K)=A(J,K)+1.0` and evaluating 66``` 67DO CONCURRENT(J=1:N,K=1:M) 68 A(J,K)=F(J,K) 69END DO 70``` 71 72In general, when a Fortran array assignment to a non-allocatable array 73does not include the left-hand 74side variable as an operand of the right-hand side expression, and any 75function calls on the right-hand side are elemental or scalar-valued, 76we can avoid the use of a temporary. 77 78Transformational intrinsic functions as function composition 79============================================================ 80Many of the transformational intrinsic functions listed above 81can, when their array arguments are viewed as functions over their 82index tuples, be seen as compositions of those functions with 83functions of the "incoming" indices -- yielding a function for 84an entire right-hand side of an array assignment statement. 85 86For example, the application of `TRANSPOSE(A + 1.0)` to the index 87tuple `(J,K)` becomes `A(K,J) + 1.0`. 88 89Partial (axial) reductions can be similarly composed. 90The application of `SUM(A,DIM=2)` to the index `J` is the 91complete reduction `SUM(A(J,:))`. 92 93More completely: 94* Reductions to scalars (`SUM(X)` without `DIM=`) become 95 runtime calls; the result needs no dynamic allocation, 96 being a scalar. 97* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)` 98 become scalar values like `SUM(X(J,K,:))` if `d=3`. 99* Location reductions to indices (`MAXLOC(X)` without `DIM=`) 100 do not require dynamic allocation, since their results are 101 either scalar or small vectors of length `RANK(X)`. 102* Axial location reductions (`MAXLOC(X,DIM=)`, &c.) 103 are handled like other axial reductions like `SUM(DIM=)`. 104* `TRANSPOSE(M)` exchanges the two components of the index tuple. 105* `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape 106 vector `S`, and then use it to linearize indices into offsets 107 in the storage order of `A` (whose shape must also be captured). 108 These conversions can involve division and/or modulus, which 109 can be optimized into a fixed-point multiplication using the 110 usual technique. 111* `RESHAPE` with `ORDER=` is similar, but must permute the 112 components of the index tuple; it generalizes `TRANSPOSE`. 113* `CSHIFT` applies addition and modulus. 114* `EOSHIFT` applies addition and a conditional move (`MERGE`). 115* `PACK` and `UNPACK` are likely to require a runtime call. 116* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but 117 might benefit from calling a highly optimized runtime 118 routine. 119* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply 120 applies `A` to a reduced index tuple. 121 122Determination of rank and shape 123=============================== 124An important part of evaluating array expressions without the use of 125temporary storage is determining the shape of the result prior to, 126or without, evaluating the elements of the result. 127 128The shapes of array objects, results of elemental intrinsic functions, 129and results of intrinsic operations are obvious. 130But it is possible to determine the shapes of the results of many 131transformational intrinsic function calls as well. 132 133* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed: 134 `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general. 135 (The `DIM=` argument is commonly a compile-time constant.) 136* `SHAPE(MAXLOC(X))` is `[RANK(X)]`. 137* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed. 138* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`. 139* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`. 140* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`. 141* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)` 142* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`. 143* `RANK(PACK(...))` is always 1. 144* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`. 145* `SHAPE(MATMUL(A,B))` drops one value from `SHAPE(A)` and another from `SHAPE(B)`. 146* `SHAPE(SHAPE(X))` is `[RANK(X)]`. 147* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at 148 dimension `d`. 149 150This is useful because expression evaluations that *do* require temporaries 151to hold their results (due to the context in which the evaluation occurs) 152can be implemented with a separation of the allocation 153of the temporary array and the population of the array. 154The code that evaluates the expression, or that implements a transformational 155intrinsic in the runtime library, can be designed with an API that includes 156a pointer to the destination array as an argument. 157 158Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable 159of evaluating their array expressions directly into the newly-allocated 160storage for the allocatable array. 161The implementation would generate code to calculate the shape, use it 162to allocate the memory and populate the descriptor, and then drive a 163loop nest around the expression to populate the array. 164In cases where the analyzed shape is known at compile time, we should 165be able to have the opportunity to avoid heap allocation in favor of 166stack storage, if the scope of the variable is local. 167 168Automatic reallocation of allocatables 169====================================== 170Fortran 2003 introduced the ability to assign non-conforming array expressions 171to ALLOCATABLE arrays with the implied semantics of reallocation to the 172new shape. 173The implementation of this feature also becomes more straightforward if 174our implementation of array expressions has decoupled calculation of shapes 175from the evaluation of the elements of the result. 176 177Rewriting rules 178=============== 179Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into 180the result of an array expression or subexpression. 181 182* Array constructors always yield vectors; higher-rank arrays that appear as 183 constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)})`. 184* Array constructors with multiple constituents are concatenations of 185 their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},J>SIZE(X))`. 186* Array constructors with implied DO loops are difficult when nested 187 triangularly. 188* Whole array references can have lower bounds other than 1, so 189 `A => A(LBOUND(A,1):UBOUND(A,1),...)`. 190* Array sections simply apply indices: `A(i:...:n){j} => A(i1+n*(j-1))`. 191* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`. 192* Scalar operands ignore indices: `X{j,k} => X`. 193 Further, they are evaluated at most once. 194* Elemental operators and functions apply indices to their arguments: 195 `(A(:,:) + B(:,:)){j,k}` => A(:,:){j,k} + B(:,:){j,k}`. 196* `TRANSPOSE(X){j,k} => X{k,j}`. 197* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated. 198 If X is sufficiently expensive to compute elementally, it might be evaluated 199 into a temporary. 200 201(more...) 202