1932aae77SSourabh Singh Tomar<!--===- docs/ArrayComposition.md 2932aae77SSourabh Singh Tomar 3932aae77SSourabh Singh Tomar Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. 4932aae77SSourabh Singh Tomar See https://llvm.org/LICENSE.txt for license information. 5932aae77SSourabh Singh Tomar SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception 6932aae77SSourabh Singh Tomar 7932aae77SSourabh Singh Tomar--> 8932aae77SSourabh Singh Tomar 9*271a7bb1SRichard Barton# Array Composition 10*271a7bb1SRichard Barton 11*271a7bb1SRichard Barton```eval_rst 12*271a7bb1SRichard Barton.. contents:: 13*271a7bb1SRichard Barton :local: 14*271a7bb1SRichard Barton``` 15*271a7bb1SRichard Barton 16eaff2004Ssameeran joshiThis note attempts to describe the motivation for and design of an 17eaff2004Ssameeran joshiimplementation of Fortran 90 (and later) array expression evaluation that 18eaff2004Ssameeran joshiminimizes the use of dynamically allocated temporary storage for 19eaff2004Ssameeran joshithe results of calls to transformational intrinsic functions, and 20eaff2004Ssameeran joshimaking them more amenable to acceleration. 21eaff2004Ssameeran joshi 22eaff2004Ssameeran joshiThe transformational intrinsic functions of Fortran of interest to 23eaff2004Ssameeran joshius here include: 24eaff2004Ssameeran joshi 25eaff2004Ssameeran joshi* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`, 26eaff2004Ssameeran joshi `DOT_PRODUCT`, 27eaff2004Ssameeran joshi `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`) 28eaff2004Ssameeran joshi* Axial reductions (`SUM(X,DIM=)`, &c.) 29eaff2004Ssameeran joshi* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`) 30eaff2004Ssameeran joshi* Axial location reductions (`MAXLOC(DIM=`, &c.) 31eaff2004Ssameeran joshi* `TRANSPOSE(M)` matrix transposition 32eaff2004Ssameeran joshi* `RESHAPE` without `ORDER=` 33eaff2004Ssameeran joshi* `RESHAPE` with `ORDER=` 34eaff2004Ssameeran joshi* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=` 35eaff2004Ssameeran joshi* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=` 36eaff2004Ssameeran joshi* `PACK` and `UNPACK` 37eaff2004Ssameeran joshi* `MATMUL` 38eaff2004Ssameeran joshi* `SPREAD` 39eaff2004Ssameeran joshi 40eaff2004Ssameeran joshiOther Fortran intrinsic functions are technically transformational (e.g., 41eaff2004Ssameeran joshi`COMMAND_ARGUMENT_COUNT`) but not of interest for this note. 42eaff2004Ssameeran joshiThe generic `REDUCE` is also not considered here. 43eaff2004Ssameeran joshi 44*271a7bb1SRichard Barton## Arrays as functions 45*271a7bb1SRichard Barton 46eaff2004Ssameeran joshiA whole array can be viewed as a function that maps its indices to the values 47eaff2004Ssameeran joshiof its elements. 48eaff2004Ssameeran joshiSpecifically, it is a map from a tuple of integers to its element type. 49eaff2004Ssameeran joshiThe rank of the array is the number of elements in that tuple, 50eaff2004Ssameeran joshiand the shape of the array delimits the domain of the map. 51eaff2004Ssameeran joshi 52eaff2004Ssameeran joshi`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers 53eaff2004Ssameeran joshi`(J,K)` with `1<=J<=N` and `1<=J<=M` to real values. 54eaff2004Ssameeran joshi 55*271a7bb1SRichard Barton## Array expressions as functions 56*271a7bb1SRichard Barton 57eaff2004Ssameeran joshiThe same perspective can be taken of an array expression comprising 58eaff2004Ssameeran joshiintrinsic operators and elemental functions. 59eaff2004Ssameeran joshiFortran doesn't allow one to apply subscripts directly to an expression, 60eaff2004Ssameeran joshibut expressions have rank and shape, and one can view array expressions 61eaff2004Ssameeran joshias functions over index tuples by applying those indices to the arrays 62eaff2004Ssameeran joshiand subexpressions in the expression. 63eaff2004Ssameeran joshi 64eaff2004Ssameeran joshiConsider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`). 65eaff2004Ssameeran joshiThe right-hand side of that assignment could be evaluated into a 66eaff2004Ssameeran joshitemporary array `T` and then subscripted as it is copied into `B`. 67eaff2004Ssameeran joshi``` 68eaff2004Ssameeran joshiREAL, ALLOCATABLE :: T(:,:) 69eaff2004Ssameeran joshiALLOCATE(T(N,M)) 70eaff2004Ssameeran joshiDO CONCURRENT(J=1:N,K=1:M) 71eaff2004Ssameeran joshi T(J,K)=A(J,K) + 1.0 72eaff2004Ssameeran joshiEND DO 73eaff2004Ssameeran joshiDO CONCURRENT(J=1:N,K=1:M) 74eaff2004Ssameeran joshi B(J,K)=T(J,K) 75eaff2004Ssameeran joshiEND DO 76eaff2004Ssameeran joshiDEALLOCATE(T) 77eaff2004Ssameeran joshi``` 78eaff2004Ssameeran joshiBut we can avoid the allocation, population, and deallocation of 79eaff2004Ssameeran joshithe temporary by treating the right-hand side expression as if it 80eaff2004Ssameeran joshiwere a statement function `F(J,K)=A(J,K)+1.0` and evaluating 81eaff2004Ssameeran joshi``` 82eaff2004Ssameeran joshiDO CONCURRENT(J=1:N,K=1:M) 83eaff2004Ssameeran joshi A(J,K)=F(J,K) 84eaff2004Ssameeran joshiEND DO 85eaff2004Ssameeran joshi``` 86eaff2004Ssameeran joshi 87eaff2004Ssameeran joshiIn general, when a Fortran array assignment to a non-allocatable array 88eaff2004Ssameeran joshidoes not include the left-hand 89eaff2004Ssameeran joshiside variable as an operand of the right-hand side expression, and any 90eaff2004Ssameeran joshifunction calls on the right-hand side are elemental or scalar-valued, 91eaff2004Ssameeran joshiwe can avoid the use of a temporary. 92eaff2004Ssameeran joshi 93*271a7bb1SRichard Barton## Transformational intrinsic functions as function composition 94*271a7bb1SRichard Barton 95eaff2004Ssameeran joshiMany of the transformational intrinsic functions listed above 96eaff2004Ssameeran joshican, when their array arguments are viewed as functions over their 97eaff2004Ssameeran joshiindex tuples, be seen as compositions of those functions with 98eaff2004Ssameeran joshifunctions of the "incoming" indices -- yielding a function for 99eaff2004Ssameeran joshian entire right-hand side of an array assignment statement. 100eaff2004Ssameeran joshi 101eaff2004Ssameeran joshiFor example, the application of `TRANSPOSE(A + 1.0)` to the index 102eaff2004Ssameeran joshituple `(J,K)` becomes `A(K,J) + 1.0`. 103eaff2004Ssameeran joshi 104eaff2004Ssameeran joshiPartial (axial) reductions can be similarly composed. 105eaff2004Ssameeran joshiThe application of `SUM(A,DIM=2)` to the index `J` is the 106eaff2004Ssameeran joshicomplete reduction `SUM(A(J,:))`. 107eaff2004Ssameeran joshi 108eaff2004Ssameeran joshiMore completely: 109eaff2004Ssameeran joshi* Reductions to scalars (`SUM(X)` without `DIM=`) become 110eaff2004Ssameeran joshi runtime calls; the result needs no dynamic allocation, 111eaff2004Ssameeran joshi being a scalar. 112eaff2004Ssameeran joshi* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)` 113eaff2004Ssameeran joshi become scalar values like `SUM(X(J,K,:))` if `d=3`. 114eaff2004Ssameeran joshi* Location reductions to indices (`MAXLOC(X)` without `DIM=`) 115eaff2004Ssameeran joshi do not require dynamic allocation, since their results are 116eaff2004Ssameeran joshi either scalar or small vectors of length `RANK(X)`. 117eaff2004Ssameeran joshi* Axial location reductions (`MAXLOC(X,DIM=)`, &c.) 118eaff2004Ssameeran joshi are handled like other axial reductions like `SUM(DIM=)`. 119eaff2004Ssameeran joshi* `TRANSPOSE(M)` exchanges the two components of the index tuple. 120eaff2004Ssameeran joshi* `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape 121eaff2004Ssameeran joshi vector `S`, and then use it to linearize indices into offsets 122eaff2004Ssameeran joshi in the storage order of `A` (whose shape must also be captured). 123eaff2004Ssameeran joshi These conversions can involve division and/or modulus, which 124eaff2004Ssameeran joshi can be optimized into a fixed-point multiplication using the 125eaff2004Ssameeran joshi usual technique. 126eaff2004Ssameeran joshi* `RESHAPE` with `ORDER=` is similar, but must permute the 127eaff2004Ssameeran joshi components of the index tuple; it generalizes `TRANSPOSE`. 128eaff2004Ssameeran joshi* `CSHIFT` applies addition and modulus. 129eaff2004Ssameeran joshi* `EOSHIFT` applies addition and a conditional move (`MERGE`). 130eaff2004Ssameeran joshi* `PACK` and `UNPACK` are likely to require a runtime call. 131eaff2004Ssameeran joshi* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but 132eaff2004Ssameeran joshi might benefit from calling a highly optimized runtime 133eaff2004Ssameeran joshi routine. 134eaff2004Ssameeran joshi* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply 135eaff2004Ssameeran joshi applies `A` to a reduced index tuple. 136eaff2004Ssameeran joshi 137*271a7bb1SRichard Barton## Determination of rank and shape 138*271a7bb1SRichard Barton 139eaff2004Ssameeran joshiAn important part of evaluating array expressions without the use of 140eaff2004Ssameeran joshitemporary storage is determining the shape of the result prior to, 141eaff2004Ssameeran joshior without, evaluating the elements of the result. 142eaff2004Ssameeran joshi 143eaff2004Ssameeran joshiThe shapes of array objects, results of elemental intrinsic functions, 144eaff2004Ssameeran joshiand results of intrinsic operations are obvious. 145eaff2004Ssameeran joshiBut it is possible to determine the shapes of the results of many 146eaff2004Ssameeran joshitransformational intrinsic function calls as well. 147eaff2004Ssameeran joshi 148eaff2004Ssameeran joshi* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed: 149eaff2004Ssameeran joshi `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general. 150eaff2004Ssameeran joshi (The `DIM=` argument is commonly a compile-time constant.) 151eaff2004Ssameeran joshi* `SHAPE(MAXLOC(X))` is `[RANK(X)]`. 152eaff2004Ssameeran joshi* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed. 153eaff2004Ssameeran joshi* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`. 154eaff2004Ssameeran joshi* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`. 155eaff2004Ssameeran joshi* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`. 156eaff2004Ssameeran joshi* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)` 157eaff2004Ssameeran joshi* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`. 158eaff2004Ssameeran joshi* `RANK(PACK(...))` is always 1. 159eaff2004Ssameeran joshi* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`. 160eaff2004Ssameeran joshi* `SHAPE(MATMUL(A,B))` drops one value from `SHAPE(A)` and another from `SHAPE(B)`. 161eaff2004Ssameeran joshi* `SHAPE(SHAPE(X))` is `[RANK(X)]`. 162eaff2004Ssameeran joshi* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at 163eaff2004Ssameeran joshi dimension `d`. 164eaff2004Ssameeran joshi 165eaff2004Ssameeran joshiThis is useful because expression evaluations that *do* require temporaries 166eaff2004Ssameeran joshito hold their results (due to the context in which the evaluation occurs) 167eaff2004Ssameeran joshican be implemented with a separation of the allocation 168eaff2004Ssameeran joshiof the temporary array and the population of the array. 169eaff2004Ssameeran joshiThe code that evaluates the expression, or that implements a transformational 170eaff2004Ssameeran joshiintrinsic in the runtime library, can be designed with an API that includes 171eaff2004Ssameeran joshia pointer to the destination array as an argument. 172eaff2004Ssameeran joshi 173eaff2004Ssameeran joshiStatements like `ALLOCATE(A,SOURCE=expression)` should thus be capable 174eaff2004Ssameeran joshiof evaluating their array expressions directly into the newly-allocated 175eaff2004Ssameeran joshistorage for the allocatable array. 176eaff2004Ssameeran joshiThe implementation would generate code to calculate the shape, use it 177eaff2004Ssameeran joshito allocate the memory and populate the descriptor, and then drive a 178eaff2004Ssameeran joshiloop nest around the expression to populate the array. 179eaff2004Ssameeran joshiIn cases where the analyzed shape is known at compile time, we should 180eaff2004Ssameeran joshibe able to have the opportunity to avoid heap allocation in favor of 181eaff2004Ssameeran joshistack storage, if the scope of the variable is local. 182eaff2004Ssameeran joshi 183*271a7bb1SRichard Barton## Automatic reallocation of allocatables 184*271a7bb1SRichard Barton 185eaff2004Ssameeran joshiFortran 2003 introduced the ability to assign non-conforming array expressions 186eaff2004Ssameeran joshito ALLOCATABLE arrays with the implied semantics of reallocation to the 187eaff2004Ssameeran joshinew shape. 188eaff2004Ssameeran joshiThe implementation of this feature also becomes more straightforward if 189eaff2004Ssameeran joshiour implementation of array expressions has decoupled calculation of shapes 190eaff2004Ssameeran joshifrom the evaluation of the elements of the result. 191eaff2004Ssameeran joshi 192*271a7bb1SRichard Barton## Rewriting rules 193*271a7bb1SRichard Barton 194eaff2004Ssameeran joshiLet `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into 195eaff2004Ssameeran joshithe result of an array expression or subexpression. 196eaff2004Ssameeran joshi 197eaff2004Ssameeran joshi* Array constructors always yield vectors; higher-rank arrays that appear as 198eaff2004Ssameeran joshi constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)})`. 199eaff2004Ssameeran joshi* Array constructors with multiple constituents are concatenations of 200eaff2004Ssameeran joshi their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},J>SIZE(X))`. 201eaff2004Ssameeran joshi* Array constructors with implied DO loops are difficult when nested 202eaff2004Ssameeran joshi triangularly. 203eaff2004Ssameeran joshi* Whole array references can have lower bounds other than 1, so 204eaff2004Ssameeran joshi `A => A(LBOUND(A,1):UBOUND(A,1),...)`. 205eaff2004Ssameeran joshi* Array sections simply apply indices: `A(i:...:n){j} => A(i1+n*(j-1))`. 206eaff2004Ssameeran joshi* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`. 207eaff2004Ssameeran joshi* Scalar operands ignore indices: `X{j,k} => X`. 208eaff2004Ssameeran joshi Further, they are evaluated at most once. 209eaff2004Ssameeran joshi* Elemental operators and functions apply indices to their arguments: 210eaff2004Ssameeran joshi `(A(:,:) + B(:,:)){j,k}` => A(:,:){j,k} + B(:,:){j,k}`. 211eaff2004Ssameeran joshi* `TRANSPOSE(X){j,k} => X{k,j}`. 212eaff2004Ssameeran joshi* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated. 213eaff2004Ssameeran joshi If X is sufficiently expensive to compute elementally, it might be evaluated 214eaff2004Ssameeran joshi into a temporary. 215eaff2004Ssameeran joshi 216eaff2004Ssameeran joshi(more...) 217