History log of /llvm-project-15.0.7/polly/lib/CodeGen/PPCGCodeGeneration.cpp (Results 126 – 150 of 213)
# 5cc87e3a 12-May-2017 Philip Pfaffe <[email protected]>

[Polly][NewPM] Port ScopDetection to the new PassManager

Summary: This is a proof of concept of how to port polly-passes to the new PassManager architecture. This approach works out of the box for Function-Passes, but might not be directly applicable to Scop/Region-Passes. While we could just run the Analyses/Transforms over functions instead, we'd surrender the nice pipelining behaviour we have now.

Reviewers: Meinersbur, grosser

Reviewed By: grosser

Subscribers: pollydev, sanjoy, nemanjai, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31459

llvm-svn: 302902



# a90be207 09-May-2017 Siddharth Bhat <[email protected]>

[Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG CodeGen

Summary: PPCGCodeGeneration now attaches the size of the kernel launch parameters at the end of the parameter list. For the existing CUDA Runtime, this gets ignored, but the OpenCL Runtime knows to check for kernel-argument size at the end of the parameter list. (The resulting parameter list is twice as long. This has been accounted for in the corresponding test cases.)

Reviewers: grosser, Meinersbur, bollu

Reviewed By: bollu

Subscribers: nemanjai, yaxunl, Anastasia, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32961

llvm-svn: 302515
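The launch-parameter layout described in this commit can be sketched as follows. This is a minimal illustration, not Polly's code: the helpers `buildLaunchParams` and `argSize` are hypothetical. The sketch assumes only what the summary states: N argument pointers are followed by N size entries. A CUDA-style consumer reads just the first N slots, and an OpenCL-style consumer also reads the trailing N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: lay out N argument pointers followed by N pointers
// to the byte size of each argument, doubling the parameter list length.
// The caller owns ArgSizes; the returned size slots point into it.
std::vector<void *> buildLaunchParams(const std::vector<void *> &ArgPtrs,
                                      std::vector<std::size_t> &ArgSizes) {
  assert(ArgPtrs.size() == ArgSizes.size() && "one size per argument");
  std::vector<void *> Slots;
  Slots.reserve(2 * ArgPtrs.size());
  for (void *Arg : ArgPtrs)
    Slots.push_back(Arg);   // slots [0, N): the arguments themselves
  for (std::size_t &Size : ArgSizes)
    Slots.push_back(&Size); // slots [N, 2N): the size of each argument
  return Slots;
}

// How an OpenCL-style runtime could recover the size of argument I
// (hypothetical; a CUDA-style runtime would simply ignore the second half).
std::size_t argSize(const std::vector<void *> &Slots, std::size_t I) {
  std::size_t N = Slots.size() / 2;
  return *static_cast<std::size_t *>(Slots[N + I]);
}
```

Appending the sizes rather than interleaving them keeps the first half identical to the old layout. That is why the CUDA path can stay unchanged.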



# 17f01968 07-May-2017 Siddharth Bhat <[email protected]>

[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen

Summary:
When compiling for GPU, one can now choose to compile for OpenCL or CUDA,
with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The
GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library
for that purpose, correctly choosing the corresponding library calls to the
option chosen when compiling (via different initialization calls).

Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far).

Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay

Reviewed By: grosser, Meinersbur

Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32431

llvm-svn: 302379



# c1267b9b 05-May-2017 Siddharth Bhat <[email protected]>

Revert "[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen"

This reverts commit 17a84e414adb51ee375d14836d4c2a817b191933.

Patches should have been submitted in the order of:

1. D32852
2. D32854
3. D32431

I mistakenly pushed D32431(3) first. Reverting to push in the correct
order.

llvm-svn: 302217



# 51904ae3 05-May-2017 Siddharth Bhat <[email protected]>

[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen

Summary:
When compiling for GPU, one can now choose to compile for OpenCL or CUDA,
with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The
GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library
for that purpose, correctly choosing the corresponding library calls to the
option chosen when compiling (via different initialization calls).

Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far).

Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay

Reviewed By: grosser, Meinersbur

Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32431

llvm-svn: 302215



# abed4969 28-Apr-2017 Siddharth Bhat <[email protected]>

[Polly] [PPCGCodeGeneration] Add managed memory support to GPU code
generation.

This needs changes to GPURuntime to expose synchronization between host
and device.

1. Needs better function naming, I want a better name than
"getOrCreateManagedDeviceArray"

2. DeviceAllocations is used by both the managed memory and the
non-managed memory path. This exploits the fact that the two code paths
are never run together. I'm not sure if this is the best design decision.

Reviewed by: PhilippSchaad

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32215

llvm-svn: 301640



Revision tags: llvmorg-4.0.1-rc1
# d277feda 25-Apr-2017 Siddharth Bhat <[email protected]>

[PPCGCodeGeneration] Update PPCG Code Generation for OpenCL compatibility

Added a small change to the way pointer arguments are set in the kernel
code generation. The pointer is now retrieved in a way that explicitly requests
the global address space annotation. This is necessary if the IR is to be
run through NVPTX to generate OpenCL-compatible PTX.

The changes do not affect the PTX Strings generated for the CUDA target
(nvptx64-nvidia-cuda), but are necessary for OpenCL (nvptx64-nvidia-nvcl).

Additionally, the data layout has been updated to what the NVPTX Backend requests/recommends.

Contributed-by: Philipp Schaad

Reviewers: Meinersbur, grosser, bollu

Reviewed By: grosser, bollu

Subscribers: jlebar, pollydev, llvm-commits, nemanjai, yaxunl, Anastasia

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32215

llvm-svn: 301299



# 7b5a4dfd 11-Apr-2017 Tobias Grosser <[email protected]>

Exploit BasicBlock::getModule to shorten code

Suggested-by: Roman Gareev <[email protected]>
llvm-svn: 299914


# 67726b32 11-Apr-2017 Tobias Grosser <[email protected]>

Adjust to recent change in constructor definition of AllocaInst

llvm-svn: 299913


# 2d950f36 04-Apr-2017 Philip Pfaffe <[email protected]>

[Polly][NewPM] Pull references to the legacy PM interface from utilities and helpers

Summary:
A couple of the utilities used to analyze or build IR make explicit use of the legacy PM on their interface, to access analysis results. This patch removes the legacy PM from the interface, and just passes the required results directly.

This shouldn't introduce any functional changes, although the API technically allowed obtaining two different analysis results before, one passed by reference and one through the PM. I don't believe that was ever intended, however.

Reviewers: grosser, Meinersbur

Reviewed By: grosser

Subscribers: nemanjai, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31653

llvm-svn: 299423



# de244eb4 12-Mar-2017 Tobias Grosser <[email protected]>

Possible error in doc comment

If a SCoP is most probably sequential, then it's better to run it on a CPU.
Hence, there's no point in running it on a GPU.

Reviewers: grosser

Subscribers: nemanjai

Tags: #polly

Contributed-by: Singapuram Sanjay <[email protected]>

Differential Revision: https://reviews.llvm.org/D30864

llvm-svn: 297578



Revision tags: llvmorg-4.0.0, llvmorg-4.0.0-rc4, llvmorg-4.0.0-rc3
# 24222c73 01-Mar-2017 Tobias Grosser <[email protected]>

Fix namespaces after clang-format update

llvm-svn: 296635


# 52ab4943 23-Feb-2017 Michael Kruse <[email protected]>

Remove all references to PostDominators. NFC.

Marking a pass as preserved is necessary if any Polly pass uses it, even
if it is not preserved within the generated code. Not marking it would
cause the Polly pass chain to be interrupted. It is not used by any
Polly pass anymore, hence we can remove all references to it.

llvm-svn: 295983



Revision tags: llvmorg-4.0.0-rc2
# ff40087a 01-Feb-2017 Tobias Grosser <[email protected]>

Update to recent formatting changes

llvm-svn: 293756


# 587f1f57 28-Jan-2017 Tobias Grosser <[email protected]>

[Polly] [BlockGenerator] Unify ScalarMap and PhiOpsMap

Instead of keeping two separate maps from Value to Allocas, one for
MemoryType::Value and the other for MemoryType::PHI, we introduce a single map
from ScopArrayInfo to the corresponding Alloca. This change is intended both as
a general simplification and cleanup and as a way to reduce our use of
MemoryAccess::getBaseAddr(). Moving away from using getBaseAddr() makes sure
we have only a single place where the array (and its base pointer) for which we
generate code is specified, which means we can more easily introduce new
access functions that use a different ScopArrayInfo as base. We already
experiment with modifiable access functions today, so this change does not
address a specific bug; it just reduces the scope one needs to reason about.

Another motivation for this patch is https://reviews.llvm.org/D28518, where
memory accesses with different base pointers could possibly be mapped to a
single ScopArrayInfo object. Such a mapping is currently not possible, as we
currently generate alloca instructions according to the base addresses of the
memory accesses, not according to the ScopArrayInfo object they belong to. By
making allocas ScopArrayInfo specific, a mapping to a single ScopArrayInfo
object will automatically mean that the same stack slot is used for these
arrays. For D28518 this is not a problem, as only MemoryType::Array objects are
mapped, but resolving this inconsistency will hopefully avoid confusion.

llvm-svn: 293374
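Keying stack slots by array object instead of base address can be sketched as below. The types `ArrayInfo` and `StackSlot` and the class `SlotMap` are hypothetical stand-ins for ScopArrayInfo, an LLVM AllocaInst, and the real map. The sketch only shows that a single lookup per array object decides the slot. As a result, two accesses that map to the same array object share a slot, whatever their base pointers.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct ArrayInfo { std::string Name; }; // stand-in for ScopArrayInfo
struct StackSlot { std::string Name; }; // stand-in for an AllocaInst

// One map from array object to its stack slot, replacing separate
// per-kind maps keyed by the accessed base address.
class SlotMap {
  std::map<const ArrayInfo *, std::unique_ptr<StackSlot>> Slots;

public:
  // Return the existing slot for SAI, creating it on first use.
  StackSlot *getOrCreate(const ArrayInfo *SAI) {
    std::unique_ptr<StackSlot> &Slot = Slots[SAI];
    if (!Slot)
      Slot = std::make_unique<StackSlot>(StackSlot{SAI->Name});
    return Slot.get();
  }
  std::size_t size() const { return Slots.size(); }
};
```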



Revision tags: llvmorg-4.0.0-rc1
# 4d5a9172 14-Jan-2017 Tobias Grosser <[email protected]>

Use typed enums to model MemoryKind and move MemoryKind out of ScopArrayInfo

To benefit from the type safety guarantees of C++11 typed enums, which would have
caught the type mismatch fixed in r291960, we make MemoryKind a typed enum.
This change also allows us to drop the 'MK_' prefix and to instead use the more
descriptive full name of the enum as prefix. To reduce the amount of typing
needed, we use this opportunity to move MemoryKind from ScopArrayInfo to a
global scope, which means the ScopArrayInfo:: prefix is not needed. This move
also makes sense historically. In the beginning of Polly we had different
MemoryKind enums in both MemoryAccess and ScopArrayInfo, which were later
canonicalized to one. During this canonicalization we just chose the enum in
ScopArrayInfo, but did not consider moving this shared enum to global scope.

Reviewed-by: Michael Kruse <[email protected]>
Differential Revision: https://reviews.llvm.org/D28090

llvm-svn: 292030
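The type-safety argument above can be shown with a small scoped enum. The enumerator list below (Array, Value, PHI, ExitPHI) is given for illustration and should be checked against Polly's headers. `toString` is a hypothetical helper. The `static_assert` encodes the property the commit relies on: a scoped enum does not silently convert to `int`, so mismatches like the one fixed in r291960 fail to compile.

```cpp
#include <string>
#include <type_traits>

// Scoped enum at global scope; enumerators are written as MemoryKind::PHI
// rather than with an 'MK_' prefix.
enum class MemoryKind { Array, Value, PHI, ExitPHI };

// An unscoped enum would convert implicitly to int; a scoped one does not.
static_assert(!std::is_convertible<MemoryKind, int>::value,
              "scoped enums do not convert implicitly");

// Hypothetical helper used only to exercise the enum.
std::string toString(MemoryKind Kind) {
  switch (Kind) {
  case MemoryKind::Array:
    return "Array";
  case MemoryKind::Value:
    return "Value";
  case MemoryKind::PHI:
    return "PHI";
  case MemoryKind::ExitPHI:
    return "ExitPHI";
  }
  return "Unknown";
}
```

An explicit `static_cast<int>(MemoryKind::Value)` is still available where an integer is genuinely needed.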



# e29db217 12-Jan-2017 Tobias Grosser <[email protected]>

Update to recent clang-format changes

llvm-svn: 291810


Revision tags: llvmorg-3.9.1, llvmorg-3.9.1-rc3, llvmorg-3.9.1-rc2, llvmorg-3.9.1-rc1
# df8f35b7 29-Nov-2016 Tobias Grosser <[email protected]>

Update for clang-format change in r288119

llvm-svn: 288134


# acf80064 02-Nov-2016 Eli Friedman <[email protected]>

[Polly CodeGen] Break critical edge from RTC to original loop.

This makes polly generate a CFG which is closer to what we want
in LLVM IR, with a loop preheader for the original loop. This is
just a cleanup, but it exposes some fragile assumptions.

I'm not completely happy with the changes related to expandCodeFor;
RTCBB->getTerminator() is basically a random insertion point which
happens to work due to the way we generate runtime checks. I'm not
sure what the right answer looks like, though.

Differential Revision: https://reviews.llvm.org/D26053

llvm-svn: 285864



# bc653f20 18-Sep-2016 Tobias Grosser <[email protected]>

GPGPU: Do not run mostly sequential kernels in GPU

In case sequential kernels are found deeper in the loop tree than any parallel
kernel, the overall scop is probably mostly sequential. Hence, run it on the
CPU.

llvm-svn: 281849
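The heuristic in this message can be sketched as a comparison of depths. The sketch assumes the generated kernels have already been summarized as (depth in the loop tree, parallel or not). `KernelInfo` and `isMostlySequential` are hypothetical and not Polly's data structures. The SCoP counts as mostly sequential when some sequential kernel sits deeper than every parallel kernel.

```cpp
#include <vector>

// Hypothetical summary of one generated kernel: the loop-tree depth at which
// it was found and whether it is parallel.
struct KernelInfo {
  int Depth;
  bool IsParallel;
};

// True if the deepest sequential kernel is deeper than every parallel kernel,
// in which case the SCoP would be run on the CPU instead.
bool isMostlySequential(const std::vector<KernelInfo> &Kernels) {
  int MaxParallelDepth = -1;
  int MaxSequentialDepth = -1;
  for (const KernelInfo &K : Kernels) {
    int &Max = K.IsParallel ? MaxParallelDepth : MaxSequentialDepth;
    if (K.Depth > Max)
      Max = K.Depth;
  }
  return MaxSequentialDepth > MaxParallelDepth;
}
```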



# 82f2af35 18-Sep-2016 Tobias Grosser <[email protected]>

GPGPU: Dynamically ensure 'sufficient compute'

Offloading to a GPU is only beneficial if there is a sufficient amount of
compute that can be accelerated. Many kernels just have a very small amount
of dynamic compute, which means GPU acceleration is not beneficial. We
compute at run-time an approximation of how many dynamic instructions will be
executed and fall back to CPU code in case this number is not sufficiently
large. To keep the run-time checking code simple, we over-approximate the
number of instructions executed in each statement by computing the volume of
the rectangular hull of its iteration space.

llvm-svn: 281848
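The over-approximation described above multiplies the extents of each statement's rectangular hull. The following is a minimal sketch of that computation. The bounds are assumed to be already evaluated at run time, and each iteration is assumed to be weighted by a per-statement instruction count. `DimBounds`, `StmtInfo`, and the threshold parameter `MinInsts` are hypothetical names, not Polly's. For brevity the sketch does not guard against overflow of the running total.

```cpp
#include <cstdint>
#include <vector>

// Inclusive bounds of one loop dimension, evaluated at run time.
struct DimBounds {
  std::int64_t Lower, Upper;
};

struct StmtInfo {
  std::vector<DimBounds> Hull;    // rectangular hull of the iteration space
  std::int64_t InstsPerIteration; // assumed instruction count of the body
};

// Volume of the box [L0,U0] x [L1,U1] x ...; an empty dimension gives 0,
// and a statement outside any loop (no dimensions) executes once.
std::int64_t hullVolume(const std::vector<DimBounds> &Hull) {
  std::int64_t Volume = 1;
  for (const DimBounds &D : Hull) {
    if (D.Upper < D.Lower)
      return 0;
    Volume *= D.Upper - D.Lower + 1;
  }
  return Volume;
}

// Offload only if the over-approximated dynamic instruction count reaches
// the threshold; otherwise fall back to the CPU code.
bool hasSufficientCompute(const std::vector<StmtInfo> &Stmts,
                          std::int64_t MinInsts) {
  std::int64_t Total = 0;
  for (const StmtInfo &S : Stmts)
    Total += hullVolume(S.Hull) * S.InstsPerIteration;
  return Total >= MinInsts;
}
```

Using the hull instead of the exact iteration-space size keeps the run-time check to a few multiplications. The cost is that triangular or otherwise non-rectangular domains are overestimated.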



# 51dfc275 17-Sep-2016 Tobias Grosser <[email protected]>

GPGPU: Store back non-read-only scalars

We may generate GPU kernels that store into scalars in case we run some
sequential code on the GPU because the remaining data is expected to already be
on the GPU. For these kernels it is important to not keep the scalar values
in thread-local registers, but to store them back to the corresponding device
memory objects that back them up.

We currently only store scalars back at the end of a kernel. This is only
correct if precisely one thread is executed. In case more than one thread may
be run, we currently invalidate the scop. To support such cases correctly,
we would need to always load and store back from a corresponding global
memory slot instead of a thread-local alloca slot.

llvm-svn: 281838



# fe74a7a1 17-Sep-2016 Tobias Grosser <[email protected]>

GPGPU: Detect read-only scalar arrays ...

and pass these by value rather than by reference.

llvm-svn: 281837


# aaabbbf8 15-Sep-2016 Tobias Grosser <[email protected]>

GPGPU: Do not assume arrays start at 0

Our alias checks precisely check that the minimal and maximal accessed elements
do not overlap in a kernel. Hence, we must ensure that our host <-> device
transfers do not touch additional memory locations that are not covered in
the alias check. To ensure this, we make sure that the data we copy for a
given array is only the data from the smallest element accessed to the largest
element accessed.

We also adjust the size of the array according to the offset at which the array
is actually accessed.

An interesting result of this is: In case arrays are accessed with negative
subscripts, e.g., A[-100], we automatically allocate and transfer _more_ data to
cover the full array. This is important as such code indeed exists in the wild.

llvm-svn: 281611
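The transfer range described above can be sketched for a one-dimensional, linearized view of an array. `TransferRange` and `computeTransferRange` are hypothetical. `MinIdx`/`MaxIdx` are assumed to be the smallest and largest element indices the kernel accesses, which are the same quantities the alias checks bound. A negative `MinIdx` yields a negative byte offset from the base pointer and a correspondingly larger transfer.

```cpp
#include <cassert>
#include <cstdint>

// Byte range to copy between host and device for one array, relative to the
// array's base pointer. ByteOffset is negative when subscripts go below 0.
struct TransferRange {
  std::int64_t ByteOffset;
  std::int64_t ByteSize;
};

// Copy exactly the elements from the smallest to the largest accessed index
// (inclusive), so the transfer never touches memory outside that range.
TransferRange computeTransferRange(std::int64_t MinIdx, std::int64_t MaxIdx,
                                   std::int64_t ElemSize) {
  assert(MinIdx <= MaxIdx && ElemSize > 0);
  return {MinIdx * ElemSize, (MaxIdx - MinIdx + 1) * ElemSize};
}
```

For accesses A[-100] through A[99] on 4-byte elements, this gives a copy starting 400 bytes before the base pointer and 800 bytes long.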



# 0a893f7d 13-Sep-2016 Tobias Grosser <[email protected]>

GPGPU: Use const_cast to avoid compiler warning [NFC]

llvm-svn: 281333

