History log of /llvm-project-15.0.7/polly/lib/CodeGen/PPCGCodeGeneration.cpp (Results 51 – 75 of 213)
Revision Date Author Comments
# c52b71db 04-Oct-2017 Tobias Grosser

[GPGPU] Make sure escaping invariant load hoisted scalars are preserved

We make sure that the final reload of an invariant scalar memory access uses the
same stack slot into which the invariant memory access was stored originally.
Earlier, this was broken because we introduced a new stack slot alongside the preload
stack slot, which remained uninitialized and caused our escaping loads to
contain garbage. This happened due to us clearing the pre-populated values
in EscapeMap after kernel code generation. We address this issue by preserving
the original host values and restoring them after kernel code generation.
EscapeMap is not expected to be used during kernel code generation, hence we
clear it during kernel generation to make sure that any unintended uses are
noticed.

llvm-svn: 314894



# 2fb847fb 01-Oct-2017 Tobias Grosser

[GPGPU] Set Polly's RTC to false in case invariant load hoisting fails

This matches the behavior we already have in lib/CodeGen/CodeGeneration.cpp and
makes sure that we fall back to the original code. It seems that when invariant load
hoisting was introduced to the GPGPU backend, we forgot to reset the RTC flag,
such that kernels where invariant load hoisting failed executed the 'optimized'
SCoP, which, however, is set to a simple 'unreachable'. Unsurprisingly, this
results in hard-to-debug issues that are a lot of fun to debug.

llvm-svn: 314624



# e2950f46 07-Sep-2017 Siddharth Bhat

[PPCGCodeGen] Document pre-composition with Zero in getExtent. [NFC]

It's weird at first glance that we do this, so I wrote up some
documentation on why we need to perform this process.

llvm-svn: 312715



Revision tags: llvmorg-5.0.0, llvmorg-5.0.0-rc5
# 56572c6a 31-Aug-2017 Siddharth Bhat

[PPCGCodeGen] Convert intrinsics to libdevice functions whenever possible.

This is useful when we face certain intrinsics such as `llvm.exp.*`
which cannot be lowered by the NVPTX backend while other intrinsics can.

So, we would need to keep blacklists of intrinsics that cannot be
handled by the NVPTX backend. It is much simpler to try and promote
all intrinsics to libdevice versions.

This patch makes the handling of functions and intrinsics uniform: it always tries to use
a libdevice version if one exists.
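
As a rough illustration of the idea, the sketch below maps an intrinsic name to a candidate libdevice name. The helper `libdeviceName` is hypothetical and is not Polly's implementation; it only assumes the common libdevice naming pattern (`__nv_exp` for double, `__nv_expf` for float). Whether a function with the candidate name actually exists in the linked libdevice module would still have to be checked separately.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not Polly's code): derive a candidate libdevice name
// from an intrinsic name of the form "llvm.<base>.<f32|f64>".
// Returns an empty string when the name does not fit that pattern.
inline std::string libdeviceName(const std::string &Intrinsic) {
  const std::string Prefix = "llvm.";
  if (Intrinsic.compare(0, Prefix.size(), Prefix) != 0)
    return "";
  const std::size_t Dot = Intrinsic.rfind('.');
  if (Dot == std::string::npos || Dot < Prefix.size())
    return "";
  const std::string Base = Intrinsic.substr(Prefix.size(), Dot - Prefix.size());
  const std::string Suffix = Intrinsic.substr(Dot + 1);
  if (Base.empty())
    return "";
  if (Suffix == "f64")
    return "__nv_" + Base;        // double-precision variant
  if (Suffix == "f32")
    return "__nv_" + Base + "f";  // single-precision variant
  return "";
}

// Small self-check of the mapping.
inline void checkLibdeviceNames() {
  assert(libdeviceName("llvm.exp.f64") == "__nv_exp");
  assert(libdeviceName("llvm.exp.f32") == "__nv_expf");
  assert(libdeviceName("printf").empty());
}
```

Trying the mapping for every intrinsic and falling back when no candidate exists avoids maintaining a list of intrinsics the backend cannot lower.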

Differential Revision: https://reviews.llvm.org/D37056

llvm-svn: 312239



Revision tags: llvmorg-5.0.0-rc4
# a4f447c2 28-Aug-2017 Michael Kruse

[PM] Properly require and preserve OptimizationRemarkEmitter. NFCI.

Properly require and preserve the OptimizationRemarkEmitter for use in
ScopPass. Previously one had to get the ORE from ScopDetection because
CodeGeneration did not mark it as preserved. It would need to be
recomputed, which causes the legacy PM to throw away all previous
SCoP analysis.

This also changes the implementation of ScopPass::getAnalysisUsage to
not unconditionally preserve all passes, but only those needed to be
preserved by any SCoP pass (at least when using the legacy PM). This
allows invalidating DependenceInfo (and IslAstInfo) in case the pass
would cause them to change (e.g. OpTree, DeLICM, MaximalArrayExpansion)

JSONImporter should also invalidate the DependenceInfo. In this patch
it marks DependenceInfo as preserved anyway because some regression
tests depend on it.

Differential Revision: https://reviews.llvm.org/D37010

llvm-svn: 311888



Revision tags: llvmorg-5.0.0-rc3
# 78027437 24-Aug-2017 Siddharth Bhat

[Polly] [PPCGCodeGeneration] Mild refactoring of checking validity of functions in a kernel.

This is a stylistic change to make the function a little more readable.
Also add a debug print to show what instruction contains a use of a
function we don't understand in the kernel.

Differential Revision: https://reviews.llvm.org/D37058

llvm-svn: 311648



# 3044dc51 23-Aug-2017 Michael Kruse

[PPCGCodeGen] Fix compiler warning: '<': signed/unsigned mismatch. NFC.

MSVC warns about comparison between a signed and unsigned integer.
The rules of C(++) define that an unsigned comparison has to be
carried out in this case. This is unlikely to be intended.

Fix by assigning the loop's upper bound to a signed integer first.
This also avoids repeated evaluation of the invariant upper bound.
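
A minimal sketch of the pitfall and of the fix pattern follows. The function names are hypothetical and the code is not taken from the patch; `lessAsUnsigned` spells out the conversion that the language applies implicitly in a mixed comparison, and `countFrom` hoists the bound into a signed variable once, as the commit describes.

```cpp
#include <cassert>
#include <vector>

// Makes explicit what happens implicitly in `I < N` when N is unsigned:
// the signed operand is converted to unsigned before the comparison.
inline bool lessAsUnsigned(int I, unsigned N) {
  return static_cast<unsigned>(I) < N;
}

// Fix pattern: assign the upper bound to a signed integer first, then
// compare signed against signed. The bound is also evaluated only once.
inline int countFrom(const std::vector<int> &Values, int Start) {
  const int NumValues = static_cast<int>(Values.size());
  int Count = 0;
  for (int I = Start; I < NumValues; ++I)
    ++Count;
  return Count;
}

// Small self-check: a negative index is "not less" under unsigned rules,
// while the signed loop still visits the expected number of iterations.
inline void checkSignedBound() {
  assert(!lessAsUnsigned(-1, 3u));
  assert(countFrom(std::vector<int>(3), -1) == 4);
}
```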

llvm-svn: 311548



# 7b9f5ca2 21-Aug-2017 Siddharth Bhat

[PPCGCodeGeneration] Enable `polly-codegen-perf-monitoring` for PPCGCodegen.

This feature was not enabled for `PPCGCodeGeneration`. Now that this is
enabled, we can benchmark Scops that have been optimised with
`-polly-codegen-ppcg` with the `-polly-codegen-perf-monitoring` option.

Differential Revision: https://reviews.llvm.org/D36934

llvm-svn: 311328



# b09bd74d 21-Aug-2017 Tobias Grosser

[GPGPU] Add llvm.powi to the libdevice supported functions

These intrinsics are used in COSMO.

llvm-svn: 311324


# 5170b662 21-Aug-2017 Tobias Grosser

[GPGPU] Add log / logf to the libdevice supported functions

These two functions are used in COSMO

llvm-svn: 311322


# e32498c9 19-Aug-2017 Tobias Grosser

Revert "[GPGPU] Simplify PPCGSCop to reduce compile time [NFC]"

We still see some issues with parameter space mismatches. Revert this to get
a clean baseline. We will recommit after these issues have been resolved.

This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.

llvm-svn: 311268



# ecb94a03 19-Aug-2017 Tobias Grosser

[GPGPU] Correctly initialize array order and fixed_element information

Summary:
This information is necessary for PPCG to perform correct live-range reordering.
With these changes applied we can live-range reorder some of the important
kernels in COSMO.

We also update and rename one test case, which previously could not be optimized
and now is optimized thanks to live-range reordering. To preserve test coverage
we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises
our automatic abort in case of scalar writes in the kernel.

Reviewers: Meinersbur, bollu, singam-sanjay

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36929

llvm-svn: 311259



# 50139f0f 19-Aug-2017 Philipp Schaad

[PPCG] Only add Kernel argument sizes for OpenCL, not CUDA runtime

Kernel argument sizes are now appended to the kernel launch parameter list only if the OpenCL runtime is selected, not if the CUDA runtime is chosen.

Differential revision: D36925

llvm-svn: 311248



# 43df2020 19-Aug-2017 Tobias Grosser

[GPGPU] Collect parameter dimension used in MemoryAccesses

When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.

llvm-svn: 311239



# ec02acfb 18-Aug-2017 Tobias Grosser

[GPGPU] Simplify PPCGSCop to reduce compile time [NFC]

Summary:
Drop unused parameter dimensions to reduce the size of the sets we are working
with. Especially the computed dependences tend to accumulate a lot of parameters
that are present in the input memory accesses, but often not necessary to
express the actual dependences. As isl represents maps and sets with dense
matrices, reducing the dimensionality of isl sets commonly improves code
generation performance.

This reduces compile time from 17 to 11 seconds for our test case. While this is
not impressive, this patch helped me to identify the previous two performance
improvements and additionally also increases readability of the isl data
structures we use.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36869

llvm-svn: 311161



# 656e6295 18-Aug-2017 Siddharth Bhat

[Polly] [PPCGCodeGeneration] Print current Scop and loop depth in PPCGCodeGen. [NFC]

Differential Revision: https://reviews.llvm.org/D36871

llvm-svn: 311158


# 861a387f 18-Aug-2017 Tobias Grosser

[GPGPU] Do not create copy statements when targeting managed memory

Summary:
They are not used and consequently do not even need to be computed. This reduces
the overall compile time for our kernel from 1m33s to 17s.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36868

llvm-svn: 311157



# 62acb344 18-Aug-2017 Tobias Grosser

[GPGPU] Synchronize after each kernel, not each copy out

Summary:
This change reduces the overall number of synchronize calls for kernels with
a lot of output data at the cost of additional synchronize calls for kernels
launched in sequence without any device to host transfers in between. As the
latter pattern is a lot less frequent, this seems a better tradeoff.

Even though the above would be motivation enough, this is just
a step towards enabling ppcg to not compute to and from device copy calls
at all, which would be incorrect in case we still relied on these calls to
place our synchronization statements.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36867

llvm-svn: 311155



# fa03cb76 17-Aug-2017 Tobias Grosser

[GPGPU] Only collect the accesses that belong to an array [NFC]

This avoids the construction of very large sets and in many cases also keeps the
number of parameters low. As a result, we see a compile time reduction from 5
minutes to only slightly above 1 minute for one of our larger test cases.

llvm-svn: 311127



# d2e57981 17-Aug-2017 Tobias Grosser

[GPGPU] Move getExtent to C++ [NFC]

llvm-svn: 311123


Revision tags: llvmorg-5.0.0-rc2
# cff9696e 10-Aug-2017 Tobias Grosser

[GPGPU] Make the ast_build available to block generator

This is necessary for partial writes (as used by delicm) to work.

llvm-svn: 310553


# c4a4af47 09-Aug-2017 Siddharth Bhat

[ManagedMemoryRewrite] Introduce a new pass to rewrite modules to use managed memory.

This pass is useful to automatically convert a codebase that uses malloc/free
to use their managed memory counterparts.

Currently, it rewrites malloc and free to the `polly_{malloc,free}Managed` variants.

A future patch will teach ManagedMemoryRewrite to rewrite global arrays
as pointers to globally allocated managed memory.

Differential Revision: https://reviews.llvm.org/D36513

llvm-svn: 310471



# 34eeabbc 09-Aug-2017 Siddharth Bhat

[PPCGCodeGeneration] Compute element size in bytes for arrays correctly.

Previously, we used to compute this with `elementSizeInBits / 8`. This
would yield an element size of 0 when the array had element size < 8 in
bits.

To fix this, ask data layout what the size in bytes should be.
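
The arithmetic behind the bug can be sketched without LLVM. The two helpers below are hypothetical and only contrast truncating division with rounding up to whole bytes; which data layout query the patch actually uses is not stated in this log, although LLVM's `DataLayout::getTypeStoreSize` is one query that rounds up in this way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the old computation, `elementSizeInBits / 8`.
// Integer division truncates, so any size below 8 bits becomes 0 bytes.
constexpr std::uint64_t truncatedSizeInBytes(std::uint64_t SizeInBits) {
  return SizeInBits / 8;
}

// Hypothetical helper: round up to the number of whole bytes needed to
// hold SizeInBits bits.
constexpr std::uint64_t roundedUpSizeInBytes(std::uint64_t SizeInBits) {
  return (SizeInBits + 7) / 8;
}

// Small self-check: a 1-bit element (such as i1) no longer has size 0,
// and byte-multiple sizes are unaffected.
inline void checkElementSizes() {
  assert(truncatedSizeInBytes(1) == 0);
  assert(roundedUpSizeInBytes(1) == 1);
  assert(roundedUpSizeInBytes(32) == 4);
}
```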

Differential Revision: https://reviews.llvm.org/D36459

llvm-svn: 310448



# 71dfb3eb 08-Aug-2017 Siddharth Bhat

[Polly] [PPCGCodeGeneration] Handle failing of invariant load hoisting gracefully.

To do this, we replicate what `CodeGeneration` does. We expose
`markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.

Differential Revision: https://reviews.llvm.org/D36457

llvm-svn: 310350



# d70ea7fe 07-Aug-2017 Tobias Grosser

[GPGPU] Remove redundant constructors

llvm-svn: 310284

