| #
c52b71db |
| 04-Oct-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Make sure escaping invariant load hoisted scalars are preserved
We make sure that the final reload of an invariant scalar memory access uses the same stack slot into which the invariant memory access was stored originally. Earlier, this was broken because we introduced a new stack slot alongside the preload stack slot; the new slot remained uninitialized and caused our escaping loads to contain garbage. This happened because we cleared the pre-populated values in EscapeMap after kernel code generation. We address this issue by preserving the original host values and restoring them after kernel code generation. EscapeMap is not expected to be used during kernel code generation, hence we clear it during kernel generation to make sure that any unintended uses are noticed.
llvm-svn: 314894
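The preserve, clear, and restore sequence described in this message can be sketched in plain C++. This is a minimal illustration, not Polly's code: `EscapeMapTy`, `withClearedEscapeMap`, and the string keys are hypothetical stand-ins for the real map of LLVM values.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the escape map: scalar name -> stack slot name.
// The real map holds LLVM values; strings keep the sketch self-contained.
using EscapeMapTy = std::map<std::string, std::string>;

// Runs GenerateKernel with an empty escape map, then restores the host
// entries that were present before kernel generation started.
template <typename Fn>
void withClearedEscapeMap(EscapeMapTy &EscapeMap, Fn GenerateKernel) {
  EscapeMapTy HostValues;
  HostValues.swap(EscapeMap); // preserve host values; EscapeMap is now empty
  GenerateKernel(EscapeMap);  // any use during kernel generation sees no host entries
  EscapeMap.swap(HostValues); // restore host values; kernel-side entries are dropped
}
```

Because the host entries are swapped back rather than rebuilt, the final reload finds the same slot that the preload wrote to.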
|
| #
2fb847fb |
| 01-Oct-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Set Polly's RTC to false in case invariant load hoisting fails
This matches the behavior we already have in lib/Codegen/CodeGeneration.cpp and makes sure that we fall back to the original code. It seems that when invariant load hoisting was introduced to the GPGPU backend we forgot to reset the RTC flag, so kernels where invariant load hoisting failed executed the 'optimized' SCoP, which, however, is set to a simple 'unreachable'. Unsurprisingly, this results in hard-to-debug issues that are a lot of fun to debug.
llvm-svn: 314624
|
| #
e2950f46 |
| 07-Sep-2017 |
Siddharth Bhat <[email protected]> |
[PPCGCodeGen] Document pre-composition with Zero in getExtent. [NFC]
It's weird at first glance that we do this, so I wrote up some documentation on why we need to perform this process.
llvm-svn: 312715
|
|
Revision tags: llvmorg-5.0.0, llvmorg-5.0.0-rc5 |
|
| #
56572c6a |
| 31-Aug-2017 |
Siddharth Bhat <[email protected]> |
[PPCGCodeGen] Convert intrinsics to libdevice functions whenever possible.
This is useful when we face certain intrinsics such as `llvm.exp.*` which cannot be lowered by the NVPTX backend while other intrinsics can.
Otherwise, we would need to keep blacklists of intrinsics that cannot be handled by the NVPTX backend. It is much simpler to try to promote all intrinsics to libdevice versions.
This patch makes the handling of functions and intrinsics uniform, and will always try to use a libdevice version if it exists.
Differential Revision: https://reviews.llvm.org/D37056
llvm-svn: 312239
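The "promote if a libdevice version exists" idea can be sketched as a simple lookup. This is an illustrative assumption, not the patch's implementation: `libdeviceNameFor` and the table layout are hypothetical, and only a few libdevice entry points (`__nv_expf`, `__nv_exp`, `__nv_logf`, `__nv_log`) are listed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: returns the libdevice function name for an intrinsic,
// or an empty string when the table has no entry for it.
std::string libdeviceNameFor(const std::string &Intrinsic) {
  // Small illustrative table; a real mapping would cover many more functions.
  static const std::map<std::string, std::string> Table = {
      {"llvm.exp.f32", "__nv_expf"},
      {"llvm.exp.f64", "__nv_exp"},
      {"llvm.log.f32", "__nv_logf"},
      {"llvm.log.f64", "__nv_log"},
  };
  auto It = Table.find(Intrinsic);
  return It == Table.end() ? std::string() : It->second;
}
```

An empty result means the caller keeps the original intrinsic, so no separate blacklist of unsupported intrinsics is needed in this sketch.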
|
|
Revision tags: llvmorg-5.0.0-rc4 |
|
| #
a4f447c2 |
| 28-Aug-2017 |
Michael Kruse <[email protected]> |
[PM] Properly require and preserve OptimizationRemarkEmitter. NFCI.
Properly require and preserve the OptimizationRemarkEmitter for use in ScopPass. Previously one had to get the ORE from ScopDetection because CodeGeneration did not mark it as preserved. It would need to be recomputed, which causes the legacy PM to throw away all previous SCoP analysis.
This also changes the implementation of ScopPass::getAnalysisUsage to not unconditionally preserve all passes, but only those needed to be preserved by any SCoP pass (at least when using the legacy PM). This allows invalidating DependenceInfo (and IslAstInfo) in case the pass would cause them to change (e.g. OpTree, DeLICM, MaximalArrayExpansion).
JSONImporter should also invalidate the DependenceInfo. In this patch it marks DependenceInfo as preserved anyway because some regression tests depend on it.
Differential Revision: https://reviews.llvm.org/D37010
llvm-svn: 311888
|
|
Revision tags: llvmorg-5.0.0-rc3 |
|
| #
78027437 |
| 24-Aug-2017 |
Siddharth Bhat <[email protected]> |
[Polly] [PPCGCodeGeneration] Mild refactoring of checking validity of functions in a kernel.
This is a stylistic change to make the function a little more readable. Also add a debug print to show what instruction contains a use of a function we don't understand in the kernel.
Differential Revision: https://reviews.llvm.org/D37058
llvm-svn: 311648
|
| #
3044dc51 |
| 23-Aug-2017 |
Michael Kruse <[email protected]> |
[PPCGCodeGen] Fix compiler warning: '<': signed/unsigned mismatch. NFC.
MSVC warns about a comparison between a signed and an unsigned integer. The rules of C(++) define that an unsigned comparison has to be carried out in this case. This is unlikely to be intended.
Fix by assigning the loop's upper bound to a signed integer first. This also avoids repeated evaluation of the invariant upper bound.
llvm-svn: 311548
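The difference between the two comparisons can be shown with a small standalone sketch. The function names and the `std::vector` container are hypothetical and not taken from the patch; the cast in the second function spells out the conversion that the mixed comparison performs implicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Signed form: the invariant upper bound is evaluated once and stored in a
// signed integer, so the loop condition is a signed comparison.
int countIterationsSigned(const std::vector<int> &Values, int Start) {
  const int End = static_cast<int>(Values.size());
  int Count = 0;
  for (int I = Start; I < End; ++I)
    ++Count;
  return Count;
}

// Mixed form: comparing `I` against `Values.size()` converts `I` to an
// unsigned type, so a negative `I` becomes a very large value.
int countIterationsMixed(const std::vector<int> &Values, int Start) {
  int Count = 0;
  for (int I = Start; static_cast<std::size_t>(I) < Values.size(); ++I)
    ++Count;
  return Count;
}
```

With a negative start value, the signed form iterates as expected while the mixed form does not enter the loop at all.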
|
| #
7b9f5ca2 |
| 21-Aug-2017 |
Siddharth Bhat <[email protected]> |
[PPCGCodeGeneration] Enable `polly-codegen-perf-monitoring` for PPCGCodegen.
This feature was not enabled for `PPCGCodeGeneration`. Now that this is enabled, we can benchmark Scops that have been optimised with `-polly-codegen-ppcg` with the `-polly-codegen-perf-monitoring` option.
Differential Revision: https://reviews.llvm.org/D36934
llvm-svn: 311328
|
| #
b09bd74d |
| 21-Aug-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Add llvm.powi to the libdevice supported functions
These intrinsics are used in COSMO.
llvm-svn: 311324
|
| #
5170b662 |
| 21-Aug-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Add log / logf to the libdevice supported functions
These two functions are used in COSMO
llvm-svn: 311322
|
| #
e32498c9 |
| 19-Aug-2017 |
Tobias Grosser <[email protected]> |
Revert "[GPGPU] Simplify PPCGSCop to reduce compile time [NFC]"
We still see some issues with parameter space mismatches. Revert this to get a clean baseline. We will recommit after these issues have been resolved.
This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.
llvm-svn: 311268
|
| #
ecb94a03 |
| 19-Aug-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Correctly initialize array order and fixed_element information
Summary: This information is necessary for PPCG to perform correct live-range reordering. With these changes applied we can live-range reorder some of the important kernels in COSMO.
We also update and rename one test case, which previously could not be optimized and now is optimized thanks to live-range reordering. To preserve test coverage we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises our automatic abort in case of scalar writes in the kernel.
Reviewers: Meinersbur, bollu, singam-sanjay
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36929
llvm-svn: 311259
|
| #
50139f0f |
| 19-Aug-2017 |
Philipp Schaad <[email protected]> |
[PPCG] Only add Kernel argument sizes for OpenCL, not CUDA runtime
Kernel argument sizes now only get appended to the kernel launch parameter list if the OpenCL runtime is selected, not if CUDA runtime is chosen.
Differential revision: D36925
llvm-svn: 311248
|
| #
43df2020 |
| 19-Aug-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Collect parameter dimension used in MemoryAccesses
When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory we add parameter dimensions lazily to the domains, which results in PPCG not including parameter dimensions that are only used in memory accesses in the kernel space. To make sure these parameters are still passed to the kernel, we collect these parameter dimensions and align the kernel's parameter space before code-generating it.
llvm-svn: 311239
|
| #
ec02acfb |
| 18-Aug-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Simplify PPCGSCop to reduce compile time [NFC]
Summary: Drop unused parameter dimensions to reduce the size of the sets we are working with. The computed dependences in particular tend to accumulate a lot of parameters that are present in the input memory accesses, but are often not necessary to express the actual dependences. As isl represents maps and sets with dense matrices, reducing the dimensionality of isl sets commonly reduces code generation time.
This reduces compile time from 17 to 11 seconds for our test case. While this is not impressive, this patch helped me to identify the previous two performance improvements and additionally also increases readability of the isl data structures we use.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36869
llvm-svn: 311161
|
| #
656e6295 |
| 18-Aug-2017 |
Siddharth Bhat <[email protected]> |
[Polly] [PPCGCodeGeneration] Print current Scop and loop depth in PPCGCodeGen. [NFC]
Differential Revision: https://reviews.llvm.org/D36871
llvm-svn: 311158
|
| #
861a387f |
| 18-Aug-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Do not create copy statements when targeting managed memory
Summary: They are not used and consequently do not even need to be computed. This reduces the overall compile time for our kernel from 1m33s to 17s.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, pollydev, llvm-commits, kbarton
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36868
llvm-svn: 311157
|
| #
62acb344 |
| 18-Aug-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Synchronize after each kernel, not each copy out
Summary: This change reduces the overall number of synchronize calls for kernels with a lot of output data at the cost of additional synchronize calls for kernels launched in sequence without any device to host transfers in between. As the latter pattern is a lot less frequent, this seems a better tradeoff.
Although the above would be motivation enough, this is just a step towards enabling ppcg to not compute to-device and from-device copy calls at all, which would be incorrect if we still relied on these calls to place our synchronization statements.
Reviewers: Meinersbur, bollu, singam-sanjay
Reviewed By: bollu
Subscribers: nemanjai, kbarton, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D36867
llvm-svn: 311155
|
| #
fa03cb76 |
| 17-Aug-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Only collect the access that belong to an array [NFC]
This avoids the construction of very large sets and in many cases also keeps the number of parameters low. As a result, we see a compile time reduction from 5 minutes to only slightly above 1 minute for one of our larger test cases.
llvm-svn: 311127
|
| #
d2e57981 |
| 17-Aug-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Move getExtend to C++ [NFC]
llvm-svn: 311123
|
|
Revision tags: llvmorg-5.0.0-rc2 |
|
| #
cff9696e |
| 10-Aug-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Make the ast_build available to block generator
This is necessary for partial writes (as used by delicm) to work.
llvm-svn: 310553
|
| #
c4a4af47 |
| 09-Aug-2017 |
Siddharth Bhat <[email protected]> |
[ManagedMemoryRewrite] Introduce a new pass to rewrite modules to use managed memory.
This pass is useful to automatically convert a codebase that uses malloc/free to use their managed memory counterparts.
Currently, it rewrites malloc and free to the `polly_{malloc,free}Managed` variants.
A future patch will teach ManagedMemoryRewrite to rewrite global arrays as pointers to globally allocated managed memory.
Differential Revision: https://reviews.llvm.org/D36513
llvm-svn: 310471
|
| #
34eeabbc |
| 09-Aug-2017 |
Siddharth Bhat <[email protected]> |
[PPCGCodeGeneration] Compute element size in bytes for arrays correctly.
Previously, we used to compute this with `elementSizeInBits / 8`. This would yield an element size of 0 when the array had element size < 8 in bits.
To fix this, ask data layout what the size in bytes should be.
Differential Revision: https://reviews.llvm.org/D36459
llvm-svn: 310448
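The truncation problem is easy to reproduce in isolation. The sketch below does not call LLVM's data layout; `elementSizeRoundedUp` is a hypothetical stand-in that rounds up to whole bytes, which is only an assumption about what a byte-size query would return for sub-byte element types.

```cpp
#include <cassert>
#include <cstdint>

// Truncating division, as described in the commit message: any element
// narrower than 8 bits ends up with a size of 0 bytes.
std::uint64_t elementSizeTruncated(std::uint64_t SizeInBits) {
  return SizeInBits / 8;
}

// Hypothetical stand-in for asking the data layout: round the bit width up
// to the next whole byte so that no element has size 0.
std::uint64_t elementSizeRoundedUp(std::uint64_t SizeInBits) {
  return (SizeInBits + 7) / 8;
}
```

For a 1-bit element the first function yields 0 and the second yields 1, while both agree on byte-aligned widths such as 32 bits.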
|
| #
71dfb3eb |
| 08-Aug-2017 |
Siddharth Bhat <[email protected]> |
[Polly] [PPCGCodeGeneration] Handle failing of invariant load hoisting gracefully.
To do this, we replicate what `CodeGeneration` does. We expose `markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.
Differential Revision: https://reviews.llvm.org/D36457
llvm-svn: 310350
|
| #
d70ea7fe |
| 07-Aug-2017 |
Tobias Grosser <[email protected]> |
[GPGPU] Remove redundant constructors
llvm-svn: 310284
|