|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
5cae8816 |
| 06-Jul-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Add GFX11 test coverage
Add GFX11 test coverage to a bunch of tests where it was easy to do so, mostly because the checks are autogenerated and/or GFX11 can share the same checks as GFX10.
Differential Revision: https://reviews.llvm.org/D129295
|
| #
bd675af2 |
| 30-Jun-2022 |
Piotr Sobczak <[email protected]> |
[AMDGPU] Make v16i16/v16f16 legal
There are upcoming intrinsics that will use the new types.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D128865
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
e2926501 |
| 16-May-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Aggressively fold immediates in SIShrinkInstructions
Fold immediates regardless of how many uses they have. This is expected to increase overall code size, but decrease register usage.
Differential Revision: https://reviews.llvm.org/D114644
|
| #
3eb2281b |
| 16-May-2022 |
Jay Foad <[email protected]> |
[AMDGPU] Aggressively fold immediates in SIFoldOperands
Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by adding the same 32-bit literal operand to many instructions.
This patch removes that restriction, so that a non-inlinable immediate will be folded into any number of users. The rationale is:
- It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs, which no longer affect occupancy on GFX10+.)
- It reduces ALU stalls between the instruction that loads a constant into a register and the instruction that uses it.
- The above benefits are expected to outweigh any increase in code size.
Differential Revision: https://reviews.llvm.org/D114643
|
| #
d40b7f0d |
| 17-May-2022 |
Simon Pilgrim <[email protected]> |
[DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses
If we're using shift pairs to mask, then relax the one-use limit if the shift amounts are equal - we'll only be generating a single AND node.
AArch64 has a couple of regressions due to this, so I've enforced the existing one-use limit inside an AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback.
Part of the work to fix the regressions in D77804
Differential Revision: https://reviews.llvm.org/D125607
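For reference, the fold rests on the identity that shifting right then left by the same amount just clears the low c bits; a minimal C++ check of that identity (illustrative only, not the DAGCombiner code):

```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint32_t x = 0xDEADBEEFu;
  for (unsigned c = 0; c < 32; ++c) {
    uint32_t shifted = (x >> c) << c;   // (shl (srl x, c), c)
    uint32_t mask = ~((1u << c) - 1u);  // m: clears the low c bits
    assert(shifted == (x & mask));      // and(x, m)
  }
  return 0;
}
```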
|
| #
1ecc3d86 |
| 14-May-2022 |
Simon Pilgrim <[email protected]> |
[DAG] Enable ISD::SHL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
Pulled out of D77804 as it's going to be easier to address the regressions individually.
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.
The lost RISCV gorc2 fold shouldn't be a problem - instcombine would have already destroyed that pattern - see https://github.com/llvm/llvm-project/issues/50553
Differential Revision: https://reviews.llvm.org/D124839
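A rough illustration of the demanded-bits reasoning, as a hand-rolled sketch rather than the SelectionDAG API: for y = x << c, a user that only demands bits of y at positions >= c really only needs the corresponding low bits of x, so it can be rewritten in terms of x even when x has other uses.

```cpp
#include <cassert>
#include <cstdint>

// For y = x << c, the bits demanded of y at and above position c map to the
// corresponding low bits of x; bits of y below position c are always zero.
uint32_t demandedOfShlSource(uint32_t demandedOfResult, unsigned c) {
  return demandedOfResult >> c;
}

int main() {
  uint32_t x = 0x1234ABCDu;
  unsigned c = 8;
  uint32_t y = x << c;
  uint32_t demanded = 0x0000FF00u;  // a user only reads bits 8..15 of y
  uint32_t srcDemanded = demandedOfShlSource(demanded, c);  // bits 0..7 of x
  // The user sees the same value whether it reads y or re-derives it from x.
  assert((y & demanded) == ((x & srcDemanded) << c));
  return 0;
}
```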
|
| #
7e3ef7dc |
| 07-May-2022 |
Simon Pilgrim <[email protected]> |
[AMDGPU] lowerEXTRACT_VECTOR_ELT - fold from a SCALAR_TO_VECTOR source
As suggested by @foad on D124839
If we're extracting a vector element that originally came from a scalar_to_vector, then avoid the bitcasting of a vector type and perform the shift masking on the (any-extended) scalar source directly, making use of the fact that the upper elements of a scalar_to_vector are all undef.
Differential Revision: https://reviews.llvm.org/D125173
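Conceptually (a simplified sketch with made-up helper names, not the lowering code, and assuming element 0 lives in the low bits of the scalar): when the vector is just a bitcast of scalar_to_vector(s), extracting a 16-bit element reduces to shifting and masking the scalar directly, and the elements above the scalar's width are undef anyway.

```cpp
#include <cassert>
#include <cstdint>

// Extract 16-bit element `idx` of a v2i16 that is just a bitcast of
// scalar_to_vector(s): shift and mask the scalar, no vector temporaries.
uint16_t extractElt16(uint32_t s, unsigned idx) {
  return static_cast<uint16_t>(s >> (idx * 16));
}

int main() {
  uint32_t s = 0xBEEF1234u;
  assert(extractElt16(s, 0) == 0x1234);
  assert(extractElt16(s, 1) == 0xBEEF);
  return 0;
}
```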
|
| #
589b9df4 |
| 02-May-2022 |
hsmahesha <[email protected]> |
[AMDGPU] Fix scalar_to_vector for v8i16/v8f16
so that the stack access is avoided.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D124734
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1, llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1, llvmorg-15-init, llvmorg-13.0.1, llvmorg-13.0.1-rc3 |
|
| #
bb1fe369 |
| 19-Jan-2022 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Make v8i16/v8f16 legal
Differential Revision: https://reviews.llvm.org/D117721
|
|
Revision tags: llvmorg-13.0.1-rc2, llvmorg-13.0.1-rc1 |
|
| #
da067ed5 |
| 10-Nov-2021 |
Austin Kerbow <[email protected]> |
[AMDGPU] Set most sched model resources' BufferSize to one
Using a BufferSize of one for memory ProcResources will result in better ILP, since it more accurately models the dependencies between memory ops and their consumers on an in-order processor. After this change, the scheduler will treat the data edges from loads as blocking, so that stalls are guaranteed when waiting for data to be retrieved from memory. Since we don't actually track waitcnt here, this should do a better job at modeling their behavior.
Practically, this means that the scheduler will trigger the 'STALL' heuristic more often.
This type of change needs to be evaluated experimentally. Preliminary results are positive.
Fixes: SWDEV-282962
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D114777
|
| #
8a52bd82 |
| 19-Nov-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Only select VOP3 forms of VOP2 instructions
Change VOP_PAT_GEN to default to not generating an instruction selection pattern for the VOP2 (e32) form of an instruction, only for the VOP3 (e64) form. This allows SIFoldOperands maximum freedom to fold copies into the operands of an instruction, before SIShrinkInstructions tries to shrink it back to the smaller encoding.
This affects the following VOP2 instructions: v_min_i32, v_max_i32, v_min_u32, v_max_u32, v_and_b32, v_or_b32, v_xor_b32, v_lshr_b32, v_ashr_i32, v_lshl_b32.
A further cleanup could simplify or remove VOP_PAT_GEN, since its optional second argument is never used.
Differential Revision: https://reviews.llvm.org/D114252
|
|
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4, llvmorg-13.0.0-rc3 |
|
| #
3ce1b963 |
| 08-Sep-2021 |
Joe Nash <[email protected]> |
[AMDGPU] Switch PostRA sched to MachineSched
Use GCNHazardRecognizer in the post-RA scheduler. Updated tests for the new schedules.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D109536
Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde
|
|
Revision tags: llvmorg-13.0.0-rc2, llvmorg-13.0.0-rc1, llvmorg-14-init |
|
| #
381ded34 |
| 28-Jun-2021 |
Stanislav Mekhanoshin <[email protected]> |
[AMDGPU] Add S_MOV_B64_IMM_PSEUDO for wide constants
This is to allow 64-bit constant rematerialization. If a constant is split into two separate moves initializing sub0 and sub1, as it is now, RA cannot rematerialize a 64-bit register.
This gives a 10-20% uplift in a set of huge apps that make heavy use of double precision math.
Fixes: SWDEV-292645
Differential Revision: https://reviews.llvm.org/D104874
|
|
Revision tags: llvmorg-12.0.1, llvmorg-12.0.1-rc4, llvmorg-12.0.1-rc3 |
|
| #
98f48723 |
| 24-Jun-2021 |
Carl Ritson <[email protected]> |
[AMDGPU] Add 224-bit vector types and link 192-bit types to MVTs
Add SReg_224, VReg_224, AReg_224, etc. Link 224-bit types with v7i32/v7f32. Link existing 192-bit types to newly added v3i64/v3f64/v6i32/v6f32.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D104622
|
|
Revision tags: llvmorg-12.0.1-rc2, llvmorg-12.0.1-rc1, llvmorg-12.0.0, llvmorg-12.0.0-rc5, llvmorg-12.0.0-rc4, llvmorg-12.0.0-rc3, llvmorg-12.0.0-rc2, llvmorg-11.1.0, llvmorg-11.1.0-rc3 |
|
| #
39ef0965 |
| 28-Jan-2021 |
Jay Foad <[email protected]> |
[AMDGPU] Simplify some RUN lines. NFC.
|
|
Revision tags: llvmorg-12.0.0-rc1, llvmorg-13-init, llvmorg-11.1.0-rc2, llvmorg-11.1.0-rc1, llvmorg-11.0.1, llvmorg-11.0.1-rc2 |
|
| #
2291bd13 |
| 30-Nov-2020 |
Austin Kerbow <[email protected]> |
[AMDGPU] Update subtarget features for new target ID support
Support for XNACK and SRAMECC is not static on some GPUs. We must be able to differentiate between different scenarios for these dynamic subtarget features.
The possible settings are:
- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off.
- On: Request support for XNACK/SRAMECC On.
GCNSubtarget will track the four options based on the following criteria. If the subtarget does not support XNACK/SRAMECC we say the setting is "Unsupported". If no subtarget features for XNACK/SRAMECC are requested we must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the feature string when initializing the subtarget, the settings are "On/Off".
The defaults are updated to be conservatively correct, meaning if no setting for XNACK or SRAMECC is explicitly requested, defaults will be used which generate code that can be run anywhere. This corresponds to the "Any" setting.
Differential Revision: https://reviews.llvm.org/D85882
|
| #
2f499b9a |
| 19-Dec-2020 |
Tony <[email protected]> |
[AMDGPU] Add volatile support to SIMemoryLegalizer
Treat a non-atomic volatile load and store as a relaxed atomic at system scope for the address spaces accessed. This will ensure all relevant caches will be bypassed.
A volatile atomic is not changed and still only bypasses caches up to the level specified by the SyncScope operand.
Differential Revision: https://reviews.llvm.org/D94214
|
| #
c1cd42d6 |
| 06-Jan-2021 |
Mircea Trofin <[email protected]> |
[NFC] Removed unused prefixes in CodeGen/AMDGPU
This covers the tests starting with h-k.
Differential Revision: https://reviews.llvm.org/D94147
|
|
Revision tags: llvmorg-11.0.1-rc1 |
|
| #
d2e52eec |
| 10-Nov-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Select global saddr mode from SGPR pointer
Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer instruction to materialize the 0 vs. the 64-bit copy.
|
| #
0d092303 |
| 12-Oct-2020 |
Michael Liao <[email protected]> |
[amdgpu] Enable use of AA during codegen.
- Add an internal option `-amdgpu-use-aa-in-codegen` to enable or disable this feature. By default, it's enabled.
Differential Revision: https://reviews.llvm.org/D89320
|
| #
1bc7bfff |
| 16-Oct-2020 |
Tony <[email protected]> |
[AMDGPU] Optimize waitcnt insertion for flat memory operations
Change waitcnt insertion to check the memory operand tokens to see whether flat memory operations access VMEM, in the same way it already checks whether they access LDS. This avoids adding waits on counters for address spaces that are not accessed.
In addition, only generate the pessimistic waitcnt 0 if a flat memory operation appears to access both VMEM and LDS.
This benefits flat memory operations that explicitly specify the address space as GLOBAL or LOCAL.
Differential Revision: https://reviews.llvm.org/D89618
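A schematic of the decision, using hypothetical names rather than the SIInsertWaitcnts code: a flat operation only needs to wait on the counters for the address spaces its memory operands can actually touch, and the pessimistic wait-on-everything case is reserved for operations that may hit both VMEM and LDS.

```cpp
#include <cstdio>

// Hypothetical summary of what a flat memory operation's memory operands
// say it can access (illustrative names, not the LLVM API).
struct FlatMemOp {
  bool mayAccessVMEM;  // global / scratch address spaces
  bool mayAccessLDS;   // local address space
};

struct Counters {
  bool waitVMCnt;    // vmcnt tracks VMEM
  bool waitLGKMCnt;  // lgkmcnt tracks LDS (among others)
};

Counters countersFor(const FlatMemOp &op) {
  // Only track the counters for address spaces the operation can reach; the
  // "wait on everything" case only remains for ops that may hit both.
  return {op.mayAccessVMEM, op.mayAccessLDS};
}

int main() {
  FlatMemOp globalOnly{true, false};  // e.g. a flat op whose pointer is known GLOBAL
  Counters c = countersFor(globalOnly);
  std::printf("vmcnt=%d lgkmcnt=%d\n", c.waitVMCnt, c.waitLGKMCnt);
  return 0;
}
```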
|
|
Revision tags: llvmorg-11.0.0, llvmorg-11.0.0-rc6, llvmorg-11.0.0-rc5, llvmorg-11.0.0-rc4 |
|
| #
bab1a17a |
| 24-Sep-2020 |
Jay Foad <[email protected]> |
[AMDGPU] Add bfi immediate pattern
Differential Revision: https://reviews.llvm.org/D88246
|
| #
2806f586 |
| 24-Sep-2020 |
Jay Foad <[email protected]> |
[AMDGPU] Make bfi patterns divergence-aware
This tends to increase code size but more importantly it reduces vgpr usage, and could avoid costly readfirstlanes if the result needs to be in an sgpr.
Differential Revision: https://reviews.llvm.org/D88245
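For context, the bfi (bitfield insert) pattern being matched is the classic select-bits-by-mask idiom; a minimal C++ statement of the identity (not the TableGen patterns themselves):

```cpp
#include <cassert>
#include <cstdint>

// Bitfield insert: take bits of `x` where `mask` is set, bits of `y` elsewhere.
// This is the (x & mask) | (y & ~mask) idiom that the bfi patterns match.
uint32_t bfi(uint32_t mask, uint32_t x, uint32_t y) {
  return (x & mask) | (y & ~mask);
}

int main() {
  assert(bfi(0x0000FFFFu, 0x12345678u, 0xAABBCCDDu) == 0xAABB5678u);
  return 0;
}
```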
|
|
Revision tags: llvmorg-11.0.0-rc3, llvmorg-11.0.0-rc2 |
|
| #
e1a2f471 |
| 13-Aug-2020 |
Matt Arsenault <[email protected]> |
AMDGPU: Match global saddr addressing mode
The previous implementation was incorrect, and was based on incorrect instruction definitions. Unfortunately we can't match natural addressing in a lot of cases due to the shift/scale applied in getelementptrs. This relies on reducing the 64-bit shift to 32 bits.
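A rough sketch of the addressing math involved (illustrative C++, assuming the index is known not to overflow 32 bits once scaled): the getelementptr scales a 32-bit index with a 64-bit shift, and matching the saddr form depends on being able to do that scale in 32 bits so it can be fed to the instruction as a 32-bit vector offset alongside the scalar base.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint64_t base = 0x100000000ull;  // uniform 64-bit base pointer (SGPR pair)
  uint32_t idx = 1000;             // divergent 32-bit index (VGPR)

  // GEP-style addressing: scale the zero-extended index with a 64-bit shift.
  uint64_t addr64 = base + ((uint64_t)idx << 2);

  // saddr-style addressing: do the scale in 32 bits and add it as a 32-bit
  // offset.  Equivalent only while idx << 2 does not overflow 32 bits.
  uint64_t addrSaddr = base + (uint64_t)(idx << 2);

  assert(addr64 == addrSaddr);
  return 0;
}
```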
|
|
Revision tags: llvmorg-11.0.0-rc1, llvmorg-12-init, llvmorg-10.0.1, llvmorg-10.0.1-rc4, llvmorg-10.0.1-rc3, llvmorg-10.0.1-rc2, llvmorg-10.0.1-rc1, llvmorg-10.0.0, llvmorg-10.0.0-rc6, llvmorg-10.0.0-rc5, llvmorg-10.0.0-rc4, llvmorg-10.0.0-rc3, llvmorg-10.0.0-rc2, llvmorg-10.0.0-rc1, llvmorg-11-init |
|
| #
62fd7f76 |
| 07-Jan-2020 |
Jay Foad <[email protected]> |
[MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers the one with lesser depth, but only if that depth is greater than the total latency of the instructions we've already scheduled -- otherwise its latency would be hidden and there would be no stall.
Unfortunately it only tests the depth of one of the candidates. This can lead to situations where the TopDepthReduce heuristic does not kick in, but a lower priority heuristic chooses the other candidate, whose depth *is* greater than the already scheduled latency, which causes a stall.
The fix is to apply the heuristic if the depth of *either* candidate is greater than the already scheduled latency.
All this also applies to the BotHeightReduce heuristic in the bottom zone.
Differential Revision: https://reviews.llvm.org/D72392
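Schematically, the guard change looks like the following simplified sketch (not the exact LLVM source): the heuristic now considers whichever of the two candidates is deeper.

```cpp
#include <algorithm>
#include <cassert>

struct Candidate { unsigned Depth; };

// Before: the depth heuristic only fired when one candidate's depth exceeded
// the latency already scheduled in the top zone.
bool applyTopDepthReduceOld(const Candidate &Cand, const Candidate &TryCand,
                            unsigned ScheduledLatency) {
  (void)TryCand;
  return Cand.Depth > ScheduledLatency;
}

// After: fire when *either* candidate's depth exceeds the already scheduled
// latency, so the deeper candidate cannot slip past the heuristic and stall.
bool applyTopDepthReduceNew(const Candidate &Cand, const Candidate &TryCand,
                            unsigned ScheduledLatency) {
  return std::max(Cand.Depth, TryCand.Depth) > ScheduledLatency;
}

int main() {
  Candidate Cand{4}, TryCand{12};
  unsigned ScheduledLatency = 8;
  assert(!applyTopDepthReduceOld(Cand, TryCand, ScheduledLatency));
  assert(applyTopDepthReduceNew(Cand, TryCand, ScheduledLatency));
  return 0;
}
```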
|