|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
aba43035 |
| 23-Jul-2022 |
Dmitri Gribenko <[email protected]> |
Use llvm::sort instead of std::sort where possible
llvm::sort is beneficial even when we use the iterator-based overload, since it can optionally shuffle the elements (to detect non-determinism). Ho
Use llvm::sort instead of std::sort where possible
llvm::sort is beneficial even when we use the iterator-based overload, since it can optionally shuffle the elements (to detect non-determinism). However llvm::sort is not usable everywhere, for example, in compiler-rt.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D130406
show more ...
|
| #
31211674 |
| 29-Jun-2022 |
Sam McCall <[email protected]> |
[pseudo] Add error-recovery framework & brace-based recovery
The idea is:
- a parse failure is detected when all heads die when trying to shift the next token - we can recover by choosing a nonterm
[pseudo] Add error-recovery framework & brace-based recovery
The idea is:
- a parse failure is detected when all heads die when trying to shift the next token - we can recover by choosing a nonterminal we're partway through parsing, and determining where it ends through nonlocal means (e.g. matching brackets) - we can find candidates by walking up the stack from the (ex-)heads - the token range is defined using heuristics attached to grammar rules - the unparsed region is represented in the forest by an Opaque node
This patch has the core GLR functionality. It does not allow recovery heuristics to be attached as extensions to the grammar, but rather infers a brace-based heuristic.
Expected followups:
- make recovery heuristics grammar extensions (depends on D127448) - add recovery to our grammar for bracketed constructs and sequence nodes - change the structure of our augmented `_ := start` rules to eliminate some special-cases in glrParse. - (if I can work out how): avoid some spurious recovery cases described in comments
(Previously mistakenly committed as a0f4c10ae227a62c2a63611e64eba83f0ff0f577)
Differential Revision: https://reviews.llvm.org/D128486
show more ...
|
| #
9fbf1107 |
| 04-Jul-2022 |
Sam McCall <[email protected]> |
[pseudo] Eliminate LRTable::Action. NFC
The last remaining uses are in tests/test builders. Replace with a builder struct.
Differential Revision: https://reviews.llvm.org/D129093
|
| #
b37dafd5 |
| 24-Jun-2022 |
Sam McCall <[email protected]> |
[pseudo] Store shift and goto actions in a compact structure with faster lookup.
The actions table is very compact but the binary search to find the correct action is relatively expensive. A hashtab
[pseudo] Store shift and goto actions in a compact structure with faster lookup.
The actions table is very compact but the binary search to find the correct action is relatively expensive. A hashtable is faster but pretty large (64 bits per value, plus empty slots, and lookup is constant time but not trivial due to collisions).
The structure in this patch uses 1.25 bits per entry (whether present or absent) plus the size of the values, and lookup is trivial.
The Shift table is 119KB = 27KB values + 92KB keys. The Goto table is 86KB = 30KB values + 57KB keys. (Goto has a smaller keyspace as #nonterminals < #terminals, and more entries).
This patch improves glrParse speed by 28%: 4.69 => 5.99 MB/s Overall the table grows by 60%: 142 => 228KB.
By comparison, DenseMap<unsigned, StateID> is "only" 16% faster (5.43 MB/s), and results in a 285% larger table (547 KB) vs the baseline.
Differential Revision: https://reviews.llvm.org/D128485
show more ...
|
| #
743971fa |
| 28-Jun-2022 |
Sam McCall <[email protected]> |
Revert "[pseudo] Add error-recovery framework & brace-based recovery"
This reverts commit a0f4c10ae227a62c2a63611e64eba83f0ff0f577. This commit hadn't been reviewed yet, and was unintentionally incl
Revert "[pseudo] Add error-recovery framework & brace-based recovery"
This reverts commit a0f4c10ae227a62c2a63611e64eba83f0ff0f577. This commit hadn't been reviewed yet, and was unintentionally included on another branch.
show more ...
|
|
Revision tags: llvmorg-14.0.6, llvmorg-14.0.5 |
|
| #
a0f4c10a |
| 08-Jun-2022 |
Sam McCall <[email protected]> |
[pseudo] Add error-recovery framework & brace-based recovery
The idea is: - a parse failure is detected when all heads die when trying to shift the next token - we can recover by choosing a non
[pseudo] Add error-recovery framework & brace-based recovery
The idea is: - a parse failure is detected when all heads die when trying to shift the next token - we can recover by choosing a nonterminal we're partway through parsing, and determining where it ends through nonlocal means (e.g. matching brackets) - we can find candidates by walking up the stack from the (ex-)heads - the token range is defined using heuristics attached to grammar rules - the unparsed region is represented in the forest by an Opaque node
This patch has the core GLR functionality. It does not allow recovery heuristics to be attached as extensions to the grammar, but rather infers a brace-based heuristic.
Expected followups: - make recovery heuristics grammar extensions (depends on D127448) - add recover to our grammar for bracketed constructs and sequence nodes - change the structure of our augmented `_ := start` rules to eliminate some special-cases in glrParse. - (if I can work out how): avoid some spurious recovery cases described in comments - grammar changes to eliminate the hard distinction between init-list and designated-init-list shown in the recovery-init-list.cpp testcase
Differential Revision: https://reviews.llvm.org/D128486
show more ...
|
| #
85eaecbe |
| 23-Jun-2022 |
Sam McCall <[email protected]> |
[pseudo] Check follow-sets instead of tying reduce actions to lookahead tokens.
Previously, the action table stores a reduce action for each lookahead token it should allow. These tokens are the fol
[pseudo] Check follow-sets instead of tying reduce actions to lookahead tokens.
Previously, the action table stores a reduce action for each lookahead token it should allow. These tokens are the followSet(action.rule.target).
In practice, the follow sets are large, so we spend a bunch of time binary searching around all these essentially-duplicates to check whether our lookahead token is there. However the number of reduces for a given state is very small, so we're much better off linear scanning over them and performing a fast check for each.
D128318 was an attempt at this, storing a bitmap for each reduce. However it's even more compact just to use the follow sets directly, as there are fewer nonterminals than (state, rule) pairs. It's also faster.
This specialized approach means unbundling Reduce from other actions in LRTable, so it's no longer useful to support it in Action. I suspect Action will soon go away, as we store each kind of action separately.
This improves glrParse speed by 42% (3.30 -> 4.69 MB/s). It also reduces LR table size by 59% (343 -> 142kB).
Differential Revision: https://reviews.llvm.org/D128472
show more ...
|
| #
b70ee9d9 |
| 23-Jun-2022 |
Sam McCall <[email protected]> |
Reland "[pseudo] Track heads as GSS nodes, rather than as "pending actions"."
This reverts commit 2c80b5319870b57fbdbb6c9cef9c86c26c65371d.
Fixes LRTable::buildForTest to create states that are ref
Reland "[pseudo] Track heads as GSS nodes, rather than as "pending actions"."
This reverts commit 2c80b5319870b57fbdbb6c9cef9c86c26c65371d.
Fixes LRTable::buildForTest to create states that are referenced but have no actions.
show more ...
|
| #
c70aeaad |
| 09-Jun-2022 |
Haojian Wu <[email protected]> |
[pseudo] Move grammar-related headers to a separate dir, NFC.
We did that for .cpp, but forgot the headers.
Differential Revision: https://reviews.llvm.org/D127388
|
| #
7a05942d |
| 08-Jun-2022 |
Haojian Wu <[email protected]> |
[pseudo] Remove the explicit Accept actions.
As pointed out in the previous review section, having a dedicated accept action doesn't seem to be necessary. This patch implements the the same behavior
[pseudo] Remove the explicit Accept actions.
As pointed out in the previous review section, having a dedicated accept action doesn't seem to be necessary. This patch implements the the same behavior without accept acction, which will save some code complexity.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D125677
show more ...
|
| #
93bcff8a |
| 03-Jun-2022 |
Sam McCall <[email protected]> |
[pseudo] Invert rows/columns of LRTable storage for speedup. NFC
There are more states than symbols. This means first partioning the action list by state leaves us with a smaller range to binary sea
[pseudo] Invert rows/columns of LRTable storage for speedup. NFC
There are more states than symbols. This means first partioning the action list by state leaves us with a smaller range to binary search over. This improves find() a lot and glrParse() by 7%. The tradeoff is storing more smaller ranges increases the size of the offsets array, overall grammar memory is +1% (337->340KB).
Before: glrParse 188795975 ns 188778003 ns 77 bytes_per_second=1.98068M/s After: glrParse 175936203 ns 175916873 ns 81 bytes_per_second=2.12548M/s
Differential Revision: https://reviews.llvm.org/D127006
show more ...
|
|
Revision tags: llvmorg-14.0.4 |
|
| #
cd2292ef |
| 24-May-2022 |
Haojian Wu <[email protected]> |
[pseudo] A basic implementation of compiling cxx grammar at build time.
The main idea is to compile the cxx grammar at build time, and construct the core pieces (Grammar, LRTable) of the pseudoparse
[pseudo] A basic implementation of compiling cxx grammar at build time.
The main idea is to compile the cxx grammar at build time, and construct the core pieces (Grammar, LRTable) of the pseudoparse based on the compiled data sources.
This is a tiny implementation, which is good for start:
- defines how the public API should look like; - integrates the cxx grammar compilation workflow with the cmake system. - onlynonterminal symbols of the C++ grammar are compiled, anything else are still doing the real compilation work at runtime, we can opt-in more bits in the future; - splits the monolithic clangPsuedo library for better layering;
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D125667
show more ...
|