Lexer.cpp - OpenGrok history log for /llvm-project-15.0.7/clang/lib/Lex/Lexer.cpp

Revision	Date	Author	Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# aee76cb5	23-Jul-2022	Corentin Jabot <[email protected]>	[Clang] Add support for Unicode identifiers (UAX31) in C2x mode. This implements N2836 Identifier Syntax using Unicode Standard Annex 31. The feature was already implemented for C++, and the semantics are the same. Unlike C++ there was, afaict, no decision to backport the feature in older languages mode, so C17 and earlier are not modified and the code point tables for these language modes are conserved. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D130416
# 6882ca9a	13-Jul-2022	Corentin Jabot <[email protected]>	[Clang] Adjust extension warnings for delimited sequences WG21 approved delimited escape sequences and named escape sequences. Adjust the extension warnings accordingly, and update the release notes. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D129664
Revision tags: llvmorg-14.0.6
# d4892a16	17-Jun-2022	Corentin Jabot <[email protected]>	[Clang] Add a warning on invalid UTF-8 in comments. Introduce an off-by-default `-Winvalid-utf8` warning that detects invalid UTF-8 code unit sequences in comments. Invalid UTF-8 in other places is already diagnosed, as that cannot appear in identifiers and other grammar constructs. The warning is off by default as it's likely to be somewhat disruptive otherwise. This warning allows clang to conform to the yet-to-be approved WG21 "P2295R5 Support for UTF-8 as a portable source file encoding" paper. Reviewed By: aaron.ballman, #clang-language-wg Differential Revision: https://reviews.llvm.org/D128059
# a262f4db	12-Jul-2022	Jonas Devlieghere <[email protected]>	Revert "[Clang] Add a warning on invalid UTF-8 in comments." This reverts commit cc309721d20c8e544ae7a10a66735ccf4981a11c because it breaks the following tests on GreenDragon: TestDataFormatterObjCCF.py TestDataFormatterObjCExpr.py TestDataFormatterObjCKVO.py TestDataFormatterObjCNSBundle.py TestDataFormatterObjCNSData.py TestDataFormatterObjCNSError.py TestDataFormatterObjCNSNumber.py TestDataFormatterObjCNSURL.py TestDataFormatterObjCPlain.py TestDataFormatterObjNSException.py https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/45288/
# cc309721	17-Jun-2022	Corentin Jabot <[email protected]>	[Clang] Add a warning on invalid UTF-8 in comments. Introduce an off-by-default `-Winvalid-utf8` warning that detects invalid UTF-8 code unit sequences in comments. Invalid UTF-8 in other places is already diagnosed, as that cannot appear in identifiers and other grammar constructs. The warning is off by default as it's likely to be somewhat disruptive otherwise. This warning allows clang to conform to the yet-to-be approved WG21 "P2295R5 Support for UTF-8 as a portable source file encoding" paper. Reviewed By: aaron.ballman, #clang-language-wg Differential Revision: https://reviews.llvm.org/D128059
# 50416e54	09-Jul-2022	Corentin Jabot <[email protected]>	Revert "[Clang] Add a warning on invalid UTF-8 in comments." It is probable that this change crashes on the powerpc bots. This reverts commit 355532a1499aa9b13a89fb5b5caaba2344d57cd7.
# 355532a1	17-Jun-2022	Corentin Jabot <[email protected]>	[Clang] Add a warning on invalid UTF-8 in comments. Introduce an off-by-default `-Winvalid-utf8` warning that detects invalid UTF-8 code unit sequences in comments. Invalid UTF-8 in other places is already diagnosed, as that cannot appear in identifiers and other grammar constructs. The warning is off by default as it's likely to be somewhat disruptive otherwise. This warning allows clang to conform to the yet-to-be approved WG21 "P2295R5 Support for UTF-8 as a portable source file encoding" paper. Reviewed By: aaron.ballman, #clang-language-wg Differential Revision: https://reviews.llvm.org/D128059
# e9fe20da	06-Jul-2022	Nico Weber <[email protected]>	Revert "[Clang] Add a warning on invalid UTF-8 in comments." This reverts commit 4174f0ca618b467571b43cff12cbe4c4239670f8. Also revert follow-up "[Clang] Fix invalid utf-8 detection" This reverts commit bf45e27a676d87944f1f13d5f0d0f39935fc4010. The second commit broke tests, see comments on https://reviews.llvm.org/D129223, and it sounds like the first commit isn't valid without the second one. So reverting both for now.
# 4174f0ca	17-Jun-2022	Corentin Jabot <[email protected]>	[Clang] Add a warning on invalid UTF-8 in comments. Introduce an off-by-default `-Winvalid-utf8` warning that detects invalid UTF-8 code unit sequences in comments. Invalid UTF-8 in other places is already diagnosed, as that cannot appear in identifiers and other grammar constructs. The warning is off by default as it's likely to be somewhat disruptive otherwise. This warning allows clang to conform to the yet-to-be approved WG21 "P2295R5 Support for UTF-8 as a portable source file encoding" paper. Reviewed By: aaron.ballman, #clang-language-wg Differential Revision: https://reviews.llvm.org/D128059
# fb06dd3e	06-Jul-2022	Corentin Jabot <[email protected]>	Revert "[Clang] Add a warning on invalid UTF-8 in comments." Reverting while I investigate build failures This reverts commit e3dc56805f1029dd5959e4c69196a287961afb8d.
# e3dc5680	17-Jun-2022	Corentin Jabot <[email protected]>	[Clang] Add a warning on invalid UTF-8 in comments. Introduce an off-by-default `-Winvalid-utf8` warning that detects invalid UTF-8 code unit sequences in comments. Invalid UTF-8 in other places is already diagnosed, as that cannot appear in identifiers and other grammar constructs. The warning is off by default as it's likely to be somewhat disruptive otherwise. This warning allows clang to conform to the yet-to-be approved WG21 "P2295R5 Support for UTF-8 as a portable source file encoding" paper. Reviewed By: aaron.ballman, #clang-language-wg Differential Revision: https://reviews.llvm.org/D128059
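The `-Winvalid-utf8` entries above describe detecting ill-formed UTF-8 code unit sequences. As a rough illustration of what such a check involves, here is a minimal sketch of a well-formedness test following the byte ranges of the Unicode standard. The names `inRange`, `wellFormedUTF8Length` and `isValidUTF8` are hypothetical and are not Clang's API; the actual implementation in `Lexer.cpp` may be structured quite differently.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of Clang.
constexpr bool inRange(unsigned char B, unsigned char Lo, unsigned char Hi) {
  return B >= Lo && B <= Hi;
}

// Returns the byte length of the well-formed UTF-8 sequence at the start of S,
// or 0 if S is empty or starts with an ill-formed or truncated sequence.
constexpr std::size_t wellFormedUTF8Length(std::string_view S) {
  if (S.empty())
    return 0;
  const unsigned char B0 = static_cast<unsigned char>(S[0]);
  if (B0 <= 0x7F)
    return 1; // ASCII
  std::size_t Len = 0;
  unsigned char Lo = 0x80, Hi = 0xBF; // allowed range for the second byte
  if (inRange(B0, 0xC2, 0xDF)) {      // 0xC0 and 0xC1 would be overlong
    Len = 2;
  } else if (inRange(B0, 0xE0, 0xEF)) {
    Len = 3;
    if (B0 == 0xE0) Lo = 0xA0; // reject overlong 3-byte forms
    if (B0 == 0xED) Hi = 0x9F; // reject UTF-16 surrogates
  } else if (inRange(B0, 0xF0, 0xF4)) {
    Len = 4;
    if (B0 == 0xF0) Lo = 0x90; // reject overlong 4-byte forms
    if (B0 == 0xF4) Hi = 0x8F; // reject values above U+10FFFF
  } else {
    return 0; // stray continuation byte or invalid lead byte
  }
  if (S.size() < Len)
    return 0; // truncated sequence
  if (!inRange(static_cast<unsigned char>(S[1]), Lo, Hi))
    return 0;
  for (std::size_t I = 2; I < Len; ++I)
    if (!inRange(static_cast<unsigned char>(S[I]), 0x80, 0xBF))
      return 0;
  return Len;
}

// True if every code unit sequence in S is well-formed UTF-8.
constexpr bool isValidUTF8(std::string_view S) {
  while (!S.empty()) {
    const std::size_t Len = wellFormedUTF8Length(S);
    if (Len == 0)
      return false;
    S.remove_prefix(Len);
  }
  return true;
}
```

A comment scanner could call `isValidUTF8` on the comment body and emit the warning when it returns false; where exactly Clang reports the diagnostic is not stated in the log.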
# c68b8c84	28-Jun-2022	Argyrios Kyrtzidis <[email protected]>	[Lex] Make sure to notify `MultipleIncludeOpt` for "read tokens" during fast dependency directive lexing Otherwise a header may be erroneously marked as having a header macro guard and won't get re-included. Differential Revision: https://reviews.llvm.org/D128772
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# c92056d0	04-Apr-2022	Corentin Jabot <[email protected]>	[Clang][C++23] P2071 Named universal character escapes Implements P2071 Named Universal Character Escapes (https://wg21.link/p2071r1) as an extension in all language modes; the patch to not warn in C++23 mode will be done later once this paper is plenary approved (in July). We add * A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` to a space efficient data structure that can be queried in `O(NameLength)` * A set of functions in `Unicode.h` to query that data, including * A function to find an exact match of a given Unicode character name * A function to perform a loose (ignoring case, space, underscore, medial hyphen) matching * A function returning the best matching codepoint for a given string per edit distance * Support of `\N{}` escape sequences in String and character Literals, with loose and typos diagnostics/fixits * Support of `\N{}` as UCN with loose matching diagnostics/fixits. Loose matching is considered an error to match closely the semantics of P2071. The generated data contributes 280kB of data to the binaries. `UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process. Reviewed By: tahonermann Differential Revision: https://reviews.llvm.org/D123064
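The entry above mentions loose matching of character names (ignoring case, space, underscore, medial hyphen). The following is a simplified sketch of such a comparison, assuming ASCII names and treating a hyphen as medial only when it sits directly between two letters or digits. The function names are hypothetical and do not come from `Unicode.h`; the sketch also ignores the special case the Unicode loose matching rule makes for U+1180 HANGUL JUNGSEONG O-E.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helpers, not LLVM's API. ASCII only.
constexpr bool isAlnumASCII(char C) {
  return (C >= '0' && C <= '9') || (C >= 'A' && C <= 'Z') ||
         (C >= 'a' && C <= 'z');
}

constexpr char toUpperASCII(char C) {
  return (C >= 'a' && C <= 'z') ? static_cast<char>(C - 'a' + 'A') : C;
}

// True if the character at index I is skipped by loose matching:
// spaces, underscores, and hyphens with a letter or digit on both sides.
constexpr bool isIgnorable(std::string_view S, std::size_t I) {
  const char C = S[I];
  if (C == ' ' || C == '_')
    return true;
  if (C == '-')
    return I > 0 && I + 1 < S.size() && isAlnumASCII(S[I - 1]) &&
           isAlnumASCII(S[I + 1]);
  return false;
}

// Advances I past any ignorable characters.
constexpr std::size_t skipIgnorable(std::string_view S, std::size_t I) {
  while (I < S.size() && isIgnorable(S, I))
    ++I;
  return I;
}

// Compares two names, skipping ignorable characters and ignoring case.
constexpr bool looselyEqual(std::string_view A, std::string_view B) {
  std::size_t I = skipIgnorable(A, 0), J = skipIgnorable(B, 0);
  while (I < A.size() && J < B.size()) {
    if (toUpperASCII(A[I]) != toUpperASCII(B[J]))
      return false;
    I = skipIgnorable(A, I + 1);
    J = skipIgnorable(B, J + 1);
  }
  return I == A.size() && J == B.size();
}
```

Under these assumptions `looselyEqual("ZERO WIDTH SPACE", "zero-width_space")` holds, while a hyphen that follows a space, as in `TSA -PHRU`, stays significant.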
# fad6e379	28-May-2022	Argyrios Kyrtzidis <[email protected]>	[Lex] Fix crash during dependency scanning while skipping an unmatched `#if`
# b4c83a13	12-May-2022	Argyrios Kyrtzidis <[email protected]>	[Tooling/DependencyScanning & Preprocessor] Refactor dependency scanning to produce pre-lexed preprocessor directive tokens, instead of minimized sources This is a commit with the following changes: * Remove `ExcludedPreprocessorDirectiveSkipMapping` and related functionality Removes `ExcludedPreprocessorDirectiveSkipMapping`; its intended benefit for fast skipping of excluded directive blocks will be superseded by a follow-up patch in the series that will use dependency scanning lexing for the same purpose. * Refactor dependency scanning to produce pre-lexed preprocessor directive tokens, instead of minimized sources Replaces the "source minimization" mechanism with a mechanism that produces lexed dependency directives tokens. * Make the special lexing for dependency scanning a first-class feature of the `Preprocessor` and `Lexer` This is bringing the following benefits: * Full access to the preprocessor state during dependency scanning. E.g. a component can see what includes were taken and where they were located in the actual sources. * Improved performance for dependency scanning. Measurements with a release+thin-LTO build show a ~11% reduction in wall time. * Opportunity to use dependency scanning lexing to speed-up skipping of excluded conditional blocks during normal preprocessing (as follow-up, not part of this patch). For normal preprocessing measurements show differences are below the noise level. Since, after this change, we don't minimize sources and pass them in place of the real sources, `DependencyScanningFilesystem` is not technically necessary, but it has valuable performance benefits for caching file `stat`s along with the results of scanning the sources. So the setup of using the `DependencyScanningFilesystem` during a dependency scan remains. Differential Revision: https://reviews.llvm.org/D125486 Differential Revision: https://reviews.llvm.org/D125487 Differential Revision: https://reviews.llvm.org/D125488
# e9a902c7	16-Apr-2022	Christopher Di Bella <[email protected]>	Revert "Revert "Revert "[clang][pp] adds '#pragma include_instead'""" > Includes regression test for problem noted by @hans. > is reverts commit 973de71. > > Differential Revision: https://reviews.llvm.org/D106898 Feature implemented as-is is fairly expensive and hasn't been used by libc++. A potential reimplementation is possible if libc++ becomes interested in this feature again. Differential Revision: https://reviews.llvm.org/D123885
Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1
# 33ec6530	08-Feb-2022	Timm Bäder <[email protected]>	[clang][lexer] Allow u8 character literal prefixes in C2x Implement N2418 for C2x. Differential Revision: https://reviews.llvm.org/D119221
# d813116c	01-Mar-2022	Dawid Jurczak <[email protected]>	[NFC][Lexer] Remove getLangOpts function from Lexer Given that there is only one external user of Lexer::getLangOpts we can remove getter entirely without much pain. Differential Revision: https://reviews.llvm.org/D120404
# a64d3c60	25-Feb-2022	Dawid Jurczak <[email protected]>	[NFC][Lexer] Make Lexer::LangOpts const reference This change can be seen as code cleanup but motivation is more performance related. While browsing perf reports captured during Linux build we can notice unusual portion of instructions executed in std::vector<std::string> copy constructor like: 0.59% 0.58% clang-14 clang-14 [.] std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector or even: 1.42% 0.26% clang clang-14 [.] clang::LangOptions::LangOptions | --1.16%--clang::LangOptions::LangOptions | --0.74%--std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector After more digging we can see that relevant LangOptions std::vector members (Files, ModuleFeatures and NoBuiltinFuncs) are constructed when Lexer::LangOpts field is initialized on list: Lexer::Lexer(..., const LangOptions &langOpts, ...) : ..., LangOpts(langOpts), Since LangOptions copy constructor is called by Lexer(..., const LangOptions &LangOpts,...) and local Lexer objects are created thousands times (in Lexer::getRawToken, Preprocessor::EnterSourceFile and more) during single module processing in frontend it makes std::vector copy constructors surprisingly hot. Unfortunately even though in current Lexer implementation mentioned std::vector members are unused and most of time empty, no compiler is smart enough to optimize their std::vector copy constructors out (take a look at test assembly): https://godbolt.org/z/hdoxPfMYY even with LTO enabled. However there is simple way to fix this. Since Lexer doesn't access Files, ModuleFeatures, NoBuiltinFuncs and any other LangOptions fields (but only LangOptionsBase) we can simply get rid of redundant copy constructor assembly by changing LangOpts type to more appropriate const LangOptions reference: https://godbolt.org/z/fP7de9176 Additionally we need to store LineComment outside LangOpts because it's written in SkipLineComment function. Also FormatTokenLexer need to be adjusted a bit to avoid lifetime issues related to passing local LangOpts reference to Lexer. After this change I can see more than 1% speedup in some of my microbenchmarks when using Clang release binary built with LTO. For Linux build gains are not so significant but still nice at the level of -0.4%/-0.5% instructions drop. Differential Revision: https://reviews.llvm.org/D120334
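The effect described in the entry above (a by-value options member being copied for every short-lived lexer) can be reproduced in miniature. The sketch below uses hypothetical stand-in types, `Options`, `LexerByValue` and `LexerByRef`, which are not Clang classes, and counts copy-constructor calls instead of measuring time. The separate `LineComment` member mirrors the commit's note that the one mutable flag has to live outside the shared options.

```cpp
// Hypothetical stand-in for clang::LangOptions; counts how often it is copied.
struct Options {
  int *CopyCount;          // points at a counter owned by the caller
  bool LineComment = true; // the one flag the lexer wants to modify
  constexpr explicit Options(int *Counter) : CopyCount(Counter) {}
  constexpr Options(const Options &Other)
      : CopyCount(Other.CopyCount), LineComment(Other.LineComment) {
    ++*CopyCount; // record every copy
  }
};

// Before: each lexer owns a full copy of the options.
struct LexerByValue {
  Options Opts;
  constexpr explicit LexerByValue(const Options &O) : Opts(O) {}
};

// After: the lexer only refers to the options; the mutable flag is stored
// separately so the shared options can stay const.
struct LexerByRef {
  const Options &Opts;
  bool LineComment;
  constexpr explicit LexerByRef(const Options &O)
      : Opts(O), LineComment(O.LineComment) {}
};

// Creates NumLexers short-lived lexers and reports how many option copies
// that caused.
template <typename LexerT> constexpr int copiesAfterCreating(int NumLexers) {
  int Copies = 0;
  Options Shared(&Copies);
  for (int I = 0; I < NumLexers; ++I) {
    LexerT L(Shared);
    (void)L;
  }
  return Copies;
}
```

With these definitions, creating 1000 `LexerByValue` objects performs 1000 copies and creating 1000 `LexerByRef` objects performs none. The reference member does require the options object to outlive the lexer, which is the lifetime concern the commit mentions for `FormatTokenLexer`.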
# fbe38a78	22-Feb-2022	Dawid Jurczak <[email protected]>	[NFC][Lexer] Make access to LangOpts more consistent Before this change without any good reason Lexer::LangOpts is sometimes accessed by getter and another time read directly in Lexer functions. Since getLangOpts is a bit more verbose prefer direct access to LangOpts member when possible. Differential Revision: https://reviews.llvm.org/D120333
Revision tags: llvmorg-15-init
# ff77071a	28-Jan-2022	Kadir Cetinkaya <[email protected]>	[clang][Lexer] Make raw and normal lexer behave the same for line comments Normally there are heuristics in lexer to treat `//` specially in language modes that don't have line comments (to emit `/`). Unfortunately this only applied to the first occurrence of a line comment inside the file, as the subsequent line comments were treated as if language had support for them. This unfortunately only holds in normal lexing mode, as in raw mode all occurrences of line comments received this treatment, which created discrepancies when comparing expanded and spelled tokens. The proper fix would be to just make sure we treat all the line comments with a subsequent `*` the same way, but it would imply breaking some code that's accepted by clang today. So instead we introduce the same bug into raw lexing mode. Fixes https://github.com/clangd/clangd/issues/1003. Differential Revision: https://reviews.llvm.org/D118471
Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 298367ee	29-Dec-2021	Kazu Hirata <[email protected]>	[clang] Use nullptr instead of 0 or NULL (NFC) Identified with modernize-use-nullptr.
Revision tags: llvmorg-13.0.1-rc1
# 197576c4	18-Nov-2021	Jan Svoboda <[email protected]>	[clang][lex] Refactor check for the first file include This patch refactors the code that checks whether a file has just been included for the first time. The `HeaderSearch::FirstTimeLexingFile` function is removed and the information is threaded to the original call site from `HeaderSearch::ShouldEnterIncludeFile`. This will make it possible to avoid tracking the number of includes in a follow up patch. Depends on D114092. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D114093
Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4
# 274adcb8	15-Sep-2021	Corentin Jabot <[email protected]>	Implement delimited escape sequences. \x{XXXX} \u{XXXX} and \o{OOOO} are accepted in all languages mode in characters and string literals. This is a feature proposed for both C++ (P2290R1) and C (N2785). The papers have been seen by both committees but are not yet adopted into either standard. However, they do have support from both committees.
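To make the `\x{XXXX}`, `\u{XXXX}` and `\o{OOOO}` forms from the entry above concrete, here is a small sketch of a decoder for a single delimited escape. `parseDelimitedEscape` and `digitValue` are hypothetical names, not functions from Clang, and the sketch only decodes the number: it does not check that a `\u{...}` value is a valid Unicode scalar value, nor does it produce diagnostics.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: value of digit C in the given base, or -1 if invalid.
constexpr int digitValue(char C, int Base) {
  int V = -1;
  if (C >= '0' && C <= '9')
    V = C - '0';
  else if (C >= 'a' && C <= 'f')
    V = C - 'a' + 10;
  else if (C >= 'A' && C <= 'F')
    V = C - 'A' + 10;
  return (V >= 0 && V < Base) ? V : -1;
}

// Hypothetical helper: decodes exactly one escape of the form \x{...},
// \u{...} (hexadecimal) or \o{...} (octal). Returns std::nullopt if S is
// malformed, has no digits, or the value does not fit in 32 bits.
constexpr std::optional<std::uint32_t>
parseDelimitedEscape(std::string_view S) {
  // Shortest valid form is five characters, e.g. \x{0}.
  if (S.size() < 5 || S[0] != '\\' || S[2] != '{' || S.back() != '}')
    return std::nullopt;
  int Base = 0;
  if (S[1] == 'x' || S[1] == 'u')
    Base = 16;
  else if (S[1] == 'o')
    Base = 8;
  else
    return std::nullopt;
  std::uint64_t Value = 0;
  for (std::size_t I = 3; I + 1 < S.size(); ++I) {
    const int D = digitValue(S[I], Base);
    if (D < 0)
      return std::nullopt; // not a digit in this base
    Value = Value * Base + D;
    if (Value > 0xFFFFFFFF)
      return std::nullopt; // too large for 32 bits
  }
  return static_cast<std::uint32_t>(Value);
}
```

For instance, the input spelled `\o{101}` decodes to 65, and `\o{8}` is rejected because 8 is not an octal digit.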
# 601102d2	14-Sep-2021	Corentin Jabot <[email protected]>	Cleanup identifier parsing; NFC Rename methods to clearly signal when they only deal with ASCII, simplify the parsing of identifier, and use start/continue instead of head/body for consistency with Unicode terminology.