History log of /llvm-project-15.0.7/clang/lib/Lex/Lexer.cpp (Results 1 – 25 of 372)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init
# aee76cb5 23-Jul-2022 Corentin Jabot <[email protected]>

[Clang] Add support for Unicode identifiers (UAX31) in C2x mode.

This implements
N2836 Identifier Syntax using Unicode Standard Annex 31.

The feature was already implemented for C++,
and the semant

[Clang] Add support for Unicode identifiers (UAX31) in C2x mode.

This implements
N2836 Identifier Syntax using Unicode Standard Annex 31.

The feature was already implemented for C++,
and the semantics are the same.

Unlike C++ there was, afaict, no decision to
backport the feature in older languages mode,
so C17 and earlier are not modified and the
code point tables for these language modes are conserved.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D130416

show more ...


# 6882ca9a 13-Jul-2022 Corentin Jabot <[email protected]>

[Clang] Adjust extension warnings for delimited sequences

WG21 approved delimited escape sequences and named escape
sequences.
Adjust the extension warnings accordingly, and update
the release notes

[Clang] Adjust extension warnings for delimited sequences

WG21 approved delimited escape sequences and named escape
sequences.
Adjust the extension warnings accordingly, and update
the release notes.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D129664

show more ...


Revision tags: llvmorg-14.0.6
# d4892a16 17-Jun-2022 Corentin Jabot <[email protected]>

[Clang] Add a warning on invalid UTF-8 in comments.

Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.

Invalid UTF-8 in other places i

[Clang] Add a warning on invalid UTF-8 in comments.

Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.

Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.

The warning is off by default as its likely to be somewhat disruptive
otherwise.

This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059

show more ...


# a262f4db 12-Jul-2022 Jonas Devlieghere <[email protected]>

Revert "[Clang] Add a warning on invalid UTF-8 in comments."

This reverts commit cc309721d20c8e544ae7a10a66735ccf4981a11c because it
breaks the following tests on GreenDragon:

TestDataFormatterOb

Revert "[Clang] Add a warning on invalid UTF-8 in comments."

This reverts commit cc309721d20c8e544ae7a10a66735ccf4981a11c because it
breaks the following tests on GreenDragon:

TestDataFormatterObjCCF.py
TestDataFormatterObjCExpr.py
TestDataFormatterObjCKVO.py
TestDataFormatterObjCNSBundle.py
TestDataFormatterObjCNSData.py
TestDataFormatterObjCNSError.py
TestDataFormatterObjCNSNumber.py
TestDataFormatterObjCNSURL.py
TestDataFormatterObjCPlain.py
TestDataFormatterObjNSException.py

https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/45288/

show more ...


# cc309721 17-Jun-2022 Corentin Jabot <[email protected]>

[Clang] Add a warning on invalid UTF-8 in comments.

Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.

Invalid UTF-8 in other places i

[Clang] Add a warning on invalid UTF-8 in comments.

Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.

Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.

The warning is off by default as its likely to be somewhat disruptive
otherwise.

This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059

show more ...


# 50416e54 09-Jul-2022 Corentin Jabot <[email protected]>

Revert "[Clang] Add a warning on invalid UTF-8 in comments."

It is probable thart this change crashes on the powerpc bots.

This reverts commit 355532a1499aa9b13a89fb5b5caaba2344d57cd7.


# 355532a1 17-Jun-2022 Corentin Jabot <[email protected]>

[Clang] Add a warning on invalid UTF-8 in comments.

Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.

Invalid UTF-8 in other places i

[Clang] Add a warning on invalid UTF-8 in comments.

Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.

Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.

The warning is off by default as its likely to be somewhat disruptive
otherwise.

This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059

show more ...


# e9fe20da 06-Jul-2022 Nico Weber <[email protected]>

Revert "[Clang] Add a warning on invalid UTF-8 in comments."

This reverts commit 4174f0ca618b467571b43cff12cbe4c4239670f8.

Also revert follow-up "[Clang] Fix invalid utf-8 detection"
This reverts c

Revert "[Clang] Add a warning on invalid UTF-8 in comments."

This reverts commit 4174f0ca618b467571b43cff12cbe4c4239670f8.

Also revert follow-up "[Clang] Fix invalid utf-8 detection"
This reverts commit bf45e27a676d87944f1f13d5f0d0f39935fc4010.

The second commit broke tests, see comments on
https://reviews.llvm.org/D129223, and it sounds like the first
commit isn't valid without the second one. So reverting both for now.

show more ...


# 4174f0ca 17-Jun-2022 Corentin Jabot <[email protected]>

[Clang] Add a warning on invalid UTF-8 in comments.

Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.

Invalid UTF-8 in other places i

[Clang] Add a warning on invalid UTF-8 in comments.

Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.

Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.

The warning is off by default as its likely to be somewhat disruptive
otherwise.

This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059

show more ...


# fb06dd3e 06-Jul-2022 Corentin Jabot <[email protected]>

Revert "[Clang] Add a warning on invalid UTF-8 in comments."

Reverting while I investigate build failures

This reverts commit e3dc56805f1029dd5959e4c69196a287961afb8d.


# e3dc5680 17-Jun-2022 Corentin Jabot <[email protected]>

[Clang] Add a warning on invalid UTF-8 in comments.

Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.

Invalid UTF-8 in other places i

[Clang] Add a warning on invalid UTF-8 in comments.

Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.

Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.

The warning is off by default as its likely to be somewhat disruptive
otherwise.

This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D128059

show more ...


# c68b8c84 28-Jun-2022 Argyrios Kyrtzidis <[email protected]>

[Lex] Make sure to notify `MultipleIncludeOpt` for "read tokens" during fast dependency directive lexing

Otherwise a header may be erroneously marked as having a header macro guard and won't get re-

[Lex] Make sure to notify `MultipleIncludeOpt` for "read tokens" during fast dependency directive lexing

Otherwise a header may be erroneously marked as having a header macro guard and won't get re-included.

Differential Revision: https://reviews.llvm.org/D128772

show more ...


Revision tags: llvmorg-14.0.5, llvmorg-14.0.4, llvmorg-14.0.3, llvmorg-14.0.2, llvmorg-14.0.1
# c92056d0 04-Apr-2022 Corentin Jabot <[email protected]>

[Clang][C++23] P2071 Named universal character escapes

Implements [[ https://wg21.link/p2071r1 | P2071 Named Universal Character Escapes ]] - as an extension in all language mode, the patch not wa

[Clang][C++23] P2071 Named universal character escapes

Implements [[ https://wg21.link/p2071r1 | P2071 Named Universal Character Escapes ]] - as an extension in all language mode, the patch not warn in c++23 mode will be done later once this paper is plenary approved (in July).

We add

* A code generator that transforms `UnicodeData.txt` and `NameAliases.txt` to a space efficient data structure that can be queried in `O(NameLength)`
* A set of functions in `Unicode.h` to query that data, including

* A function to find an exact match of a given Unicode character name
* A function to perform a loose (ignoring case, space, underscore, medial hyphen) matching
* A function returning the best matching codepoint for a given string per edit distance

* Support of `\N{}` escape sequences in String and character Literals, with loose and typos diagnostics/fixits
* Support of `\N{}` as UCN with loose matching diagnostics/fixits.

Loose matching is considered an error to match closely the semantics of P2071.

The generated data contributes to 280kB of data to the binaries.

`UnicodeData.txt` and `NameAliases.txt` are not committed to the repository in this patch, and regenerating the data is a manual process.

Reviewed By: tahonermann

Differential Revision: https://reviews.llvm.org/D123064

show more ...


# fad6e379 28-May-2022 Argyrios Kyrtzidis <[email protected]>

[Lex] Fix crash during dependency scanning while skipping an unmatched `#if`


# b4c83a13 12-May-2022 Argyrios Kyrtzidis <[email protected]>

[Tooling/DependencyScanning & Preprocessor] Refactor dependency scanning to produce pre-lexed preprocessor directive tokens, instead of minimized sources

This is a commit with the following changes:

[Tooling/DependencyScanning & Preprocessor] Refactor dependency scanning to produce pre-lexed preprocessor directive tokens, instead of minimized sources

This is a commit with the following changes:

* Remove `ExcludedPreprocessorDirectiveSkipMapping` and related functionality

Removes `ExcludedPreprocessorDirectiveSkipMapping`; its intended benefit for fast skipping of excluded directived blocks
will be superseded by a follow-up patch in the series that will use dependency scanning lexing for the same purpose.

* Refactor dependency scanning to produce pre-lexed preprocessor directive tokens, instead of minimized sources

Replaces the "source minimization" mechanism with a mechanism that produces lexed dependency directives tokens.

* Make the special lexing for dependency scanning a first-class feature of the `Preprocessor` and `Lexer`

This is bringing the following benefits:

* Full access to the preprocessor state during dependency scanning. E.g. a component can see what includes were taken and where they were located in the actual sources.
* Improved performance for dependency scanning. Measurements with a release+thin-LTO build shows ~ -11% reduction in wall time.
* Opportunity to use dependency scanning lexing to speed-up skipping of excluded conditional blocks during normal preprocessing (as follow-up, not part of this patch).

For normal preprocessing measurements show differences are below the noise level.

Since, after this change, we don't minimize sources and pass them in place of the real sources, `DependencyScanningFilesystem` is not technically necessary, but it has valuable performance benefits for caching file `stat`s along with the results of scanning the sources. So the setup of using the `DependencyScanningFilesystem` during a dependency scan remains.

Differential Revision: https://reviews.llvm.org/D125486
Differential Revision: https://reviews.llvm.org/D125487
Differential Revision: https://reviews.llvm.org/D125488

show more ...


# e9a902c7 16-Apr-2022 Christopher Di Bella <[email protected]>

Revert "Revert "Revert "[clang][pp] adds '#pragma include_instead'"""

> Includes regression test for problem noted by @hans.
> is reverts commit 973de71.
>
> Differential Revision: https://reviews.l

Revert "Revert "Revert "[clang][pp] adds '#pragma include_instead'"""

> Includes regression test for problem noted by @hans.
> is reverts commit 973de71.
>
> Differential Revision: https://reviews.llvm.org/D106898

Feature implemented as-is is fairly expensive and hasn't been used by
libc++. A potential reimplementation is possible if libc++ become
interested in this feature again.

Differential Revision: https://reviews.llvm.org/D123885

show more ...


Revision tags: llvmorg-14.0.0, llvmorg-14.0.0-rc4, llvmorg-14.0.0-rc3, llvmorg-14.0.0-rc2, llvmorg-14.0.0-rc1
# 33ec6530 08-Feb-2022 Timm Bäder <[email protected]>

[clang][lexer] Allow u8 character literal prefixes in C2x

Implement N2418 for C2x.

Differential Revision: https://reviews.llvm.org/D119221


# d813116c 01-Mar-2022 Dawid Jurczak <[email protected]>

[NFC][Lexer] Remove getLangOpts function from Lexer

Given that there is only one external user of Lexer::getLangOpts
we can remove getter entirely without much pain.

Differential Revision: https://

[NFC][Lexer] Remove getLangOpts function from Lexer

Given that there is only one external user of Lexer::getLangOpts
we can remove getter entirely without much pain.

Differential Revision: https://reviews.llvm.org/D120404

show more ...


# a64d3c60 25-Feb-2022 Dawid Jurczak <[email protected]>

[NFC][Lexer] Make Lexer::LangOpts const reference

This change can be seen as code cleanup but motivation is more performance related.
While browsing perf reports captured during Linux build we can n

[NFC][Lexer] Make Lexer::LangOpts const reference

This change can be seen as code cleanup but motivation is more performance related.
While browsing perf reports captured during Linux build we can notice unusual portion of instructions executed in std::vector<std::string> copy constructor like:

0.59% 0.58% clang-14 clang-14 [.] std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector

or even:

1.42% 0.26% clang clang-14 [.] clang::LangOptions::LangOptions
|
--1.16%--clang::LangOptions::LangOptions
|
--0.74%--std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >,
std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::vector

After more digging we can see that relevant LangOptions std::vector members (*Files, ModuleFeatures and NoBuiltinFuncs)
are constructed when Lexer::LangOpts field is initialized on list:

Lexer::Lexer(..., const LangOptions &langOpts, ...)
: ..., LangOpts(langOpts),

Since LangOptions copy constructor is called by Lexer(..., const LangOptions &LangOpts,...) and local Lexer objects are created thousands times
(in Lexer::getRawToken, Preprocessor::EnterSourceFile and more) during single module processing in frontend it makes std::vector copy constructors surprisingly hot.

Unfortunately even though in current Lexer implementation mentioned std::vector members are unused and most of time empty,
no compiler is smart enough to optimize their std::vector copy constructors out (take a look at test assembly): https://godbolt.org/z/hdoxPfMYY even with LTO enabled.
However there is simple way to fix this. Since Lexer doesn't access *Files, ModuleFeatures, NoBuiltinFuncs and any other LangOptions fields (but only LangOptionsBase)
we can simply get rid of redundant copy constructor assembly by changing LangOpts type to more appropriate const LangOptions reference: https://godbolt.org/z/fP7de9176

Additionally we need to store LineComment outside LangOpts because it's written in SkipLineComment function.
Also FormatTokenLexer need to be adjusted a bit to avoid lifetime issues related to passing local LangOpts reference to Lexer.

After this change I can see more than 1% speedup in some of my microbenchmarks when using Clang release binary built with LTO.
For Linux build gains are not so significant but still nice at the level of -0.4%/-0.5% instructions drop.

Differential Revision: https://reviews.llvm.org/D120334

show more ...


# fbe38a78 22-Feb-2022 Dawid Jurczak <[email protected]>

[NFC][Lexer] Make access to LangOpts more consistent

Before this change without any good reason Lexer::LangOpts is sometimes accessed by getter and another time read directly in Lexer functions.
Sin

[NFC][Lexer] Make access to LangOpts more consistent

Before this change without any good reason Lexer::LangOpts is sometimes accessed by getter and another time read directly in Lexer functions.
Since getLangOpts is a bit more verbose prefer direct access to LangOpts member when possible.

Differential Revision: https://reviews.llvm.org/D120333

show more ...


Revision tags: llvmorg-15-init
# ff77071a 28-Jan-2022 Kadir Cetinkaya <[email protected]>

[clang][Lexer] Make raw and normal lexer behave the same for line comments

Normally there are heruistics in lexer to treat `//*` specially in
language modes that don't have line comments (to emit `/

[clang][Lexer] Make raw and normal lexer behave the same for line comments

Normally there are heruistics in lexer to treat `//*` specially in
language modes that don't have line comments (to emit `/`). Unfortunately this
only applied to the first occurence of a line comment inside the file, as the
subsequent line comments were treated as if language had support for them.

This unfortunately only holds in normal lexing mode, as in raw mode all
occurences of line comments received this treatment, which created discrepancies
when comparing expanded and spelled tokens.

The proper fix would be to just make sure we treat all the line comments with a
subsequent `*` the same way, but it would imply breaking some code that's
accepted by clang today. So instead we introduce the same bug into raw lexing
mode.

Fixes https://github.com/clangd/clangd/issues/1003.

Differential Revision: https://reviews.llvm.org/D118471

show more ...


Revision tags: llvmorg-13.0.1, llvmorg-13.0.1-rc3, llvmorg-13.0.1-rc2
# 298367ee 29-Dec-2021 Kazu Hirata <[email protected]>

[clang] Use nullptr instead of 0 or NULL (NFC)

Identified with modernize-use-nullptr.


Revision tags: llvmorg-13.0.1-rc1
# 197576c4 18-Nov-2021 Jan Svoboda <[email protected]>

[clang][lex] Refactor check for the first file include

This patch refactors the code that checks whether a file has just been included for the first time.

The `HeaderSearch::FirstTimeLexingFile` fu

[clang][lex] Refactor check for the first file include

This patch refactors the code that checks whether a file has just been included for the first time.

The `HeaderSearch::FirstTimeLexingFile` function is removed and the information is threaded to the original call site from `HeaderSearch::ShouldEnterIncludeFile`. This will make it possible to avoid tracking the number of includes in a follow up patch.

Depends on D114092.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D114093

show more ...


Revision tags: llvmorg-13.0.0, llvmorg-13.0.0-rc4
# 274adcb8 15-Sep-2021 Corentin Jabot <[email protected]>

Implement delimited escape sequences.

\x{XXXX} \u{XXXX} and \o{OOOO} are accepted in all languages mode
in characters and string literals.

This is a feature proposed for both C++ (P2290R1) and C (N

Implement delimited escape sequences.

\x{XXXX} \u{XXXX} and \o{OOOO} are accepted in all languages mode
in characters and string literals.

This is a feature proposed for both C++ (P2290R1) and C (N2785). The
papers have been seen by both committees but are not yet adopted into
either standard. However, they do have support from both committees.

show more ...


# 601102d2 14-Sep-2021 Corentin Jabot <[email protected]>

Cleanup identifier parsing; NFC

Rename methods to clearly signal when they only deal with ASCII,
simplify the parsing of identifier, and use start/continue instead of
head/body for consistency with

Cleanup identifier parsing; NFC

Rename methods to clearly signal when they only deal with ASCII,
simplify the parsing of identifier, and use start/continue instead of
head/body for consistency with Unicode terminology.

show more ...


12345678910>>...15