|
Revision tags: llvmorg-20.1.0, llvmorg-20.1.0-rc3, llvmorg-20.1.0-rc2, llvmorg-20.1.0-rc1, llvmorg-21-init, llvmorg-19.1.7, llvmorg-19.1.6, llvmorg-19.1.5, llvmorg-19.1.4, llvmorg-19.1.3, llvmorg-19.1.2, llvmorg-19.1.1, llvmorg-19.1.0, llvmorg-19.1.0-rc4, llvmorg-19.1.0-rc3, llvmorg-19.1.0-rc2, llvmorg-19.1.0-rc1, llvmorg-20-init, llvmorg-18.1.8, llvmorg-18.1.7, llvmorg-18.1.6, llvmorg-18.1.5, llvmorg-18.1.4, llvmorg-18.1.3, llvmorg-18.1.2, llvmorg-18.1.1, llvmorg-18.1.0, llvmorg-18.1.0-rc4, llvmorg-18.1.0-rc3, llvmorg-18.1.0-rc2, llvmorg-18.1.0-rc1, llvmorg-19-init, llvmorg-17.0.6, llvmorg-17.0.5, llvmorg-17.0.4, llvmorg-17.0.3, llvmorg-17.0.2, llvmorg-17.0.1, llvmorg-17.0.0, llvmorg-17.0.0-rc4, llvmorg-17.0.0-rc3, llvmorg-17.0.0-rc2, llvmorg-17.0.0-rc1, llvmorg-18-init, llvmorg-16.0.6, llvmorg-16.0.5, llvmorg-16.0.4, llvmorg-16.0.3, llvmorg-16.0.2, llvmorg-16.0.1, llvmorg-16.0.0, llvmorg-16.0.0-rc4, llvmorg-16.0.0-rc3, llvmorg-16.0.0-rc2, llvmorg-16.0.0-rc1, llvmorg-17-init, llvmorg-15.0.7, llvmorg-15.0.6, llvmorg-15.0.5, llvmorg-15.0.4, llvmorg-15.0.3, llvmorg-15.0.2, llvmorg-15.0.1, llvmorg-15.0.0, llvmorg-15.0.0-rc3, llvmorg-15.0.0-rc2, llvmorg-15.0.0-rc1, llvmorg-16-init |
|
| #
4f676c25 |
| 18-Jul-2022 |
Walter Erquinigo <[email protected]> |
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items - Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other
[trace][intel pt] Introduce wall clock time for each trace item
- Decouple TSCs from trace items - Turn TSCs into events just like CPUs. The new name is HW clock tick, wich could be reused by other vendors. - Add a GetWallTime that returns the wall time that the trace plug-in can infer for each trace item. - For intel pt, we are doing the following interpolation: if an instruction takes less than 1 TSC, we use that duration, otherwise, we assume the instruction took 1 TSC. This helps us avoid having to handle context switches, changes to kernel, idle times, decoding errors, etc. We are just trying to show some approximation and not the real data. For the real data, TSCs are the way to go. Besides that, we are making sure that no two trace items will give the same interpolation value. Finally, we are using as time 0 the time at which tracing started.
Sample output:
``` (lldb) r Process 750047 launched: '/home/wallace/a.out' (x86_64) Process 750047 stopped * thread #1, name = 'a.out', stop reason = breakpoint 1.1 frame #0: 0x0000000000402479 a.out`main at main.cpp:29:20 26 }; 27 28 int main() { -> 29 std::vector<int> vvv; 30 for (int i = 0; i < 100; i++) 31 vvv.push_back(i); 32 (lldb) process trace start -s 64kb -t --per-cpu (lldb) b 60 Breakpoint 2: where = a.out`main + 1689 at main.cpp:60:23, address = 0x0000000000402afe (lldb) c Process 750047 resuming Process 750047 stopped * thread #1, name = 'a.out', stop reason = breakpoint 2.1 frame #0: 0x0000000000402afe a.out`main at main.cpp:60:23 57 map<int, int> m; 58 m[3] = 4; 59 -> 60 map<string, string> m2; 61 m2["5"] = "6"; 62 63 std::vector<std::string> vs = {"2", "3"}; (lldb) thread trace dump instructions -t -f -e thread #1: tid = 750047 0: [379567.000 ns] (event) HW clock tick [48599428476224707] 1: [379569.000 ns] (event) CPU core changed [new CPU=2] 2: [390487.000 ns] (event) HW clock tick [48599428476246495] 3: [1602508.000 ns] (event) HW clock tick [48599428478664855] 4: [1662745.000 ns] (event) HW clock tick [48599428478785046] libc.so.6`malloc 5: [1662746.995 ns] 0x00007ffff7176660 endbr64 6: [1662748.991 ns] 0x00007ffff7176664 movq 0x32387d(%rip), %rax ; + 408 7: [1662750.986 ns] 0x00007ffff717666b pushq %r12 8: [1662752.981 ns] 0x00007ffff717666d pushq %rbp 9: [1662754.977 ns] 0x00007ffff717666e pushq %rbx 10: [1662756.972 ns] 0x00007ffff717666f movq (%rax), %rax 11: [1662758.967 ns] 0x00007ffff7176672 testq %rax, %rax 12: [1662760.963 ns] 0x00007ffff7176675 jne 0x9c7e0 ; <+384> 13: [1662762.958 ns] 0x00007ffff717667b leaq 0x17(%rdi), %rax 14: [1662764.953 ns] 0x00007ffff717667f cmpq $0x1f, %rax 15: [1662766.949 ns] 0x00007ffff7176683 ja 0x9c730 ; <+208> 16: [1662768.944 ns] 0x00007ffff7176730 andq $-0x10, %rax 17: [1662770.939 ns] 0x00007ffff7176734 cmpq $-0x41, %rax 18: [1662772.935 ns] 0x00007ffff7176738 seta %dl 19: [1662774.930 ns] 0x00007ffff717673b jmp 0x9c690 ; <+48> 20: [1662776.925 ns] 0x00007ffff7176690 cmpq %rdi, %rax 21: [1662778.921 ns] 0x00007ffff7176693 jb 0x9c7b0 ; <+336> 22: [1662780.916 ns] 0x00007ffff7176699 testb %dl, %dl 23: [1662782.911 ns] 0x00007ffff717669b jne 0x9c7b0 ; <+336> 24: [1662784.906 ns] 0x00007ffff71766a1 movq 0x3236c0(%rip), %r12 ; + 24 (lldb) thread trace dump instructions -t -f -e -J -c 4 [ { "id": 0, "timestamp_ns": "379567.000000", "event": "HW clock tick", "hwClock": 48599428476224707 }, { "id": 1, "timestamp_ns": "379569.000000", "event": "CPU core changed", "cpuId": 2 }, { "id": 2, "timestamp_ns": "390487.000000", "event": "HW clock tick", "hwClock": 48599428476246495 }, { "id": 3, "timestamp_ns": "1602508.000000", "event": "HW clock tick", "hwClock": 48599428478664855 }, { "id": 4, "timestamp_ns": "1662745.000000", "event": "HW clock tick", "hwClock": 48599428478785046 }, { "id": 5, "timestamp_ns": "1662746.995324", "loadAddress": "0x7ffff7176660", "module": "libc.so.6", "symbol": "malloc", "mnemonic": "endbr64" }, { "id": 6, "timestamp_ns": "1662748.990648", "loadAddress": "0x7ffff7176664", "module": "libc.so.6", "symbol": "malloc", "mnemonic": "movq" }, { "id": 7, "timestamp_ns": "1662750.985972", "loadAddress": "0x7ffff717666b", "module": "libc.so.6", "symbol": "malloc", "mnemonic": "pushq" }, { "id": 8, "timestamp_ns": "1662752.981296", "loadAddress": "0x7ffff717666d", "module": "libc.so.6", "symbol": "malloc", "mnemonic": "pushq" } ] ```
Differential Revision: https://reviews.llvm.org/D130054
show more ...
|
| #
4a843d92 |
| 08-Jul-2022 |
Walter Erquinigo <[email protected]> |
[trace][intel pt] Create a CPU change event and expose it in the dumper
Thanks to [email protected] for coming up with this feature.
When tracing in per-cpu mode, we have information of in which cpu
[trace][intel pt] Create a CPU change event and expose it in the dumper
Thanks to [email protected] for coming up with this feature.
When tracing in per-cpu mode, we have information of in which cpu we are execution each instruction, which comes from the context switch trace. This diff makes this information available as a `cpu changed event`, which an additional accessor in the cursor `GetCPU()`. As cpu changes are very infrequent, any consumer should listen to cpu change events instead of querying the actual cpu of a trace item. Once a cpu change event is seen, the consumer can invoke GetCPU() to get that information. Also, it's possible to invoke GetCPU() on an arbitrary instruction item, which will return the last cpu seen. However, this call is O(logn) and should be used sparingly.
Manually tested with a sample program that starts on cpu 52, then goes to 18, and then goes back to 52.
Differential Revision: https://reviews.llvm.org/D129340
show more ...
|
| #
a7d6c3ef |
| 24-Jun-2022 |
Walter Erquinigo <[email protected]> |
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this requires the API to handle events with payloads (
[trace] Make events first class items in the trace cursor and rework errors
We want to include events with metadata, like context switches, and this requires the API to handle events with payloads (e.g. information about such context switches). Besides this, we want to support multiple similar events between two consecutive instructions, like multiple context switches. However, the current implementation is not good for this because we are defining events as bitmask enums associated with specific instructions. Thus, we need to decouple instructions from events and make events actual items in the trace, just like instructions and errors.
- Add accessors in the TraceCursor to know if an item is an event or not - Modify from the TraceDumper all the way to DecodedThread to support - Renamed the paused event to disabled. - Improved the tsc handling logic. I was using an API for getting the tsc from libipt, but that was an overkill that should be used when not processing events manually, but as we are already processing events, we can more easily get the tscs. event items. Fortunately this simplified many things - As part of this refactor, I also fixed and long stating issue, which is that some non decoding errors were being inserted in the decoded thread. I changed this so that TraceIntelPT::Decode returns an error if the decoder couldn't be set up proplerly. Then, errors within a trace are actual anomalies found in between instrutions.
All test pass
Differential Revision: https://reviews.llvm.org/D128576
show more ...
|
|
Revision tags: llvmorg-14.0.6 |
|
| #
6a5355e8 |
| 14-Jun-2022 |
Walter Erquinigo <[email protected]> |
[trace][intelpt] Support system-wide tracing [20] - Rename some fields in the schema
As discusses offline with @jj10305, we are updating some naming used throughout the code, specially in the json s
[trace][intelpt] Support system-wide tracing [20] - Rename some fields in the schema
As discusses offline with @jj10305, we are updating some naming used throughout the code, specially in the json schema
- traceBuffer -> iptTrace - core -> cpu
Differential Revision: https://reviews.llvm.org/D127817
show more ...
|
| #
67c24051 |
| 14-Jun-2022 |
Walter Erquinigo <[email protected]> |
[trace][intelpt] Support system-wide tracing [19] - Some other minor improvements
This addresses the issues in diffs [13], [14] and [16]
- Add better documentation - Fix some castings by making the
[trace][intelpt] Support system-wide tracing [19] - Some other minor improvements
This addresses the issues in diffs [13], [14] and [16]
- Add better documentation - Fix some castings by making them safer - Simplify CorrelateContextSwitchesAndIntelPtTraces - Rename some functions
Differential Revision: https://reviews.llvm.org/D127804
show more ...
|
|
Revision tags: llvmorg-14.0.5, llvmorg-14.0.4 |
|
| #
a19fcc2b |
| 24-May-2022 |
Walter Erquinigo <[email protected]> |
[trace][intelpt] Support system-wide tracing [14] - Decode per cpu
This is the final functional patch to support intel pt decoding per cpu. It works by doing the following:
- First, all context swi
[trace][intelpt] Support system-wide tracing [14] - Decode per cpu
This is the final functional patch to support intel pt decoding per cpu. It works by doing the following:
- First, all context switches are split by tid and sorted in order. This produces a list of continuous executes per thread per core. - Then, all intel pt subtraces are split by PSB boundaries and assigned to individual thread continuous executions on the same core by doing simple TSC-based comparisons. - With this, we have, per thread, a sorted list of continuous executions each one with a list of intel pt subtraces. Up to this point, this is really fast because no instructions were actually decoded. - Then, each thread can be decoded by traversing their continuous executions and intel pt subtraces. An advantage of having these continuous executions is that we can identify if a continuous exexecution doesn't have intel pt data, and thus has a gap in it. We can later to more sofisticated comparisons to identify if within a continuous execution there are gaps.
I'm adding a test as well.
Differential Revision: https://reviews.llvm.org/D126394
show more ...
|
|
Revision tags: llvmorg-14.0.3, llvmorg-14.0.2 |
|
| #
059f39d2 |
| 18-Apr-2022 |
Walter Erquinigo <[email protected]> |
[trace][intel pt] Support events
A trace might contain events traced during the target's execution. For example, a thread might be paused for some period of time due to context switches or breakpoin
[trace][intel pt] Support events
A trace might contain events traced during the target's execution. For example, a thread might be paused for some period of time due to context switches or breakpoints, which actually force a context switch. Not only that, a trace might be paused because the CPU decides to trace only a specific part of the target, like the address filtering provided by intel pt, which will cause pause events. Besides this case, other kinds of events might exist.
This patch adds the method `TraceCursor::GetEvents()`` that returns the list of events that happened right before the instruction being pointed at by the cursor. Some refactors were done to make this change simpler.
Besides this new API, the instruction dumper now supports the -e flag which shows pause events, like in the following example, where pauses happened due to breakpoints.
``` thread #1: tid = 2717361 a.out`main + 20 at main.cpp:27:20 0: 0x00000000004023d9 leaq -0x1200(%rbp), %rax [paused] 1: 0x00000000004023e0 movq %rax, %rdi [paused] 2: 0x00000000004023e3 callq 0x403a62 ; std::vector<int, std::allocator<int> >::vector at stl_vector.h:391:7 a.out`std::vector<int, std::allocator<int> >::vector() at stl_vector.h:391:7 3: 0x0000000000403a62 pushq %rbp 4: 0x0000000000403a63 movq %rsp, %rbp ```
The `dump info` command has also been updated and now it shows the number of instructions that have associated events.
Differential Revision: https://reviews.llvm.org/D123982
show more ...
|
|
Revision tags: llvmorg-14.0.1 |
|
| #
44103c96 |
| 08-Apr-2022 |
Walter Erquinigo <[email protected]> |
[trace][intelpt] Remove code smell when printing the raw trace size
Something ugly I did was to report the trace buffer size to the DecodedThread, which is later used as part of the `dump info` comm
[trace][intelpt] Remove code smell when printing the raw trace size
Something ugly I did was to report the trace buffer size to the DecodedThread, which is later used as part of the `dump info` command. Instead of doing that, we can just directly ask the trace for the raw buffer and print its size.
I thought about not asking for the entire trace but instead just for its size, but in this case, as our traces as not extremely big, I prefer to ask for the entire trace, ensuring it could be fetched, and then print its size.
Differential Revision: https://reviews.llvm.org/D123358
show more ...
|
| #
e0cfe20a |
| 06-Apr-2022 |
Walter Erquinigo <[email protected]> |
[trace][intel pt] Create a common accessor for live and postmortem data
Some parts of the code have to distinguish between live and postmortem threads to figure out how to get some data, e.g. thread
[trace][intel pt] Create a common accessor for live and postmortem data
Some parts of the code have to distinguish between live and postmortem threads to figure out how to get some data, e.g. thread trace buffers. This makes the code less generic and more error prone. An example of that is that we have two different decoders: LiveThreadDecoder and PostMortemThreadDecoder. They exist because getting the trace bufer is different for each case.
The problem doesn't stop there. Soon we'll have even more kinds of data, like the context switch trace, whose fetching will be different for live and post- mortem processes.
As a way to fix this, I'm creating a common API for accessing thread data, which is able to figure out how to handle the postmortem and live cases on behalf of the caller. As a result of that, I was able to eliminate the two decoders and unify them into a simpler one. Not only that, our TraceSave functionality only worked for live threads, but now it can also work for postmortem processes, which might be useful now, but it might in the future.
This common API is OnThreadBinaryDataRead. More information in the inline documentation.
Differential Revision: https://reviews.llvm.org/D123281
show more ...
|
| #
6423b502 |
| 01-Apr-2022 |
Walter Erquinigo <[email protected]> |
[trace][intel pt] Create a class for the libipt decoder wrapper
As we soon will need to decode multiple raw traces for the same thread, having a class that encapsulates the decoding of a single raw
[trace][intel pt] Create a class for the libipt decoder wrapper
As we soon will need to decode multiple raw traces for the same thread, having a class that encapsulates the decoding of a single raw trace is a stepping stone that will make the coming features easier to implement.
So, I'm creating a LibiptDecoder class with that purpose. I refactored the code and it's now much more readable. Besides that, more comments were added. With this new structure, it's also easier to implement unit tests.
Differential Revision: https://reviews.llvm.org/D123106
show more ...
|