|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1 |
|
| #
ea8d7647 |
| 27-Mar-2025 |
Steven Rostedt <[email protected]> |
tracing: Verify event formats that have "%*p.."
The trace event verifier checks the formats of trace events to make sure that they do not point at memory that is not in the trace event itself or in
tracing: Verify event formats that have "%*p.."
The trace event verifier checks the formats of trace events to make sure that they do not point at memory that is not in the trace event itself or in data that will never be freed. If an event references data that was allocated when the event triggered and that same data is freed before the event is read, then the kernel can crash by reading freed memory.
The verifier runs at boot up (or module load) and scans the print formats of the events and checks their arguments to make sure that dereferenced pointers are safe. If the format uses "%*p.." the verifier will ignore it, and that could be dangerous. Cover this case as well.
Also add to the sample code a use case of "%*pbl".
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers") Link: https://lore.kernel.org/[email protected] Reported-by: Libo Chen <[email protected]> Reviewed-by: Libo Chen <[email protected]> Tested-by: Libo Chen <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
5f3719f6 |
| 05-Mar-2025 |
Steven Rostedt <[email protected]> |
tracing: Update modules to persistent instances when loaded
When a module is loaded and a persistent buffer is actively tracing, add it to the list of modules in the persistent memory.
Cc: Masami H
tracing: Update modules to persistent instances when loaded
When a module is loaded and a persistent buffer is actively tracing, add it to the list of modules in the persistent memory.
Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
0c588ac0 |
| 21-Mar-2025 |
Gabriele Paoloni <[email protected]> |
tracing: fix return value in __ftrace_event_enable_disable for TRACE_REG_UNREGISTER
When __ftrace_event_enable_disable invokes the class callback to unregister the event, the return value is not rep
tracing: fix return value in __ftrace_event_enable_disable for TRACE_REG_UNREGISTER
When __ftrace_event_enable_disable invokes the class callback to unregister the event, the return value is not reported up to the caller, hence leading to event unregister failures being silently ignored.
This patch assigns the ret variable to the invocation of the event unregister callback, so that its return value is stored and reported to the caller, and it raises a warning in case of error.
Link: https://lore.kernel.org/[email protected] Signed-off-by: Gabriele Paoloni <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5, v6.14-rc4 |
|
| #
2fa6a013 |
| 20-Feb-2025 |
Adrian Huang <[email protected]> |
tracing: Fix memory leak when reading set_event file
kmemleak reports the following memory leak after reading set_event file:
# cat /sys/kernel/tracing/set_event
# cat /sys/kernel/debug/kmemle
tracing: Fix memory leak when reading set_event file
kmemleak reports the following memory leak after reading set_event file:
# cat /sys/kernel/tracing/set_event
# cat /sys/kernel/debug/kmemleak unreferenced object 0xff110001234449e0 (size 16): comm "cat", pid 13645, jiffies 4294981880 hex dump (first 16 bytes): 01 00 00 00 00 00 00 00 a8 71 e7 84 ff ff ff ff .........q...... backtrace (crc c43abbc): __kmalloc_cache_noprof+0x3ca/0x4b0 s_start+0x72/0x2d0 seq_read_iter+0x265/0x1080 seq_read+0x2c9/0x420 vfs_read+0x166/0xc30 ksys_read+0xf4/0x1d0 do_syscall_64+0x79/0x150 entry_SYSCALL_64_after_hwframe+0x76/0x7e
The issue can be reproduced regardless of whether set_event is empty or not. Here is an example about the valid content of set_event.
# cat /sys/kernel/tracing/set_event sched:sched_process_fork sched:sched_switch sched:sched_wakeup *:*:mod:trace_events_sample
The root cause is that s_next() returns NULL when nothing is found. This results in s_stop() attempting to free a NULL pointer because its parameter is NULL.
Fix the issue by freeing the memory appropriately when s_next() fails to find anything.
Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: b355247df104 ("tracing: Cache ":mod:" events for modules not loaded yet") Signed-off-by: Adrian Huang <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
| #
8f21943e |
| 21-Jan-2025 |
Steven Rostedt <[email protected]> |
tracing: Fix output of set_event for some cached module events
The following works fine:
~# echo ':mod:trace_events_sample' > /sys/kernel/tracing/set_event ~# cat /sys/kernel/tracing/set_event *
tracing: Fix output of set_event for some cached module events
The following works fine:
~# echo ':mod:trace_events_sample' > /sys/kernel/tracing/set_event ~# cat /sys/kernel/tracing/set_event *:*:mod:trace_events_sample ~#
But if a name is given without a ':' where it can match an event name or system name, the output of the cached events does not include a new line:
~# echo 'foo_bar:mod:trace_events_sample' > /sys/kernel/tracing/set_event ~# cat /sys/kernel/tracing/set_event foo_bar:mod:trace_events_sample~#
Add the '\n' to that as well.
Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: b355247df104e ("tracing: Cache ":mod:" events for modules not loaded yet") Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
f95ee542 |
| 21-Jan-2025 |
Steven Rostedt <[email protected]> |
tracing: Fix allocation of printing set_event file content
The adding of cached events for modules not loaded yet required a descriptor to separate the iteration of events with the iteration of cach
tracing: Fix allocation of printing set_event file content
The adding of cached events for modules not loaded yet required a descriptor to separate the iteration of events with the iteration of cached events for a module. But the allocation used the size of the pointer and not the size of the contents to allocate its data and caused a slab-out-of-bounds.
Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/[email protected] Reported-by: Sasha Levin <[email protected]> Closes: https://lore.kernel.org/all/Z4_OHKESRSiJcr-b@lappy/ Fixes: b355247df104e ("tracing: Cache ":mod:" events for modules not loaded yet") Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
22412b72 |
| 20-Jan-2025 |
Steven Rostedt <[email protected]> |
tracing: Rename update_cache() to update_mod_cache()
The static function in trace_events.c called update_cache() is too generic and conflicts with the function defined in arch/openrisc/include/asm/p
tracing: Rename update_cache() to update_mod_cache()
The static function in trace_events.c called update_cache() is too generic and conflicts with the function defined in arch/openrisc/include/asm/pgtable.h
Rename it to update_mod_cache() to make it less generic.
Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/[email protected] Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Fixes: b355247df104e ("tracing: Cache ":mod:" events for modules not loaded yet") Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
a925df6f |
| 20-Jan-2025 |
Steven Rostedt <[email protected]> |
tracing: Fix #if CONFIG_MODULES to #ifdef CONFIG_MODULES
A typo was introduced when adding the ":mod:" command that did a "#if CONFIG_MODULES" instead of a "#ifdef CONFIG_MODULES". Fix it.
Cc: Masa
tracing: Fix #if CONFIG_MODULES to #ifdef CONFIG_MODULES
A typo was introduced when adding the ":mod:" command that did a "#if CONFIG_MODULES" instead of a "#ifdef CONFIG_MODULES". Fix it.
Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/[email protected] Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Fixes: b355247df104e ("tracing: Cache ":mod:" events for modules not loaded yet") Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.13 |
|
| #
b355247d |
| 16-Jan-2025 |
Steven Rostedt <[email protected]> |
tracing: Cache ":mod:" events for modules not loaded yet
When the :mod: command is written into /sys/kernel/tracing/set_event (or that file within an instance), if the module specified after the ":m
tracing: Cache ":mod:" events for modules not loaded yet
When the :mod: command is written into /sys/kernel/tracing/set_event (or that file within an instance), if the module specified after the ":mod:" is not yet loaded, it will store that string internally. When the module is loaded, it will enable the events as if the module was loaded when the string was written into the set_event file.
This can also be useful to enable events that are in the init section of the module, as the events are enabled before the init section is executed.
This also works on the kernel command line:
trace_event=:mod:<module>
Will enable the events for <module> when it is loaded.
Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
4c86bc53 |
| 16-Jan-2025 |
Steven Rostedt <[email protected]> |
tracing: Add :mod: command to enabled module events
Add a :mod: command to enable only events from a given module from the set_events file.
echo '*:mod:<module>' > set_events
Or
echo ':mod:<m
tracing: Add :mod: command to enabled module events
Add a :mod: command to enable only events from a given module from the set_events file.
echo '*:mod:<module>' > set_events
Or
echo ':mod:<module>' > set_events
Will enable all events for that module. Specific events can also be enabled via:
echo '<event>:mod:<module>' > set_events
Or
echo '<system>:<event>:mod:<module>' > set_events
Or
echo '*:<event>:mod:<module>' > set_events
The ":mod:" keyword is consistent with the function tracing filter to enable functions from a given module.
Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5 |
|
| #
1bd13edb |
| 27-Dec-2024 |
Masami Hiramatsu (Google) <[email protected]> |
tracing/hist: Add poll(POLLIN) support on hist file
Add poll syscall support on the `hist` file. The Waiter will be waken up when the histogram is updated with POLLIN.
Currently, there is no way to
tracing/hist: Add poll(POLLIN) support on hist file
Add poll syscall support on the `hist` file. The Waiter will be waken up when the histogram is updated with POLLIN.
Currently, there is no way to wait for a specific event in userspace. So user needs to peek the `trace` periodicaly, or wait on `trace_pipe`. But it is not a good idea to peek at the `trace` for an event that randomly happens. And `trace_pipe` is not coming back until a page is filled with events.
This allows a user to wait for a specific event on the `hist` file. User can set a histogram trigger on the event which they want to monitor and poll() on its `hist` file. Since this poll() returns POLLIN, the next poll() will return soon unless a read() happens on that hist file.
NOTE: To read the hist file again, you must set the file offset to 0, but just for monitoring the event, you may not need to read the histogram.
Cc: Shuah Khan <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/173527247756.464571.14236296701625509931.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Reviewed-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
afc67176 |
| 31-Dec-2024 |
Steven Rostedt <[email protected]> |
tracing: Have process_string() also allow arrays
In order to catch a common bug where a TRACE_EVENT() TP_fast_assign() assigns an address of an allocated string to the ring buffer and then reference
tracing: Have process_string() also allow arrays
In order to catch a common bug where a TRACE_EVENT() TP_fast_assign() assigns an address of an allocated string to the ring buffer and then references it in TP_printk(), which can be executed hours later when the string is free, the function test_event_printk() runs on all events as they are registered to make sure there's no unwanted dereferencing.
It calls process_string() to handle cases in TP_printk() format that has "%s". It returns whether or not the string is safe. But it can have some false positives.
For instance, xe_bo_move() has:
TP_printk("move_lacks_source:%s, migrate object %p [size %zu] from %s to %s device_id:%s", __entry->move_lacks_source ? "yes" : "no", __entry->bo, __entry->size, xe_mem_type_to_name[__entry->old_placement], xe_mem_type_to_name[__entry->new_placement], __get_str(device_id))
Where the "%s" references into xe_mem_type_to_name[]. This is an array of pointers that should be safe for the event to access. Instead of flagging this as a bad reference, if a reference points to an array, where the record field is the index, consider it safe.
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: 65a25d9f7ac02 ("tracing: Add "%s" check in test_event_printk()") Reported-by: Genes Lists <[email protected]> Tested-by: Gene C <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc4 |
|
| #
59980d9b |
| 19-Dec-2024 |
Steven Rostedt <[email protected]> |
tracing: Switch trace_events.c code over to use guard()
There are several functions in trace_events.c that have "goto out;" or equivalent on error in order to release locks that were taken. This can
tracing: Switch trace_events.c code over to use guard()
There are several functions in trace_events.c that have "goto out;" or equivalent on error in order to release locks that were taken. This can be error prone or just simply make the code more complex.
Switch every location that ends with unlocking a mutex on error over to using the guard(mutex)() infrastructure to let the compiler worry about releasing locks. This makes the code easier to read and understand.
Some locations did some simple arithmetic after releasing the lock. As this causes no real overhead for holding a mutex while processing the file position (*ppos += cnt;) let the lock be held over this logic too.
Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
4b8d63e5 |
| 19-Dec-2024 |
Steven Rostedt <[email protected]> |
tracing: Simplify event_enable_func() goto_reg logic
Currently there's an "out_reg:" label that gets jumped to if there's no parameters to process. Instead, make it a proper "if (param) { }" block a
tracing: Simplify event_enable_func() goto_reg logic
Currently there's an "out_reg:" label that gets jumped to if there's no parameters to process. Instead, make it a proper "if (param) { }" block as there's not much to do for the parameter processing, and remove the "out_reg:" label.
Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
c949dfb9 |
| 19-Dec-2024 |
Steven Rostedt <[email protected]> |
tracing: Simplify event_enable_func() goto out_free logic
The event_enable_func() function allocates the data descriptor early in the function just to assign its data->count value via:
kstrtoul(n
tracing: Simplify event_enable_func() goto out_free logic
The event_enable_func() function allocates the data descriptor early in the function just to assign its data->count value via:
kstrtoul(number, 0, &data->count);
This makes the code more complex as there are several error paths before the data descriptor is actually used. This means there needs to be a goto out_free; to clean it up.
Use a local variable "count" to do the update and move the data allocation just before it is used. This removes the "out_free" label as the data can be freed on the failure path of where it is used.
Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
cad1d5bd |
| 19-Dec-2024 |
Steven Rostedt <[email protected]> |
tracing: Have event_enable_write() just return error on error
The event_enable_write() function is inconsistent in how it returns errors. Sometimes it updates the ppos parameter and sometimes it doe
tracing: Have event_enable_write() just return error on error
The event_enable_write() function is inconsistent in how it returns errors. Sometimes it updates the ppos parameter and sometimes it doesn't. Simplify the code to just return an error or the count if there isn't an error.
Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
afd2627f |
| 17-Dec-2024 |
Steven Rostedt <[email protected]> |
tracing: Check "%s" dereference via the field and not the TP_printk format
The TP_printk() portion of a trace event is executed at the time a event is read from the trace. This can happen seconds, m
tracing: Check "%s" dereference via the field and not the TP_printk format
The TP_printk() portion of a trace event is executed at the time a event is read from the trace. This can happen seconds, minutes, hours, days, months, years possibly later since the event was recorded. If the print format contains a dereference to a string via "%s", and that string was allocated, there's a chance that string could be freed before it is read by the trace file.
To protect against such bugs, there are two functions that verify the event. The first one is test_event_printk(), which is called when the event is created. It reads the TP_printk() format as well as its arguments to make sure nothing may be dereferencing a pointer that was not copied into the ring buffer along with the event. If it is, it will trigger a WARN_ON().
For strings that use "%s", it is not so easy. The string may not reside in the ring buffer but may still be valid. Strings that are static and part of the kernel proper which will not be freed for the life of the running system, are safe to dereference. But to know if it is a pointer to a static string or to something on the heap can not be determined until the event is triggered.
This brings us to the second function that tests for the bad dereferencing of strings, trace_check_vprintf(). It would walk through the printf format looking for "%s", and when it finds it, it would validate that the pointer is safe to read. If not, it would produces a WARN_ON() as well and write into the ring buffer "[UNSAFE-MEMORY]".
The problem with this is how it used va_list to have vsnprintf() handle all the cases that it didn't need to check. Instead of re-implementing vsnprintf(), it would make a copy of the format up to the %s part, and call vsnprintf() with the current va_list ap variable, where the ap would then be ready to point at the string in question.
For architectures that passed va_list by reference this was possible. For architectures that passed it by copy it was not. A test_can_verify() function was used to differentiate between the two, and if it wasn't possible, it would disable it.
Even for architectures where this was feasible, it was a stretch to rely on such a method that is undocumented, and could cause issues later on with new optimizations of the compiler.
Instead, the first function test_event_printk() was updated to look at "%s" as well. If the "%s" argument is a pointer outside the event in the ring buffer, it would find the field type of the event that is the problem and mark the structure with a new flag called "needs_test". The event itself will be marked by TRACE_EVENT_FL_TEST_STR to let it be known that this event has a field that needs to be verified before the event can be printed using the printf format.
When the event fields are created from the field type structure, the fields would copy the field type's "needs_test" value.
Finally, before being printed, a new function ignore_event() is called which will check if the event has the TEST_STR flag set (if not, it returns false). If the flag is set, it then iterates through the events fields looking for the ones that have the "needs_test" flag set.
Then it uses the offset field from the field structure to find the pointer in the ring buffer event. It runs the tests to make sure that pointer is safe to print and if not, it triggers the WARN_ON() and also adds to the trace output that the event in question has an unsafe memory access.
The ignore_event() makes the trace_check_vprintf() obsolete so it is removed.
Link: https://lore.kernel.org/all/CAHk-=wh3uOnqnZPpR0PeLZZtyWbZLboZ7cHLCKRWsocvs9Y7hQ@mail.gmail.com/
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Al Viro <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers") Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
65a25d9f |
| 17-Dec-2024 |
Steven Rostedt <[email protected]> |
tracing: Add "%s" check in test_event_printk()
The test_event_printk() code makes sure that when a trace event is registered, any dereferenced pointers in from the event's TP_printk() are pointing t
tracing: Add "%s" check in test_event_printk()
The test_event_printk() code makes sure that when a trace event is registered, any dereferenced pointers in from the event's TP_printk() are pointing to content in the ring buffer. But currently it does not handle "%s", as there's cases where the string pointer saved in the ring buffer points to a static string in the kernel that will never be freed. As that is a valid case, the pointer needs to be checked at runtime.
Currently the runtime check is done via trace_check_vprintf(), but to not have to replicate everything in vsnprintf() it does some logic with the va_list that may not be reliable across architectures. In order to get rid of that logic, more work in the test_event_printk() needs to be done. Some of the strings can be validated at this time when it is obvious the string is valid because the string will be saved in the ring buffer content.
Do all the validation of strings in the ring buffer at boot in test_event_printk(), and make sure that the field of the strings that point into the kernel are accessible. This will allow adding checks at runtime that will validate the fields themselves and not rely on paring the TP_printk() format at runtime.
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Al Viro <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers") Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
91711048 |
| 17-Dec-2024 |
Steven Rostedt <[email protected]> |
tracing: Add missing helper functions in event pointer dereference check
The process_pointer() helper function looks to see if various trace event macros are used. These macros are for storing data
tracing: Add missing helper functions in event pointer dereference check
The process_pointer() helper function looks to see if various trace event macros are used. These macros are for storing data in the event. This makes it safe to dereference as the dereference will then point into the event on the ring buffer where the content of the data stays with the event itself.
A few helper functions were missing. Those were:
__get_rel_dynamic_array() __get_dynamic_array_len() __get_rel_dynamic_array_len() __get_rel_sockaddr()
Also add a helper function find_print_string() to not need to use a middle man variable to test if the string exists.
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Al Viro <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers") Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
a6629626 |
| 17-Dec-2024 |
Steven Rostedt <[email protected]> |
tracing: Fix test_event_printk() to process entire print argument
The test_event_printk() analyzes print formats of trace events looking for cases where it may dereference a pointer that is not in t
tracing: Fix test_event_printk() to process entire print argument
The test_event_printk() analyzes print formats of trace events looking for cases where it may dereference a pointer that is not in the ring buffer which can possibly be a bug when the trace event is read from the ring buffer and the content of that pointer no longer exists.
The function needs to accurately go from one print format argument to the next. It handles quotes and parenthesis that may be included in an argument. When it finds the start of the next argument, it uses a simple "c = strstr(fmt + i, ',')" to find the end of that argument!
In order to include "%s" dereferencing, it needs to process the entire content of the print format argument and not just the content of the first ',' it finds. As there may be content like:
({ const char *saved_ptr = trace_seq_buffer_ptr(p); static const char *access_str[] = { "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux" }; union kvm_mmu_page_role role; role.word = REC->role; trace_seq_printf(p, "sp gen %u gfn %llx l%u %u-byte q%u%s %s%s" " %snxe %sad root %u %s%c", REC->mmu_valid_gen, REC->gfn, role.level, role.has_4_byte_gpte ? 4 : 8, role.quadrant, role.direct ? " direct" : "", access_str[role.access], role.invalid ? " invalid" : "", role.efer_nx ? "" : "!", role.ad_disabled ? "!" : "", REC->root_count, REC->unsync ? "unsync" : "sync", 0); saved_ptr; })
Which is an example of a full argument of an existing event. As the code already handles finding the next print format argument, process the argument at the end of it and not the start of it. This way it has both the start of the argument as well as the end of it.
Add a helper function "process_pointer()" that will do the processing during the loop as well as at the end. It also makes the code cleaner and easier to read.
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Al Viro <[email protected]> Cc: Linus Torvalds <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers") Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11 |
|
| #
49e4154f |
| 11-Sep-2024 |
Zheng Yejian <[email protected]> |
tracing: Remove TRACE_EVENT_FL_FILTERED logic
After commit dcb0b5575d24 ("tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic"), no one's going to set the TRACE_EVENT_FL_FILTERED or change the cal
tracing: Remove TRACE_EVENT_FL_FILTERED logic
After commit dcb0b5575d24 ("tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic"), no one's going to set the TRACE_EVENT_FL_FILTERED or change the call->filter, so remove related logic.
Link: https://lore.kernel.org/[email protected] Signed-off-by: Zheng Yejian <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1 |
|
| #
6e2fdcef |
| 26-Jul-2024 |
Steven Rostedt <[email protected]> |
tracing: Use refcount for trace_event_file reference counter
Instead of using an atomic counter for the trace_event_file reference counter, use the refcount interface. It has various checks to make
tracing: Use refcount for trace_event_file reference counter
Instead of using an atomic counter for the trace_event_file reference counter, use the refcount interface. It has various checks to make sure the reference counting is correct, and will warn if it detects an error (like refcount_inc() on '0').
Cc: Mathieu Desnoyers <[email protected]> Link: https://lore.kernel.org/[email protected] Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
| #
b1560408 |
| 30-Jul-2024 |
Steven Rostedt <[email protected]> |
tracing: Have format file honor EVENT_FILE_FL_FREED
When eventfs was introduced, special care had to be done to coordinate the freeing of the file meta data with the files that are exposed to user s
tracing: Have format file honor EVENT_FILE_FL_FREED
When eventfs was introduced, special care had to be done to coordinate the freeing of the file meta data with the files that are exposed to user space. The file meta data would have a ref count that is set when the file is created and would be decremented and freed after the last user that opened the file closed it. When the file meta data was to be freed, it would set a flag (EVENT_FILE_FL_FREED) to denote that the file is freed, and any new references made (like new opens or reads) would fail as it is marked freed. This allowed other meta data to be freed after this flag was set (under the event_mutex).
All the files that were dynamically created in the events directory had a pointer to the file meta data and would call event_release() when the last reference to the user space file was closed. This would be the time that it is safe to free the file meta data.
A shortcut was made for the "format" file. It's i_private would point to the "call" entry directly and not point to the file's meta data. This is because all format files are the same for the same "call", so it was thought there was no reason to differentiate them. The other files maintain state (like the "enable", "trigger", etc). But this meant if the file were to disappear, the "format" file would be unaware of it.
This caused a race that could be trigger via the user_events test (that would create dynamic events and free them), and running a loop that would read the user_events format files:
In one console run:
# cd tools/testing/selftests/user_events # while true; do ./ftrace_test; done
And in another console run:
# cd /sys/kernel/tracing/ # while true; do cat events/user_events/__test_event/format; done 2>/dev/null
With KASAN memory checking, it would trigger a use-after-free bug report (which was a real bug). This was because the format file was not checking the file's meta data flag "EVENT_FILE_FL_FREED", so it would access the event that the file meta data pointed to after the event was freed.
After inspection, there are other locations that were found to not check the EVENT_FILE_FL_FREED flag when accessing the trace_event_file. Add a new helper function: event_file_file() that will make sure that the event_mutex is held, and will return NULL if the trace_event_file has the EVENT_FILE_FL_FREED flag set. Have the first reference of the struct file pointer use event_file_file() and check for NULL. Later uses can still use the event_file_data() helper function if the event_mutex is still held and was not released since the event_file_file() call.
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Ajay Kaher <[email protected]> Cc: Ilkka Naulapää <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Al Viro <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Beau Belgrave <[email protected]> Cc: Florian Fainelli <[email protected]> Cc: Alexey Makhalov <[email protected]> Cc: Vasavi Sirnapalli <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: b63db58e2fa5d ("eventfs/tracing: Add callback for release of an eventfs_inode") Reported-by: Mathias Krause <[email protected]> Tested-by: Mathias Krause <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
| #
b63db58e |
| 02-May-2024 |
Steven Rostedt (Google) <[email protected]> |
eventfs/tracing: Add callback for release of an eventfs_inode
Synthetic events create and destroy tracefs files when they are created and removed. The tracing subsystem has its own file descriptor r
eventfs/tracing: Add callback for release of an eventfs_inode
Synthetic events create and destroy tracefs files when they are created and removed. The tracing subsystem has its own file descriptor representing the state of the events attached to the tracefs files. There's a race between the eventfs files and this file descriptor of the tracing system where the following can cause an issue:
With two scripts 'A' and 'B' doing:
Script 'A': echo "hello int aaa" > /sys/kernel/tracing/synthetic_events while : do echo 0 > /sys/kernel/tracing/events/synthetic/hello/enable done
Script 'B': echo > /sys/kernel/tracing/synthetic_events
Script 'A' creates a synthetic event "hello" and then just writes zero into its enable file.
Script 'B' removes all synthetic events (including the newly created "hello" event).
What happens is that the opening of the "enable" file has:
{ struct trace_event_file *file = inode->i_private; int ret;
ret = tracing_check_open_get_tr(file->tr); [..]
But deleting the events frees the "file" descriptor, and a "use after free" happens with the dereference at "file->tr".
The file descriptor does have a reference counter, but there needs to be a way to decrement it from the eventfs when the eventfs_inode is removed that represents this file descriptor.
Add an optional "release" callback to the eventfs_entry array structure, that gets called when the eventfs file is about to be removed. This allows for the creating on the eventfs file to increment the tracing file descriptor ref counter. When the eventfs file is deleted, it can call the release function that will call the put function for the tracing file descriptor.
This will protect the tracing file from being freed while a eventfs file that references it is being opened.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by: Tze-nan wu <[email protected]> Tested-by: Tze-nan Wu (吳澤南) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3 |
|
| #
5281ec83 |
| 03-Apr-2024 |
Arnd Bergmann <[email protected]> |
tracing: hide unused ftrace_event_id_fops
When CONFIG_PERF_EVENTS, a 'make W=1' build produces a warning about the unused ftrace_event_id_fops variable:
kernel/trace/trace_events.c:2155:37: error:
tracing: hide unused ftrace_event_id_fops
When CONFIG_PERF_EVENTS, a 'make W=1' build produces a warning about the unused ftrace_event_id_fops variable:
kernel/trace/trace_events.c:2155:37: error: 'ftrace_event_id_fops' defined but not used [-Werror=unused-const-variable=] 2155 | static const struct file_operations ftrace_event_id_fops = {
Hide this in the same #ifdef as the reference to it.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Kees Cook <[email protected]> Cc: Ajay Kaher <[email protected]> Cc: Jinjie Ruan <[email protected]> Cc: Clément Léger <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: "Tzvetomir Stoyanov (VMware)" <[email protected]> Fixes: 620a30e97feb ("tracing: Don't pass file_operations array to event_create_dir()") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
show more ...
|