History log of /linux-6.15/arch/x86/kernel/static_call.c (Results 1 – 23 of 23)
Revision    Date    Author    Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5
# a75bf27f 22-Jun-2024 Pawan Gupta <[email protected]>

x86/its: Add support for ITS-safe return thunk

RETs in the lower half of a cacheline may be affected by the ITS bug,
specifically when the RSB underflows. Use the ITS-safe return thunk for such
RETs.

RETs that are not patched:

- RET in retpoline sequence does not need to be patched, because the
sequence itself fills an RSB before RET.
- RET in Call Depth Tracking (CDT) thunks __x86_indirect_{call|jump}_thunk
and call_depth_return_thunk are not patched because CDT by design
prevents RSB-underflow.
- RETs in .init section are not reachable after init.
- RETs that are explicitly marked safe with ANNOTATE_UNRET_SAFE.

Signed-off-by: Pawan Gupta <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Alexandre Chartre <[email protected]>
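
For illustration, a sketch of the selection criterion described above, assuming a 64-byte cacheline and taking "lower half" to mean its first 32 bytes (this is only the idea, not the kernel's implementation):

    #include <stdbool.h>
    #include <stdint.h>

    #define CACHELINE_SIZE 64

    /* Would a RET located at 'addr' fall in the affected half of its cacheline? */
    static bool ret_needs_its_thunk(uintptr_t addr)
    {
            return (addr % CACHELINE_SIZE) < (CACHELINE_SIZE / 2);
    }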


# d856bc3a 03-Dec-2024 Christophe Leroy <[email protected]>

static_call_inline: Provide trampoline address when updating sites

In preparation of support of inline static calls on powerpc, provide
trampoline address when updating sites, so that when the destination
function is too far for a direct function call, the call site is
patched with a call to the trampoline.

Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Madhavan Srinivasan <[email protected]>
Link: https://patch.msgid.link/5efe0cffc38d6f69b1ec13988a99f1acff551abf.1733245362.git.christophe.leroy@csgroup.eu
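
As a hedged sketch of the fallback being prepared here: a direct call only works while the destination fits in the call's displacement, otherwise the site has to be pointed at the trampoline. The helper names below are illustrative, and the range check is written for a 5-byte rel32 call rather than powerpc's branch encoding:

    #include <stdbool.h>
    #include <stdint.h>

    /* Can a rel32 displacement from this call site reach the destination? */
    static bool in_direct_call_range(uintptr_t site, uintptr_t dest)
    {
            intptr_t disp = (intptr_t)dest - (intptr_t)(site + 5); /* from next insn */

            return disp >= INT32_MIN && disp <= INT32_MAX;
    }

    /* Illustrative policy: call the function directly if possible, else via trampoline. */
    static uintptr_t pick_call_target(uintptr_t site, uintptr_t func, uintptr_t tramp)
    {
            return in_direct_call_range(site, func) ? func : tramp;
    }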


# 5cc2db37 21-Dec-2024 Andrew Cooper <[email protected]>

x86/static-call: Remove early_boot_irqs_disabled check to fix Xen PVH dom0

__static_call_update_early() has a check for early_boot_irqs_disabled, but
is used before early_boot_irqs_disabled is set up in start_kernel().

Xen PV has always special cased early_boot_irqs_disabled, but Xen PVH does
not and falls over the BUG when booting as dom0.

It is very suspect that early_boot_irqs_disabled starts as 0, becomes 1 for
a time, then becomes 0 again, but as this needs backporting to fix a
breakage in a security fix, dropping the BUG_ON() is the far safer option.

Fixes: 0ef8047b737d ("x86/static-call: provide a way to do very early static-call updates")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219620
Reported-by: Alex Zenla <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Cooper <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Alex Zenla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 0ef8047b 29-Nov-2024 Juergen Gross <[email protected]>

x86/static-call: provide a way to do very early static-call updates

Add static_call_update_early() for updating static-call targets in
very early boot.

This will be needed for support of Xen guest type specific hypercall
functions.

This is part of XSA-466 / CVE-2024-53241.

Reported-by: Andrew Cooper <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Co-developed-by: Peter Zijlstra <[email protected]>
Co-developed-by: Josh Poimboeuf <[email protected]>


Revision tags: v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3
# 0911b8c5 21-Nov-2023 Breno Leitao <[email protected]>

x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK

Step 10/10 of the namespace unification of CPU mitigations related Kconfig options.

[ mingo: Added one more case. ]

Suggested-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


Revision tags: v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7
# 54097309 16-Aug-2023 Peter Zijlstra <[email protected]>

x86/static_call: Fix __static_call_fixup()

Christian reported spurious module load crashes after some of Song's
module memory layout patches.

Turns out that if the very last instruction on the very last page of the
module is a 'JMP __x86_return_thunk' then __static_call_fixup() will
trip a fault and die.

And while the module rework made this slightly more likely to happen,
it's always been possible.

Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding")
Reported-by: Christian Bricart <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
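
The shape of the problem can be sketched as a bounds check before reading the signature bytes that follow the JMP; the real fix differs in detail, so treat this only as an illustration:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Illustrative only: if the 'JMP __x86_return_thunk' ends at the last
     * mapped byte of a module, the bytes after it must not be touched, or
     * the signature comparison faults.
     */
    static bool sig_bytes_readable(const uint8_t *insn_end,
                                   const uint8_t *text_end, size_t sig_len)
    {
            return insn_end + sig_len <= text_end;
    }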


Revision tags: v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6
# 923510c8 26-Jan-2023 Peter Zijlstra <[email protected]>

x86/static_call: Add support for Jcc tail-calls

Clang likes to create conditional tail calls like:

0000000000000350 <amd_pmu_add_event>:
350: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 351: R_X86_64_NONE __fentry__-0x4
355: 48 83 bf 20 01 00 00 00 cmpq $0x0,0x120(%rdi)
35d: 0f 85 00 00 00 00 jne 363 <amd_pmu_add_event+0x13> 35f: R_X86_64_PLT32 __SCT__amd_pmu_branch_add-0x4
363: e9 00 00 00 00 jmp 368 <amd_pmu_add_event+0x18> 364: R_X86_64_PLT32 __x86_return_thunk-0x4

Where 0x35d is a static call site that's turned into a conditional
tail-call using the Jcc class of instructions.

Teach the in-line static call text patching about this.

Notably, since there is no conditional-ret, in that case patch the Jcc
to point at an empty stub function that does the ret -- or the return
thunk when needed.

Reported-by: "Erhard F." <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Masami Hiramatsu (Google) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
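
For reference, the opcode pattern involved (standard x86 encoding, not the kernel's exact helper): a near conditional jump with a rel32 displacement is encoded as 0x0F 0x80..0x8F, with the condition in the low nibble of the second byte.

    #include <stdbool.h>
    #include <stdint.h>

    /* Is this the start of a 'Jcc rel32' (0F 80..8F) instruction? */
    static bool is_jcc32(const uint8_t *insn)
    {
            return insn[0] == 0x0f && (insn[1] & 0xf0) == 0x80;
    }

    /* Condition code, e.g. 0x5 (NE) for the 'jne' (0f 85) in the listing above. */
    static uint8_t jcc32_cc(const uint8_t *insn)
    {
            return insn[1] & 0x0f;
    }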


Revision tags: v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6
# 7825451f 15-Sep-2022 Peter Zijlstra <[email protected]>

static_call: Add call depth tracking support

When indirect calls are switched to direct calls, it has to be ensured
that the call target is not the function itself but the call thunk when call
depth tracking is enabled. But static calls are available before call
thunks have been set up.

Ensure a second run through the static call patching code after call thunks
have been created. When call thunks are not enabled this has no side
effects.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 770ae1b7 15-Sep-2022 Peter Zijlstra <[email protected]>

x86/returnthunk: Allow different return thunks

In preparation for call depth tracking on Intel SKL CPUs, make it possible
to patch in a SKL specific return thunk.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
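
The mechanism this enables can be sketched as patching through a writable function pointer rather than a fixed symbol; only the default thunk name below is taken from this history, the rest is illustrative:

    /* Illustrative sketch, not the kernel source. */
    extern void __x86_return_thunk(void);

    /* Default return thunk; CPU-specific setup may later point this elsewhere. */
    void (*x86_return_thunk)(void) = __x86_return_thunk;

    /* Return patching then jumps to whatever x86_return_thunk points at,
     * instead of emitting a bare RET. */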


Revision tags: v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7
# c27c753e 12-Jul-2022 Thomas Gleixner <[email protected]>

x86/static_call: Serialize __static_call_fixup() properly

__static_call_fixup() invokes __static_call_transform() without holding
text_mutex, which causes lockdep to complain in text_poke_bp().

Adding the proper locking cures that, but as this is either used during
early boot or during module finalizing, it's not required to use
text_poke_bp(). Add an argument to __static_call_transform() which tells
it to use text_poke_early() for it.

Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding")
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
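
The gist of the added argument, as a sketch with simplified prototypes (the flag name and exact signatures here are illustrative):

    #include <stdbool.h>
    #include <stddef.h>

    /* Stand-ins for the kernel's text-patching primitives; prototypes simplified. */
    void text_poke_early(void *addr, const void *opcode, size_t len);
    void text_poke_bp(void *addr, const void *opcode, size_t len, void *emulate);

    /*
     * Early boot and module finalization run before anything can execute the
     * text being patched, so the cheap non-atomic path is sufficient there.
     */
    static void patch_site(void *insn, const void *code, size_t len, bool modinit)
    {
            if (modinit)
                    text_poke_early(insn, code, len);
            else
                    text_poke_bp(insn, code, len, NULL);
    }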


Revision tags: v5.19-rc6, v5.19-rc5
# f43b9876 27-Jun-2022 Peter Zijlstra <[email protected]>

x86/retbleed: Add fine grained Kconfig knobs

Do fine-grained Kconfig for all the various retbleed parts.

NOTE: if your compiler doesn't support return thunks this will
silently 'upgrade' your mitigation to IBPB, you might not like this.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>


Revision tags: v5.19-rc4, v5.19-rc3
# ee88d363 14-Jun-2022 Peter Zijlstra <[email protected]>

x86,static_call: Use alternative RET encoding

In addition to teaching static_call about the new way to spell 'RET',
there is an added complication in that static_call() is allowed to
rewrite text before it is known which particular spelling is required.

In order to deal with this; have a static_call specific fixup in the
apply_return() 'alternative' patching routine that will rewrite the
static_call trampoline to match the definite sequence.

This in turn creates the problem of uniquely identifying static call
trampolines. Currently trampolines are 8 bytes, the first 5 being the
jmp.d32/ret sequence and the final 3 a byte sequence that spells out
'SCT'.

This sequence is used in __static_call_validate() to ensure it is
patching a trampoline and not a random other jmp.d32. That is,
false-positives should be rare, but they aren't a big concern anyway.

OTOH the new __static_call_fixup() must not have false-positives, and
'SCT' decodes to the somewhat weird but semi plausible sequence:

push %rbx
rex.XB push %r12

Additionally, there are SLS concerns with immediate jumps. Combined it
seems like a good moment to change the signature to a single 3 byte
trap instruction that is unique to this usage and will not ever get
generated by accident.

As such, change the signature to: '0x0f, 0xb9, 0xcc', which decodes
to:

ud1 %esp, %ecx

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
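
The new signature lends itself to a straightforward check; a sketch in the spirit of __static_call_validate(), with the layout simplified:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* 'ud1 %esp, %ecx': the 3-byte trap signature named in the commit. */
    static const uint8_t tramp_ud[] = { 0x0f, 0xb9, 0xcc };

    /* Only treat 'tramp' as a static-call trampoline if the signature follows
     * the 5-byte jmp.d32/ret sequence; anything else is left untouched. */
    static bool looks_like_trampoline(const uint8_t *tramp)
    {
            return memcmp(tramp + 5, tramp_ud, sizeof(tramp_ud)) == 0;
    }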


Revision tags: v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17
# 1cd5f059 18-Mar-2022 Peter Zijlstra <[email protected]>

x86,static_call: Fix __static_call_return0 for i386

Paolo reported that the instruction sequence that is used to replace:

call __static_call_return0

namely:

66 66 48 31 c0 data16 data16 xor %rax,%rax

decodes to something else on i386, namely:

66 66 48 data16 dec %ax
31 c0 xor %eax,%eax

Which is a nonsensical sequence that happens to have the same outcome.
*However* an important distinction is that it consists of 2
instructions, which is a problem when the thing needs to be overwritten
with a regular call instruction again.

As such, replace the instruction with something that decodes the same
on both i386 and x86_64.

Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()")
Reported-by: Paolo Bonzini <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]


Revision tags: v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4
# e463a09a 04-Dec-2021 Peter Zijlstra <[email protected]>

x86: Add straight-line-speculation mitigation

Make use of an upcoming GCC feature to mitigate
straight-line-speculation for x86:

https://gcc.gnu.org/g:53a643f8568067d7700a9f2facc8ba39974973d3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952
https://bugs.llvm.org/show_bug.cgi?id=52323

It's built tested on x86_64-allyesconfig using GCC-12 and GCC-11.

Maintenance overhead of this should be fairly low due to objtool
validation.

Size overhead of all these additional int3 instructions comes to:

text data bss dec hex filename
22267751 6933356 2011368 31212475 1dc43bb defconfig-build/vmlinux
22804126 6933356 1470696 31208178 1dc32f2 defconfig-build/vmlinux.sls

Or roughly 2.4% additional text.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
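
At the instruction level the mitigation boils down to placing an INT3 behind RETs and indirect JMPs, so that straight-line speculation runs into a trapping instruction; the byte sequences below are standard encodings, shown purely for illustration:

    #include <stdint.h>

    static const uint8_t ret_plain[]   = { 0xc3 };              /* ret             */
    static const uint8_t ret_sls[]     = { 0xc3, 0xcc };        /* ret;       int3 */
    static const uint8_t jmp_rax_sls[] = { 0xff, 0xe0, 0xcc };  /* jmp *%rax; int3 */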


Revision tags: v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15
# 2105a927 30-Oct-2021 Peter Zijlstra <[email protected]>

static_call,x86: Robustify trampoline patching

Add a few signature bytes after the static call trampoline and verify
those bytes match before patching the trampoline. This avoids patching
random other JMPs (such as CFI jump-table entries) instead.

These bytes decode as:

d: 53 push %rbx
e: 43 54 rex.XB push %r12

And happen to spell "SCT".

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
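
For reference, the signature bytes as a constant; the matching memcmp-style check is sketched under the ee88d363 entry above, which later replaces this signature with a ud1 trap:

    #include <stdint.h>

    /* "SCT" == 0x53 0x43 0x54, decoding to the push %rbx / rex.XB push %r12 shown above. */
    static const uint8_t tramp_sct_sig[] = { 0x53, 0x43, 0x54 };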


Revision tags: v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3
# a89dfde3 12-Mar-2021 Peter Zijlstra <[email protected]>

x86: Remove dynamic NOP selection

This ensures that a NOP is a NOP and not a random other instruction that
is also a NOP. It allows simplification of dynamic code patching that
wants to verify existing code before writing new instructions (ftrace,
jump_label, static_call, etc..).

Differentiating on NOPs is not a feature.

This pessimises 32bit (DONTCARE) and 32bit on 64bit CPUs (CARELESS).
32bit is not a performance target.

Everything x86_64 since AMD K10 (2007) and Intel IvyBridge (2012) is
fine with using NOPL (as opposed to prefix NOP). And per FEATURE_NOPL
being required for x86_64, all x86_64 CPUs can use NOPL. So stop
caring about NOPs, simplify things and get on with life.

[ The problem seems to be that some uarchs can only decode NOPL on a
single front-end port while others have severe decode penalties for
excessive prefixes. All modern uarchs can handle both, except Atom,
which has prefix penalties. ]

[ Also, much doubt you can actually measure any of this on normal
workloads. ]

After this, FEATURE_NOPL is unused except for required-features for
x86_64. FEATURE_K8 is only used for PTI.

[ bp: Kernel build measurements showed ~0.3s slowdown on Sandybridge
which is hardly a slowdown. Get rid of X86_FEATURE_K7, while at it. ]

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]> # bpf
Acked-by: Linus Torvalds <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]


Revision tags: v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5
# 3f2a8fc4 18-Jan-2021 Peter Zijlstra <[email protected]>

static_call/x86: Add __static_call_return0()

Provide a stub function that returns 0 and wire up the static call site
patching to replace the CALL with a single 5 byte instruction that
clears %RAX, the return value register.

The function can be cast to any function pointer type that has a
single %RAX return (including pointers). Also provide a version that
returns an int for convenience. We are clearing the entire %RAX register
in any case, whether the return value is 32 or 64 bits, since %RAX is
always a scratch register anyway.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
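
The stub itself is trivial; a sketch of it (the kernel's definition may differ in annotations, not in substance):

    /*
     * Always returns 0 in the return-value register, so a static call whose
     * target is this function can stand in for any pointer-, integer- or
     * bool-returning callee whose result lives in %RAX.
     */
    long __static_call_return0(void)
    {
            return 0;
    }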


Revision tags: v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2
# a945c834 18-Aug-2020 Peter Zijlstra <[email protected]>

static_call: Allow early init

In order to use static_call() to wire up x86_pmu, we need to
initialize earlier, specifically before memory allocation works; copy
some of the tricks from jump_label to enable this.

Primarily we overload key->next to store a sites pointer when there
are no modules; this avoids having to use kmalloc() to initialize the
sites and allows us to run much earlier.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]


# 6c3fce79 18-Aug-2020 Peter Zijlstra <[email protected]>

static_call: Add some validation

Verify the text we're about to change is as we expect it to be.

Requested-by: Steven Rostedt <[email protected]>

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
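
A minimal sketch of "verify the text we're about to change" (generic, not the kernel's exact routine):

    #include <stdbool.h>
    #include <string.h>

    /* Refuse to patch unless the bytes currently at the site are exactly the
     * CALL/JMP/RET/NOP the patcher expects to find there. */
    static bool site_text_matches(const void *site, const void *expect, size_t len)
    {
            return memcmp(site, expect, len) == 0;
    }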


# 5b06fd3b 18-Aug-2020 Peter Zijlstra <[email protected]>

static_call: Handle tail-calls

GCC can turn our static_call(name)(args...) into a tail call, in which
case we get a JMP.d32 into the trampoline (which then does a further
tail-call).

Teach objtool to recognise and mark these in .static_call_sites and
adjust the code patching to deal with this.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
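
On the patching side this amounts to choosing a different 5-byte opcode for sites objtool marked as tail calls; the opcode values are standard, the flag parameter is illustrative:

    #include <stdint.h>

    #define CALL_INSN_OPCODE  0xe8  /* call rel32 */
    #define JMP32_INSN_OPCODE 0xe9  /* jmp  rel32 */

    /* Both forms are 5 bytes, so the rel32 displacement is computed the same way. */
    static uint8_t site_opcode(int is_tail_call)
    {
            return is_tail_call ? JMP32_INSN_OPCODE : CALL_INSN_OPCODE;
    }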


# 452cddbf 18-Aug-2020 Peter Zijlstra <[email protected]>

static_call: Add static_call_cond()

Extend the static_call infrastructure to optimize the following common
pattern:

if (func_ptr)
func_ptr(args...)

For the trampoline (which is in effect a tail-call), we patch the
JMP.d32 into a RET, which then directly consumes the trampoline call.

For the in-line sites we replace the CALL with a NOP5.

NOTE: this is 'obviously' limited to functions with a 'void' return type.

NOTE: DEFINE_STATIC_COND_CALL() only requires a typename, as opposed
to a full function.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
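
As a sketch of what a NULL target turns into at each kind of site (standard encodings, not the kernel's patching table):

    #include <stdint.h>

    /* Trampoline (tail-call) site: the JMP.d32 becomes a bare RET. */
    static const uint8_t ret_insn[] = { 0xc3 };

    /* Inline call site: the CALL becomes the 5-byte NOP seen in listings above. */
    static const uint8_t nop5[] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

    /* Either way the call disappears, which is why this only works for
     * void-returning calls, as the commit notes. */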


# 1e7e4788 18-Aug-2020 Josh Poimboeuf <[email protected]>

x86/static_call: Add inline static call implementation for x86-64

Add the inline static call implementation for x86-64. The generated code
is identical to the out-of-line case, except we move the trampoline into
its own section.

Objtool uses the trampoline naming convention to detect all the call
sites. It then annotates those call sites in the .static_call_sites
section.

During boot (and module init), the call sites are patched to call
directly into the destination function. The temporary trampoline is
then no longer used.

[peterz: merged trampolines, put trampoline in section]

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
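
For illustration, the kind of record a .static_call_sites entry carries; the kernel stores both fields as 32-bit self-relative offsets and folds flag bits into the low bits, which is omitted here:

    #include <stdint.h>

    /* Sketch of a call-site record as emitted by objtool. */
    struct static_call_site {
            int32_t addr;   /* relative reference to the patched CALL/JMP site  */
            int32_t key;    /* relative reference to the owning static_call_key */
    };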


# e6d6c071 18-Aug-2020 Josh Poimboeuf <[email protected]>

x86/static_call: Add out-of-line static call implementation

Add the x86 out-of-line static call implementation. For each key, a
permanent trampoline is created which is the destination for all static
calls for the given key. The trampoline has a direct jump which gets
patched by static_call_update() when the destination function changes.

[peterz: fixed trampoline, rewrote patching code]

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
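
The trampoline's shape, sketched as a packed struct rather than the assembly macro that actually emits it:

    #include <stdint.h>

    /*
     * One out-of-line trampoline: a 5-byte JMP.d32 whose displacement is
     * re-pointed by static_call_update() when the target changes, followed
     * by the signature bytes introduced by later commits in this history.
     */
    struct static_call_tramp_bytes {
            uint8_t jmp_opcode;     /* 0xe9, jmp rel32 */
            int32_t rel32;          /* target - (address of rel32 + 4) */
            uint8_t sig[3];         /* trampoline signature */
    } __attribute__((packed));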
