| e59db062 | 17-Oct-2024 |
Andrea Parri <[email protected]> |
riscv, bpf: Make BPF_CMPXCHG fully ordered
According to the prototype formal BPF memory consistency model discussed e.g. in [1] and following the ordering properties of the C/in-kernel macro atomic_
riscv, bpf: Make BPF_CMPXCHG fully ordered
According to the prototype formal BPF memory consistency model discussed e.g. in [1] and following the ordering properties of the C/in-kernel macro atomic_cmpxchg(), a BPF atomic operation with the BPF_CMPXCHG modifier is fully ordered. However, the current RISC-V JIT lowerings fail to meet such memory ordering property. This is illustrated by the following litmus test:
BPF BPF__MP+success_cmpxchg+fence { 0:r1=x; 0:r3=y; 0:r5=1; 1:r2=y; 1:r4=f; 1:r7=x; } P0 | P1 ; *(u64 *)(r1 + 0) = 1 | r1 = *(u64 *)(r2 + 0) ; r2 = cmpxchg_64 (r3 + 0, r4, r5) | r3 = atomic_fetch_add((u64 *)(r4 + 0), r5) ; | r6 = *(u64 *)(r7 + 0) ; exists (1:r1=1 /\ 1:r6=0)
whose "exists" clause is not satisfiable according to the BPF memory model. Using the current RISC-V JIT lowerings, the test can be mapped to the following RISC-V litmus test:
RISCV RISCV__MP+success_cmpxchg+fence { 0:x1=x; 0:x3=y; 0:x5=1; 1:x2=y; 1:x4=f; 1:x7=x; } P0 | P1 ; sd x5, 0(x1) | ld x1, 0(x2) ; L00: | amoadd.d.aqrl x3, x5, 0(x4) ; lr.d x2, 0(x3) | ld x6, 0(x7) ; bne x2, x4, L01 | ; sc.d x6, x5, 0(x3) | ; bne x6, x4, L00 | ; fence rw, rw | ; L01: | ; exists (1:x1=1 /\ 1:x6=0)
where the two stores in P0 can be reordered. Update the RISC-V JIT lowerings/implementation of BPF_CMPXCHG to emit an SC with RELEASE ("rl") annotation in order to meet the expected memory ordering guarantees. The resulting RISC-V JIT lowerings of BPF_CMPXCHG match the RISC-V lowerings of the C atomic_cmpxchg().
Other lowerings were fixed via 20a759df3bba ("riscv, bpf: make some atomic operations fully ordered").
Fixes: dd642ccb45ec ("riscv, bpf: Implement more atomic operations for RV64") Signed-off-by: Andrea Parri <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Puranjay Mohan <[email protected]> Acked-by: Björn Töpel <[email protected]> Link: https://lpc.events/event/18/contributions/1949/attachments/1665/3441/bpfmemmodel.2024.09.19p.pdf [1] Link: https://lore.kernel.org/bpf/[email protected]
show more ...
|
| 6801b0ae | 02-Jul-2024 |
Pu Lehui <[email protected]> |
riscv, bpf: Add 12-argument support for RV64 bpf trampoline
This patch adds 12 function arguments support for riscv64 bpf trampoline. The current bpf trampoline supports <= sizeof(u64) bytes scalar
riscv, bpf: Add 12-argument support for RV64 bpf trampoline
This patch adds 12 function arguments support for riscv64 bpf trampoline. The current bpf trampoline supports <= sizeof(u64) bytes scalar arguments [0] and <= 16 bytes struct arguments [1]. Therefore, we focus on the situation where scalars are at most XLEN bits and aggregates whose total size does not exceed 2×XLEN bits in the riscv calling convention [2].
Signed-off-by: Pu Lehui <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Björn Töpel <[email protected]> Acked-by: Björn Töpel <[email protected]> Acked-by: Puranjay Mohan <[email protected]> Link: https://elixir.bootlin.com/linux/v6.8/source/kernel/bpf/btf.c#L6184 [0] Link: https://elixir.bootlin.com/linux/v6.8/source/kernel/bpf/btf.c#L6769 [1] Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/download/draft-20230929-e5c800e661a53efe3c2678d71a306323b60eb13b/riscv-abi.pdf [2] Link: https://lore.kernel.org/bpf/[email protected]
show more ...
|
| 2382a405 | 22-Jun-2024 |
Pu Lehui <[email protected]> |
riscv, bpf: Use bpf_prog_pack for RV64 bpf trampoline
We used bpf_prog_pack to aggregate bpf programs into huge page to relieve the iTLB pressure on the system. We can apply it to bpf trampoline, as
riscv, bpf: Use bpf_prog_pack for RV64 bpf trampoline
We used bpf_prog_pack to aggregate bpf programs into huge page to relieve the iTLB pressure on the system. We can apply it to bpf trampoline, as Song had been implemented it in core and x86 [0]. This patch is going to use bpf_prog_pack to RV64 bpf trampoline. Since Song and Puranjay have done a lot of work for bpf_prog_pack on RV64, implementing this function will be easy.
Signed-off-by: Pu Lehui <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Björn Töpel <[email protected]> #riscv Link: https://lore.kernel.org/all/[email protected] [0] Link: https://lore.kernel.org/bpf/[email protected]
show more ...
|
| 99fa63d9 | 19-May-2024 |
Xiao Wang <[email protected]> |
riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT
We could try to emit compressed insn for reg move operation during CMPXCHG JIT, the instruction compression has no impact on the jump offsets
riscv, bpf: Try RVC for reg move within BPF_CMPXCHG JIT
We could try to emit compressed insn for reg move operation during CMPXCHG JIT, the instruction compression has no impact on the jump offsets of following forward and backward jump instructions.
Signed-off-by: Xiao Wang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Björn Töpel <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
show more ...
|
| e944fc81 | 23-May-2024 |
Xiao Wang <[email protected]> |
riscv, bpf: Use STACK_ALIGN macro for size rounding up
Use the macro STACK_ALIGN that is defined in asm/processor.h for stack size rounding up, just like bpf_jit_comp32.c does.
Signed-off-by: Xiao
riscv, bpf: Use STACK_ALIGN macro for size rounding up
Use the macro STACK_ALIGN that is defined in asm/processor.h for stack size rounding up, just like bpf_jit_comp32.c does.
Signed-off-by: Xiao Wang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Pu Lehui <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
show more ...
|
| 20a759df | 05-May-2024 |
Puranjay Mohan <[email protected]> |
riscv, bpf: make some atomic operations fully ordered
The BPF atomic operations with the BPF_FETCH modifier along with BPF_XCHG and BPF_CMPXCHG are fully ordered but the RISC-V JIT implements all at
riscv, bpf: make some atomic operations fully ordered
The BPF atomic operations with the BPF_FETCH modifier along with BPF_XCHG and BPF_CMPXCHG are fully ordered but the RISC-V JIT implements all atomic operations except BPF_CMPXCHG with relaxed ordering.
Section 8.1 of the "The RISC-V Instruction Set Manual Volume I: Unprivileged ISA" [1], titled, "Specifying Ordering of Atomic Instructions" says:
| To provide more efficient support for release consistency [5], each | atomic instruction has two bits, aq and rl, used to specify additional | memory ordering constraints as viewed by other RISC-V harts.
and
| If only the aq bit is set, the atomic memory operation is treated as | an acquire access. | If only the rl bit is set, the atomic memory operation is treated as a | release access. | | If both the aq and rl bits are set, the atomic memory operation is | sequentially consistent.
Fix this by setting both aq and rl bits as 1 for operations with BPF_FETCH and BPF_XCHG.
[1] https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf
Fixes: dd642ccb45ec ("riscv, bpf: Implement more atomic operations for RV64") Signed-off-by: Puranjay Mohan <[email protected]> Reviewed-by: Pu Lehui <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
show more ...
|
| 2ddec2c8 | 02-May-2024 |
Puranjay Mohan <[email protected]> |
riscv, bpf: inline bpf_get_smp_processor_id()
Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.
RISCV saves the pointer to the CPU's task_struct in the TP (thread pointer) regist
riscv, bpf: inline bpf_get_smp_processor_id()
Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.
RISCV saves the pointer to the CPU's task_struct in the TP (thread pointer) register. This makes it trivial to get the CPU's processor id. As thread_info is the first member of task_struct, we can read the processor id from TP + offsetof(struct thread_info, cpu).
RISCV64 JIT output for `call bpf_get_smp_processor_id` ======================================================
Before After -------- -------
auipc t1,0x848c ld a5,32(tp) jalr 604(t1) mv a5,a0
Benchmark using [1] on Qemu.
./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc
+---------------+------------------+------------------+--------------+ | Name | Before | After | % change | |---------------+------------------+------------------+--------------| | glob-arr-inc | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04% | | arr-inc | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56% | | hash-inc | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18% | +---------------+------------------+------------------+--------------+
NOTE: This benchmark includes changes from this patch and the previous patch that implemented the per-cpu insn.
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Björn Töpel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
show more ...
|
| 21ab0b6d | 04-Apr-2024 |
Puranjay Mohan <[email protected]> |
bpf, riscv: Implement bpf_addr_space_cast instruction
LLVM generates bpf_addr_space_cast instruction while translating pointers between native (zero) address space and __attribute__((address_space(N
bpf, riscv: Implement bpf_addr_space_cast instruction
LLVM generates bpf_addr_space_cast instruction while translating pointers between native (zero) address space and __attribute__((address_space(N))). The addr_space=0 is reserved as bpf_arena address space.
rY = addr_space_cast(rX, 0, 1) is processed by the verifier and converted to normal 32-bit move: wX = wY
rY = addr_space_cast(rX, 1, 0) has to be converted by JIT.
Signed-off-by: Puranjay Mohan <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Björn Töpel <[email protected]> Tested-by: Pu Lehui <[email protected]> Reviewed-by: Pu Lehui <[email protected]> Acked-by: Björn Töpel <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
show more ...
|
| 06a33d02 | 15-Jan-2024 |
Pu Lehui <[email protected]> |
riscv, bpf: Optimize bswap insns with Zbb support
Optimize bswap instructions by rev8 Zbb instruction conbined with srli instruction. And Optimize 16-bit zero-extension with Zbb support.
Signed-off
riscv, bpf: Optimize bswap insns with Zbb support
Optimize bswap instructions by rev8 Zbb instruction conbined with srli instruction. And Optimize 16-bit zero-extension with Zbb support.
Signed-off-by: Pu Lehui <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Björn Töpel <[email protected]> Acked-by: Björn Töpel <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
show more ...
|