| 88a580eb | 31-Jan-2021 |
Mateusz Guzik <[email protected]> |
amd64: move memcmp checks upfront
This is a tradeoff which saves jumps for smaller sizes while making the 8-16 range slower (roughly in line with the other cases).
Tested with glibc test suite.
For example size 3 (most common with vfs namecache), in ops/s:
before:	407086026
after:	461391995

The regressed range of 8-16 (with 8 as an example):
before:	540850489
after:	461671032
(cherry picked from commit f1be262ec11c1c35e6485f432415b5b52adb505d)
|
| 088ac3ef | 16-Nov-2018 |
Mateusz Guzik <[email protected]> |
amd64: handle small memset buffers with overlapping stores
Instead of jumping to locations which store the exact number of bytes, use displacement to move the destination.
In particular the following clears an area between 8-16 (inclusive) branch-free:
    movq %r10,(%rdi)
    movq %r10,-8(%rdi,%rcx)
For instance for rcx of 10 the second line is rdi + 10 - 8 = rdi + 2. Writing 8 bytes starting at that offset overlaps with 6 bytes written previously and writes 2 new, giving 10 in total.
Provides a nice win for smaller stores. Other ones are erratic depending on the microarchitecture.
General idea taken from NetBSD (restricted use of the trick) and bionic string functions (use for various ranges like in this patch).
Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17660
|
| ad2ff705 | 15-Nov-2018 |
Mateusz Guzik <[email protected]> |
amd64: sync up libc memset with the kernel version
- tidy up memset to have rax set earlier for small sizes
- finish the tail in memset with an overlapping store
- align memset buffers to 16 bytes before using rep stos
Sponsored by: The FreeBSD Foundation
|
| 9c7d70ee | 13-Oct-2018 |
Mateusz Guzik <[email protected]> |
amd64: convert libc bcopy to a C func to avoid future bloat
The function is of limited use and is almost a direct clone of memmove/memcpy (with arguments swapped). Introduction of ERMS variants of string routines would mean avoidable growth of libc.
bcopy will get redefined to a __builtin_memmove later on with this symbol only left for compatibility.
Reviewed by:	kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17539
|