| ff7799e0 | 06-Dec-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: add memrchr() scalar, baseline implementation
The scalar implementation is fairly simplistic and only performs slightly better than the generic C implementation. It could be improved by using the same algorithm as for memchr, but that would have been a lot more complicated.
The baseline implementation is similar to timingsafe_memcmp. It's slightly slower than memchr() due to the more complicated main loop, but I don't think that can be significantly improved.
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42925
(cherry picked from commit fb197a4f7751bb4e116989e57ba7fb12a981895f)
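For reference, the semantics that both implementations must provide can be sketched as a plain backward scan in C. This is only an illustration of what memrchr() computes, not the committed assembly; the name sketch_memrchr is hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical illustration: return a pointer to the last occurrence
 * of the byte c in the first n bytes of s, or NULL if there is none.
 */
void *
sketch_memrchr(const void *s, int c, size_t n)
{
	const unsigned char *base = s;
	const unsigned char *p = base + n;

	/* walk backwards from one past the end of the buffer */
	while (p != base) {
		if (*--p == (unsigned char)c)
			return ((void *)p);
	}

	return (NULL);
}
```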
|
| ddab9e64 | 04-Dec-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: implement strncat() by calling strlen(), memccpy()
This picks up the accelerated implementation of memccpy().
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42902
(cherry picked from commit ea7b13771cc9d45bf1bc6c6edad8d1b7bce12990)
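The composition named in the title can be sketched in C roughly as follows. This is a minimal illustration assuming the POSIX memccpy() contract (it returns NULL when the byte is not found within n bytes); the name sketch_strncat is hypothetical and the committed code may differ in detail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of strncat() built from strlen() and memccpy(). */
char *
sketch_strncat(char *restrict dst, const char *restrict src, size_t n)
{
	/* locate the terminator of the existing string */
	char *end = dst + strlen(dst);

	/* copy up to and including the NUL byte, but at most n bytes */
	if (memccpy(end, src, '\0', n) == NULL)
		end[n] = '\0';	/* src had no NUL in its first n bytes */

	return (dst);
}
```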
|
| a3ce82e5 | 02-Dec-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: add memccpy scalar, baseline implementation
Based on the strlcpy code from D42863, this patch adds a SIMD-enhanced implementation of memccpy for amd64. A scalar implementation calling into memchr and memcpy to do the job is provided, too.
Please note that this code does not behave exactly the same as the C implementation of memccpy for overlapping inputs. However, overlapping inputs are not allowed for this function by ISO/IEC 9899:1999, nor does the C implementation have any code to deal with the possibility. It just proceeds byte-by-byte, which may or may not do the expected thing for some overlaps. We do not document whether overlapping inputs are supported in memccpy(3).
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42902
(cherry picked from commit fc0e38a7a67a6d43095efb00cf19ee5f95dcf710)
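The scalar approach described above (find the byte with memchr, then copy with memcpy) might look like this in C. It is a sketch only; sketch_memccpy is a hypothetical name, and like the real function it assumes non-overlapping buffers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of memccpy() built from memchr() and memcpy(). */
void *
sketch_memccpy(void *restrict dst, const void *restrict src, int c, size_t len)
{
	const unsigned char *hit = memchr(src, c, len);
	size_t n;

	if (hit == NULL) {
		/* c does not occur: copy everything and report that */
		memcpy(dst, src, len);
		return (NULL);
	}

	/* copy up to and including the first occurrence of c */
	n = (size_t)(hit - (const unsigned char *)src) + 1;
	memcpy(dst, src, n);

	/* result points just past the copied c in dst */
	return ((unsigned char *)dst + n);
}
```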
|
| 3045c0f1 | 29-Nov-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: implement strlcat() through strlcpy()
This should pick up our optimised memchr(), strlen(), and strlcpy() when strlcat() is called.
Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42863
(cherry picked from commit 2b7b03b7ae179db465c1ef19a5007f729874916a)
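One way to express strlcat() in terms of memchr(), strlen(), and strlcpy() is sketched below. The names sketch_strlcat and sketch_strlcpy are hypothetical; a local strlcpy stand-in is included only so the example is self-contained on systems whose libc lacks strlcpy().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcpy(), present only for self-containment. */
static size_t
sketch_strlcpy(char *restrict dst, const char *restrict src, size_t dstsize)
{
	size_t srclen = strlen(src);

	if (dstsize != 0) {
		size_t n = srclen < dstsize ? srclen : dstsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}

	return (srclen);
}

/* Hypothetical illustration of strlcat() as a wrapper around strlcpy(). */
size_t
sketch_strlcat(char *restrict dst, const char *restrict src, size_t dstsize)
{
	const char *nul = memchr(dst, '\0', dstsize);
	size_t dstlen;

	/* dst is not terminated within dstsize bytes: nothing can be appended */
	if (nul == NULL)
		return (dstsize + strlen(src));

	dstlen = (size_t)(nul - dst);

	/* append into the remaining space; the result is the untruncated length */
	return (dstlen + sketch_strlcpy(dst + dstlen, src, dstsize - dstlen));
}
```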
|
| 903cb811 | 12-Nov-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: add strlcpy scalar, baseline implementation
Somewhat similar to stpncpy, but different in that we need to compute the full source length even if the buffer is shorter than the source.
strlcat is implemented as a simple wrapper around strlcpy. The scalar implementation of strlcpy just calls into strlen() and memcpy() to do the job.
Perf-wise we're very close to stpncpy. The code is slightly slower as it needs to carry on with finding the source string length even if the buffer ends before the string.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42863
(cherry picked from commit 74d6cfad54d676299ee5e4695139461876dfd757)
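The scalar strategy mentioned above (strlen() followed by memcpy()) can be sketched as follows. Note how the full source length is computed first, even when the buffer is too short. The name sketch_strlcpy is hypothetical and this is not the committed code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of strlcpy() built from strlen() and memcpy(). */
size_t
sketch_strlcpy(char *restrict dst, const char *restrict src, size_t dstsize)
{
	/* the return value is always the full source length */
	size_t srclen = strlen(src);

	if (dstsize != 0) {
		/* copy as much as fits while leaving room for the terminator */
		size_t n = srclen < dstsize ? srclen : dstsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}

	return (srclen);
}
```

A caller can detect truncation by checking whether the return value is greater than or equal to dstsize.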
|
| 7a605ba8 | 14-Nov-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string/strcat.S: enable use of SIMD
strcat has a bespoke scalar assembly implementation we inherited from NetBSD. While it performs well, it is better to call into our SIMD implementations if any SIMD features are available at all. So do that and implement strcat() by calling into strlen() and strcpy() if these are available.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42600
(cherry picked from commit aff9143a242c0012b0195b3666e03fa3b7cd33e8)
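In C terms, the composition described above amounts to roughly the following. The real change is in assembly and only takes this path when SIMD features are available; sketch_strcat is a hypothetical name.

```c
#include <string.h>

/* Hypothetical illustration of strcat() built from strlen() and strcpy(). */
char *
sketch_strcat(char *restrict dst, const char *restrict src)
{
	/* find the end of dst, then copy src (with its terminator) there */
	strcpy(dst + strlen(dst), src);

	return (dst);
}
```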
|
| 76f9afcd | 09-Nov-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: implement strncpy() by calling stpncpy()
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42519
(cherry picked from commit e19d46c808267f53455e96a28ff7654211523d2c)
|
| 7527fecb | 30-Oct-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: add stpncpy scalar, baseline implementation
This was surprisingly annoying to get right, despite being such a simple function. A scalar implementation is also provided; it just calls into our optimised memchr(), memcpy(), and memset() routines to carry out its job.
I'm quite happy with the performance. glibc only beats us for very long strings, likely due to the use of AVX-512. The scalar implementation just calls into our optimised memchr(), memcpy(), and memset() routines, so it has a high overhead to begin with but then performs ok for the amount of effort that went into it. Still beats the old C code, except for very short strings.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42519
(cherry picked from commit 90253d49db09a9b1490c448d05314f3e4bbfa468)
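The scalar strategy described above (memchr() to find the terminator, memcpy() to copy, memset() to pad) can be sketched in C as follows. The name sketch_stpncpy is hypothetical, and the sketch relies on memchr() not reading past the first match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of stpncpy() built from memchr(), memcpy(), memset(). */
char *
sketch_stpncpy(char *restrict dst, const char *restrict src, size_t n)
{
	/* length of src, capped at n */
	const char *nul = memchr(src, '\0', n);
	size_t len = nul != NULL ? (size_t)(nul - src) : n;

	/* copy the string bytes, then zero-fill the rest of the n-byte field */
	memcpy(dst, src, len);
	memset(dst + len, '\0', n - len);

	/* points at the first NUL written, or at dst + n if src was truncated */
	return (dst + len);
}
```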
|
| 265fb89a | 24-Oct-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: implement strsep() through strcspn()
The strsep() function is basically strcspn() with extra steps. On amd64, we now have an optimised implementation of strcspn(), so instead of implementing the inner loop manually, just call into the optimised routine.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42346
(cherry picked from commit fd2ecd91aeeeab579c769c9a39f90b4bd4a493a9)
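The "strcspn() with extra steps" idea can be sketched as follows. This is an illustration of the approach rather than the committed source; sketch_strsep is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of strsep() with strcspn() as the inner loop. */
char *
sketch_strsep(char **stringp, const char *delim)
{
	char *tok = *stringp;
	char *end;

	if (tok == NULL)
		return (NULL);

	/* strcspn() finds the first delimiter or the end of the string */
	end = tok + strcspn(tok, delim);

	if (*end != '\0') {
		*end = '\0';		/* terminate the token in place */
		*stringp = end + 1;	/* resume after the delimiter */
	} else
		*stringp = NULL;	/* this was the last token */

	return (tok);
}
```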
|
| 9b1a851e | 12-Oct-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: add strrchr scalar, baseline implementation
The baseline implementation is very straightforward, while the scalar implementation suffers from register pressure and the need to use SWAR techniques similar to those used for strchr().
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42217
(cherry picked from commit 2ed514a220edbac6ca5ec9f40a3e0b3f2804796d)
|
| 3a19fcb9 | 27-Sep-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: add strncmp scalar, baseline implementation
The scalar implementation is fairly straightforward and merely unrolled four times. The baseline implementation closely follows D41971 with appropriate extensions and extra code paths to pay attention to string length.
Performance is quite good. We beat both glibc (except for very long strings, but they likely use AVX which we don't) and Bionic (except for medium-sized aligned strings, where we are still in the same ballpark).
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D42122
(cherry picked from commit 14289e973f5c941e4502cc2b11265e4b3072839a)
|
| 309b30ce | 25-Sep-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: implement strpbrk() through strcspn()
This lets us use our optimised strcspn() routine for strpbrk() calls.
Sponsored by: The FreeBSD Foundation Tested by: developers@, exp-run Approved by: mjg MFC after: 1 month MFC to: stable/14 PR: 275785 Differential Revision: https://reviews.freebsd.org/D41980
(cherry picked from commit f4fc317c364f2c81ad3d36763d8e5a60393ddbd1)
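A minimal C sketch of this delegation, with the hypothetical name sketch_strpbrk, could look like this:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of strpbrk() built on strcspn(). */
char *
sketch_strpbrk(const char *s, const char *charset)
{
	/* skip the initial span that contains no byte from charset */
	const char *p = s + strcspn(s, charset);

	/* landing on the terminator means no byte from charset occurs in s */
	return (*p != '\0' ? (char *)p : NULL);
}
```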
|
| 9a6a587e | 15-Oct-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: add timingsafe_memcmp() assembly implementation
Conceptually very similar to timingsafe_bcmp(), but with comparison logic inspired by Elijah Stone's fancy memcmp. A baseline (SSE) implementation was omitted this time as I was not able to get it to perform adequately. Best I got was 8% over the scalar version for long inputs, but slower for short inputs.
Sponsored by: The FreeBSD Foundation Approved by: security (cperciva) Inspired by: https://github.com/moon-chilled/fancy-memcmp Differential Revision: https://reviews.freebsd.org/D41696
(cherry picked from commit 5048c1b85506c5e0f441ee7dd98dd8d96d0a4a24)
|
| 1347ec5d | 30-Aug-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: add timingsafe_bcmp(3) scalar, baseline implementations
Very straightforward and similar to memcmp(3). The code has been written to use only instructions specified as having data operand independent timing by Intel.
Sponsored by: The FreeBSD Foundation Approved by: security (cperciva) Differential Revision: https://reviews.freebsd.org/D41673
(cherry picked from commit 76c2b331bcd9f73c5c8c43a06e328fa0c7b8c39a)
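The general shape of a timing-safe equality check can be sketched in portable C: accumulate the XOR of every byte pair and test the accumulator only once at the end, so that control flow does not depend on where the buffers differ. This is an illustration of the idea only. The name sketch_timingsafe_bcmp is hypothetical, and unlike the assembly version, C source gives no guarantee about which instructions the compiler emits.

```c
#include <stddef.h>

/*
 * Hypothetical illustration: returns 0 if the buffers are equal and
 * 1 otherwise, always visiting all len bytes.
 */
int
sketch_timingsafe_bcmp(const void *a, const void *b, size_t len)
{
	const unsigned char *p = a, *q = b;
	unsigned char acc = 0;
	size_t i;

	/* no early exit: every byte pair contributes to acc */
	for (i = 0; i < len; i++)
		acc |= p[i] ^ q[i];

	return (acc != 0);
}
```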
|
| 0666c6fc | 14-Sep-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string/memcmp.S: harden against phony buffer lengths
When memcmp(a, b, len) (or equally, bcmp) is called with a phony length such that a + len < a, the code would malfunction and not compare the two buffers correctly. While such arguments are illegal (buffers do not wrap around the end of the address space), it is nevertheless conceivable that people try things like memcmp(a, b, SIZE_MAX) to compare a and b until the first mismatch, in the knowledge that such a mismatch exists, expecting memcmp() to stop comparing somewhere around the mismatch. While memcmp() is usually written to conform to this assumption, no version of ISO/IEC 9899 guarantees this behaviour (in contrast to memchr(), for which it is guaranteed).
Nevertheless, it appears sensible to at least not grossly misbehave on phony lengths. This change hardens memcmp() against this case by comparing at least until the end of the address space if a + len overflows a 64-bit integer.
Sponsored by: The FreeBSD Foundation Approved by: mjg (blanket, via IRC) See also: b2618b651b28fd29e62a4e285f5be09ea30a85d4 MFC after: 1 week
(cherry picked from commit 953b93cf24d8871c62416c9bcfca935f1f1853b6)
|
| 62f73a71 | 08-Sep-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: implement strnlen(3) through memchr(3)
Now that we have an optimised memchr(3), we can use it to implement strnlen(3) with better performance.
Sponsored by: The FreeBSD Foundation Approved by: mjg MFC after: 1 week MFC to: stable/14 Differential Revision: https://reviews.freebsd.org/D41598
(cherry picked from commit 331737281c1929c29e679e48783055351ac4fbd9)
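Expressed in C, the delegation to memchr(3) amounts to roughly the following sketch. The name sketch_strnlen is hypothetical; the sketch relies on memchr() stopping at the first match, which ISO C guarantees.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of strnlen() built on memchr(). */
size_t
sketch_strnlen(const char *s, size_t maxlen)
{
	/* search for the terminator within the first maxlen bytes */
	const char *nul = memchr(s, '\0', maxlen);

	return (nul != NULL ? (size_t)(nul - s) : maxlen);
}
```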
|
| 3f78bde9 | 24-Aug-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: add memchr(3) scalar, baseline implementation
This is conceptually similar to strchr(3), but there are slight changes to account for the buffer having an explicit length.
This includes the bug fix from b2618b6.
Sponsored by: The FreeBSD Foundation Reported by: yuri, des Tested by: des Approved by: mjg MFC after: 1 week MFC to: stable/14 PR: 273652 Differential Revision: https://reviews.freebsd.org/D41598
(cherry picked from commit de12a689fad271f5a2ba7c188b0b5fb5cabf48e7) (cherry picked from commit b2618b651b28fd29e62a4e285f5be09ea30a85d4)
|
| 39d50019 | 21-Aug-2023 |
Robert Clausecker <[email protected]> |
lib/libc/amd64/string: add strspn(3) scalar, x86-64-v2 implementation
This is conceptually very similar to the strcspn(3) implementations from D41557, but we can't do the fast paths the same way.
Sponsored by: The FreeBSD Foundation Approved by: mjg MFC after: 1 week MFC to: stable/14 Differential Revision: https://reviews.freebsd.org/D41567
(cherry picked from commit 7084133cde6a58412d86bae9f8a55b86141fb304)
|