| df40dcbf | 13-May-2021 |
Cyril Zhang <[email protected]> |
sort: Cache value of MB_CUR_MAX
Every usage of MB_CUR_MAX results in a call to __mb_cur_max. This is inefficient and redundant. Caching the value of MB_CUR_MAX in a global variable removes these calls and speeds up the runtime of sort. For numeric sorting, runtime is almost halved in some tests.
PR: 255551
PR: 255840
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30170
(cherry picked from commit 71ec05a21257e159f40d54e26ad0011bb19b5134)
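The caching idea described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `mb_cur_max_cached`, `cache_mb_cur_max`, and `locale_is_multibyte` are hypothetical, and it assumes the locale is set once at startup.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical name: a cached copy of MB_CUR_MAX, filled in once. */
static size_t mb_cur_max_cached = 1;

/*
 * Call once after setlocale() (and again if the locale ever changes),
 * so that hot paths do not evaluate the MB_CUR_MAX macro repeatedly.
 */
static void
cache_mb_cur_max(void)
{
	mb_cur_max_cached = MB_CUR_MAX;
}

/* Hot-path check that reads the cached value instead of the macro. */
static int
locale_is_multibyte(void)
{
	return (mb_cur_max_cached > 1);
}
```

A caller would run `setlocale(LC_ALL, "")` followed by `cache_mb_cur_max()` early in `main()`, then use the cached value inside comparison loops.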
|
| 7a590a37 | 11-Apr-2019 |
Conrad Meyer <[email protected]> |
sort(1): Simplify and bound random seeding
Bound input file processing length to avoid the issue reported in [1]. For simplicity, only allow regular file and character device inputs. For character devices, only allow /dev/random (and the /dev/urandom symlink).
32 bytes of random data is perfectly sufficient to seed MD5; we don't need any more. Users who want to use large files as seeds are encouraged to reduce those files to an appropriately sized input with tools like sha256(1).
(This does not change the sort algorithm of sort -R.)
[1]: https://lists.freebsd.org/pipermail/freebsd-hackers/2018-August/053152.html
PR: 230792
Reported by: Ali Abdallah <aliovx AT gmail.com>
Relnotes: yes
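As an illustration of the bounded read described above, the following sketch reads at most 32 bytes from a seed file and rejects anything that is not a regular file or character device. It is an assumption-laden sketch, not the committed code: `read_bounded_seed` and `SEED_BYTES` are hypothetical names, and the extra restriction of character devices to /dev/random is omitted here.

```c
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define SEED_BYTES 32	/* bound taken from the commit message */

/*
 * Read at most SEED_BYTES from path into seed.
 * Returns the number of bytes read, or -1 if the path cannot be opened
 * or is neither a regular file nor a character device.
 */
static int
read_bounded_seed(const char *path, unsigned char seed[SEED_BYTES])
{
	struct stat sb;
	size_t have = 0;
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);
	if (fstat(fd, &sb) == -1 ||
	    !(S_ISREG(sb.st_mode) || S_ISCHR(sb.st_mode))) {
		close(fd);
		return (-1);
	}
	/* Loop because read() may return fewer bytes than requested. */
	while (have < SEED_BYTES) {
		n = read(fd, seed + have, SEED_BYTES - have);
		if (n <= 0)
			break;
		have += (size_t)n;
	}
	close(fd);
	return ((int)have);
}
```

Because the loop stops at `SEED_BYTES`, an arbitrarily large seed file costs no more than a short one.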
|
| fff4eaeb | 04-Apr-2019 |
Conrad Meyer <[email protected]> |
sort(1): randomcoll: Skip the memory allocation entirely
There's no reason to order based on strcmp of ASCII digests instead of memcmp of the raw digests.
While here, remove collision fallback. If you collide two MD5s, they're probably the same string anyway. If robustness against MD5 collisions is desired, maybe we shouldn't use MD5.
None of the behavior of sort -R is specified by POSIX, so we're free to implement this however we like. E.g., using a 128-bit counter and block cipher to generate unique indices for each line of input.
PR: 230792 (2/many)
Relnotes: This will change the sort order for a given dataset with a given seed. Other similarly breaking changes are planned.
Sponsored by: Dell EMC Isilon
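The point about comparing raw digests can be sketched as below. This is only an illustration under assumptions: `digest_cmp` and `DIGEST_LEN` are hypothetical names, and the digests are assumed to be already computed 16-byte MD5 outputs. Raw digests may contain zero bytes, so `memcmp` over the fixed length is the appropriate comparison, and no hex-string buffer needs to be allocated.

```c
#include <string.h>

#define DIGEST_LEN 16	/* an MD5 digest is 128 bits, i.e. 16 bytes */

/*
 * Order two raw digests bytewise. Unlike strcmp() on hex strings,
 * this needs no allocation and is not confused by embedded zero bytes.
 */
static int
digest_cmp(const unsigned char a[DIGEST_LEN],
    const unsigned char b[DIGEST_LEN])
{
	return (memcmp(a, b, DIGEST_LEN));
}
```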
|
| 7137597e | 20-Jun-2018 |
Kyle Evans <[email protected]> |
sort(1): Fix -m when only implicit stdin is used for input
Observe:
    printf "a\nb\nc\n" > /tmp/foo
    # Next command results in no output
    cat /tmp/foo | sort -m
    # Next command results in proper output
    cat /tmp/foo | sort -m -
    # Also works:
    sort -m /tmp/foo
Some const'ification was done to simplify the actual solution of adding "-" explicitly to the file list if we didn't have any file arguments left over.
PR: 190099
MFC after: 1 week
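The fix described above, adding "-" explicitly when no file arguments remain, might look something like this sketch. The names `effective_files` and `stdin_only` are hypothetical and the real change in sort(1) may be structured differently; the `const` qualifiers reflect the const'ification mentioned in the message, which lets a static string array stand in for the argument vector.

```c
#include <stddef.h>

/* Substitute file list used when no file operands were given. */
static const char *const stdin_only[] = { "-" };

/*
 * Return the list of inputs to process. With no file operands left,
 * fall back to a one-element list naming standard input ("-").
 */
static const char *const *
effective_files(int argc, const char *const *argv, int *nfiles)
{
	if (argc == 0) {
		*nfiles = 1;
		return (stdin_only);
	}
	*nfiles = argc;
	return (argv);
}
```

With this in place, the merge path always sees at least one input name, so `sort -m` with implicit stdin behaves like `sort -m -`.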
|