J
jacob navia
<quote>
... one of the most common programming tasks is to search
through a long string of characters in order to find a
particular byte value. For example strings are often
represented as a sequence of nonzero bytes terminated
by 0. In order to locate the end of a string quickly,
we need a fast way to determine whether all eight bytes of a
given word x are nonzero (because theu usually are).
<end quote>
I discovered that quote above in the fascicle 1a in
"Bitwise Tricks and Techniques"
http://www-cs-faculty.stanford.edu/~knuth/fasc1a.ps.gz
In that document, Knuth explains many boolean tricks, giving
the mathematical background for them too, what many other
books fail to do.
I adapted its algorithm to C. It took me a while because of a
stupid problem, but now it seems to work OK.
The idea is to read 8 bytes at a time and use some
boolean operations that can be done very fast in assembly
to skip over most of the chain, stopping only when a zero
byte appears in one of the eight bytes read.
Knuth's strlen then, looks like this.
#include <stdio.h>
#include <string.h>
#define H 0x8080808080808080ULL
#define L 0x0101010101010101ULL
size_t myStrlen(char *s)
{
unsigned long long t;
char *save = s;
while (1) {
// This supposes that the input string is aligned
// or that the machine doesn't trap when reading
// a 8 byte integer at a random position like the
// x86
t = *(unsigned long long *)s;
if (H & (t - L) & ~t)
break;
s += sizeof(long long);
}
// This loop will be executed at most 7 times
while (*s) {
s++;
}
return s - save;
}
#ifdef TEST
int main(int argc,char *argv[])
{
char *str = "The lazy fox jumped over the slow dog";
if (argc > 1) {
str = argv[1];
}
printf(
"Strlen of '%s' is %d (%d)\n",
str,strlen(str),myStrlen(str));
}
#endif
... one of the most common programming tasks is to search
through a long string of characters in order to find a
particular byte value. For example strings are often
represented as a sequence of nonzero bytes terminated
by 0. In order to locate the end of a string quickly,
we need a fast way to determine whether all eight bytes of a
given word x are nonzero (because theu usually are).
<end quote>
I discovered that quote above in the fascicle 1a in
"Bitwise Tricks and Techniques"
http://www-cs-faculty.stanford.edu/~knuth/fasc1a.ps.gz
In that document, Knuth explains many boolean tricks, giving
the mathematical background for them too, what many other
books fail to do.
I adapted its algorithm to C. It took me a while because of a
stupid problem, but now it seems to work OK.
The idea is to read 8 bytes at a time and use some
boolean operations that can be done very fast in assembly
to skip over most of the chain, stopping only when a zero
byte appears in one of the eight bytes read.
Knuth's strlen then, looks like this.
#include <stdio.h>
#include <string.h>
#define H 0x8080808080808080ULL
#define L 0x0101010101010101ULL
size_t myStrlen(char *s)
{
unsigned long long t;
char *save = s;
while (1) {
// This supposes that the input string is aligned
// or that the machine doesn't trap when reading
// a 8 byte integer at a random position like the
// x86
t = *(unsigned long long *)s;
if (H & (t - L) & ~t)
break;
s += sizeof(long long);
}
// This loop will be executed at most 7 times
while (*s) {
s++;
}
return s - save;
}
#ifdef TEST
int main(int argc,char *argv[])
{
char *str = "The lazy fox jumped over the slow dog";
if (argc > 1) {
str = argv[1];
}
printf(
"Strlen of '%s' is %d (%d)\n",
str,strlen(str),myStrlen(str));
}
#endif