Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

M

Michele Dondi

If perl6 had another version of tr/// that returned the resulting
string instead of the number of matches, this could be a 1-liner.

It will certainly have the trans() method to that effect.


Michele
 
M

Mirco Wahab

Michele said:
I have two very long (>64k) strings of equal lengths - $s1 and $s2.
They are strings of bytes, meaning that any value from chr(0) to
chr(255) is legal. $s2, however, will not have any chr(0). $s1 may or
may not have any. What I need to do is look at each byte in $s1 and if
it is chr(0), replace it with the corresponding byte in $s2. So,
something like the following code:

I looked over the entries, there are some
XS and some inline, but the obvious one
didn't show up. So I created *it*.

I started with the msvc/win32 thingy (works
perfectly for Activeperl w/MSVC installed)
but will eventually produce the gcc/as/unix
variant.

The gnu style seems to be (for me) somehow
complicated (the intel syntax isn't). So
thats the first part. You can't have it
much faster:

(simple test case added)==>


use Inline C => qq{
// ==>
void wahab_msvc(SV* s, SV* d)
{
STRLEN srclen, dstlen;
char *src = SvPV(SvRV(s), srclen), *dst = SvPV(SvRV(d), dstlen);
if( srclen < dstlen ) croak("block length mismatch!");
_asm mov edi, dst
_asm mov esi, src
_asm mov ecx, dstlen
_asm xor eax, eax
start:
_asm repne scasb
_asm jne done
_asm mov edx, dstlen
_asm sub edx, ecx
_asm mov ah, byte ptr [-1+esi+edx]
_asm mov byte ptr [-1+edi], ah
_asm jmp start
done: ;
}
// <==
};

my $s1 = 'abcdefghijklmn';
my $s2 = 'abcdefghijklmn';

substr($s2,5,1) = "\x0";

wahab_msvc(\$s1, \$s2);

print "\n$s1\n$s2\n";
<==



Any help (gnu style) much appreciated!

M.
 
M

Mirco Wahab

Mirco said:
... but will eventually produce the gcc/as/unix variant.

OK, I figured out this part too, so (FWIW) I'll
post it. How's that appliable to the PerlMonks
contest? I don't really know (maybe somebody has a hint).

[start here]

....

use Inline C => qq{
// ==> inline
void by_asm(SV* no_zeros, SV* has_zeros)
{
STRLEN srclen, dstlen;
char *src = SvPV(SvRV(no_zeros), srclen);
char *dst = SvPV(SvRV(has_zeros), dstlen);
if( srclen < dstlen ) croak("block length mismatch!");

#ifdef _MSC_VER
_asm mov edi, dst
_asm mov esi, src
_asm mov ecx, dstlen
_asm xor eax, eax
_asm cld
start:
_asm repne scasb
_asm jne done
_asm mov edx, dstlen
_asm sub edx, ecx
_asm mov ah, byte ptr [-1+esi+edx]
_asm mov byte ptr [-1+edi], ah
_asm jmp start
done: ;
#else
__asm__ __volatile__(
"xorl %%eax, %%eax \\n\\t"
"cld \\n\\t"
"start: \\n\\t"
"repne \\n\\t"
"scasb \\n\\t"
"jne done \\n\\t"
"movl %[l], %%edx \\n\\t"
"subl %%ecx, %%edx \\n\\t"
"movb -1(%%esi,%%edx), %%ah \\n\\t"
"movb %%ah, -1(%%edi) \\n\\t"
"jmp start \\n\\t"
"done: \\n\\t"
: /* no output reg */
: "S"(src),"D"(dst),"c"(dstlen),[l]"m"(dstlen)
);
#endif
}
// <== inline
};

my $s_no_zeros = 'abcdefghijklmnopqrstuvwxyz';

my $s_has_zeros = 'abcdefghijklmnopqrstuvwxyz';
substr($s_has_zeros, 5, 1) = "\x0";
substr($s_has_zeros, -1, 1) = "\x0";

by_asm(\$s_no_zeros, \$s_has_zeros);

print "$s_no_zeros\n$s_has_zeros\n";

[end here]


Regards

M.
 
M

Michele Dondi

OK, I figured out this part too, so (FWIW) I'll
post it. How's that appliable to the PerlMonks
contest? I don't really know (maybe somebody has a hint).

If you don't mind (see the discussion at another part of this thread)
I can report your solution there. Even though the "contest" is now
closed, it would be good (IMHO) for completeness. Or else you can do
so yourself, even as an anonymous poster if you like.


Michele
 
M

Mirco Wahab

Michele said:
If you don't mind (see the discussion at another part of this thread)
I can report your solution there. Even though the "contest" is now
closed, it would be good (IMHO) for completeness. Or else you can do
so yourself, even as an anonymous poster if you like.

OK, I did (http://perlmonks.org/?node_id=639826) and
btw. found the vec() solutions to be very very fast
on the Core2 platform. Very interesting.

Thanks & regards

Mirco
 
M

Michele Dondi

OK, I did (http://perlmonks.org/?node_id=639826) and
btw. found the vec() solutions to be very very fast
on the Core2 platform. Very interesting.

++'d! You also posted at the right point IMHO, even if perhaps in
reply to the most complete benchmark at
<http://perlmonks.org/?node_id=638584> would have been more
appropriate. And you may have included the longer portions of code in
readmore tags, just highlighting your contributed code. Anyway, well
done!
Thanks & regards

Thank you for showing me that my attempt was not completely pointless
notwithstanding the negative reactions.


Michele
 
M

Mirco Wahab

Michele said:
<http://perlmonks.org/?node_id=638584> would have been more
appropriate. And you may have included the longer portions of code in
readmore tags, just highlighting your contributed code. Anyway, well

Thanks for your hints, I posted another response
and included links to the code which I deposited
on a server. But I think thats really to late in
this case.

BTW, what I learned (and what I'm glad to know)
is how dead slow the Core2 on CISC IA32 repeated
(prefixed) string operations is (like "rep stringop").

On every Athlon out there, such problems (as said)
can be solved by (IA32) "repne scasb" on the string
in qestion - this will be faster than anything by
a margin.

Not on a Core2, the "rep stringop" will be about
3x slower than a naive load/compare/load/store
sequence.

OK, thats not really Perl related anymore ...

Regards

M.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,969
Messages
2,570,161
Members
46,708
Latest member
SherleneF1

Latest Threads

Top