Hi,
can anybody here explain to me why memset would be faster than a
simple loop? I doubt it!
This is a compiler quality-of-implementation (QoI) issue and has nothing
to do with the C language. It also depends on the hardware the compiled
program runs on. Speedup tricks that work on one CPU might not pay off
when you run the program on a different model of the same CPU family.
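If you doubt it, measure it on your own compiler and machine. Here is a
minimal sketch of such a measurement. The names fill_loop, fill_memset
and time_fill are made up for this post, and clock() is a coarse timer,
so use a large buffer and many repetitions. Note that an optimizing
compiler may recognize the simple loop and emit a memset call for it
anyway, which is exactly the QoI point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: the "simple loop", one byte per iteration. */
static void fill_loop(unsigned char *dst, int value, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)value;
}

/* Hypothetical helper: the same job done by the library routine. */
static void fill_memset(unsigned char *dst, int value, size_t n)
{
    memset(dst, value, n);
}

typedef void (*fill_fn)(unsigned char *, int, size_t);

/* Returns the CPU time, in seconds, used by `reps` calls of `fn`. */
static double time_fill(fill_fn fn, unsigned char *buf, size_t n, int reps)
{
    assert(buf != NULL && reps > 0);

    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        fn(buf, r & 0xff, n);
        if (n > 0) {
            /* Read one byte back through a volatile object so the
               compiler is less inclined to discard the fill. */
            volatile unsigned char sink = buf[n - 1];
            (void)sink;
        }
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_fill(fill_loop, buf, n, reps) and time_fill(fill_memset, buf,
n, reps) with the same arguments and compare the two results. Try
several buffer sizes and optimization levels; the ranking can change.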
How fast a memcpy() implementation executes may also depend on what kind
of load/store you want to do, e.g.
L1 to L1
L1 to L2
L1 to main memory
etc.
The first speedup rule is to make sure the data is aligned before moving
big chunks of it. Beyond that, there are possibilities such as
pre-warming the cache, loop unrolling, etc.
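To illustrate the alignment and unrolling ideas, here is a rough sketch
of a copy routine. It is not the replacement mentioned in this post and
not a tuned implementation; the name copy_aligned is made up, and
testing alignment by converting the pointer to uintptr_t is common
practice rather than something the C standard guarantees. It copies
single bytes until the destination is word-aligned, then moves four
words per iteration, then finishes the tail byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: align the destination, then copy in big steps. */
static void *copy_aligned(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: byte copies until the destination is word-aligned. */
    while (n > 0 && ((uintptr_t)d % sizeof(uintptr_t)) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: four words per iteration (manual unrolling). The
       fixed-size memcpy calls sidestep aliasing and unaligned-source
       problems; compilers typically turn them into plain loads and
       stores. */
    while (n >= 4 * sizeof(uintptr_t)) {
        uintptr_t w[4];
        memcpy(w, s, sizeof w);
        memcpy(d, w, sizeof w);
        d += sizeof w;
        s += sizeof w;
        n -= sizeof w;
    }

    /* Tail: whatever is left, one byte at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Like memcpy(), this assumes the two regions do not overlap. Whether it
beats the library version depends on the compiler and the CPU, so
measure before relying on it.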
When the Pentium II came out, I wrote a memcpy() replacement in C that
used all the tricks I knew. When I checked it against the asm
implementation by Microsoft, my replacement ran 2x faster.
According to the Intel manuals, really fast memory transfers were
possible via the MMX registers, but I didn't check that out, since it
would require inline assembler.
Of course, when I measured my memcpy() implementation a couple of years
later on a P3, the latest Microsoft version in the standard library ran
2x faster!!!