Nudge
Yes, 8 is not too bad, but a complete unroll requires 64 rounds.
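To make the 8-versus-64 trade-off concrete, here is a minimal C sketch of the SHA-256 round loop, written once rolled and once unrolled by 8. It is an illustration only, not the routine discussed in this thread. The names `rounds_rolled`, `rounds_unrolled8`, `ROUND` and `unroll_matches_rolled` are hypothetical, and the input `kw[i]` is assumed to already hold the round constant plus the message-schedule word for round i. Only the 64 rounds are shown; the schedule expansion and the final addition into the hash state are left out.

```c
#include <stdint.h>
#include <string.h>

/* 32-bit rotate right; n must be in 1..31. */
static uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

#define CH(e, f, g)  (((e) & (f)) ^ (~(e) & (g)))
#define MAJ(a, b, c) (((a) & (b)) ^ ((a) & (c)) ^ ((b) & (c)))
#define BSIG0(a)     (rotr((a), 2) ^ rotr((a), 13) ^ rotr((a), 22))
#define BSIG1(e)     (rotr((e), 6) ^ rotr((e), 11) ^ rotr((e), 25))

/* Rolled form: one round per iteration, shuffling all eight variables. */
void rounds_rolled(uint32_t s[8], const uint32_t kw[64])
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
    uint32_t e = s[4], f = s[5], g = s[6], h = s[7];
    for (int i = 0; i < 64; i++) {
        uint32_t t1 = h + BSIG1(e) + CH(e, f, g) + kw[i];
        uint32_t t2 = BSIG0(a) + MAJ(a, b, c);
        h = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    s[0] = a; s[1] = b; s[2] = c; s[3] = d;
    s[4] = e; s[5] = f; s[6] = g; s[7] = h;
}

/* One round without the shuffle: the new "e" lands in d, the new "a" in h.
   The caller rotates the argument names instead of moving data. */
#define ROUND(a, b, c, d, e, f, g, h, x) do {              \
        uint32_t t1 = (h) + BSIG1(e) + CH(e, f, g) + (x);  \
        uint32_t t2 = BSIG0(a) + MAJ(a, b, c);             \
        (d) += t1;                                         \
        (h) = t1 + t2;                                     \
    } while (0)

/* Unrolled by 8: after eight rounds the names are back in place,
   so the body can loop 8 times instead of being copied 64 times. */
void rounds_unrolled8(uint32_t s[8], const uint32_t kw[64])
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
    uint32_t e = s[4], f = s[5], g = s[6], h = s[7];
    for (int i = 0; i < 64; i += 8) {
        ROUND(a, b, c, d, e, f, g, h, kw[i + 0]);
        ROUND(h, a, b, c, d, e, f, g, kw[i + 1]);
        ROUND(g, h, a, b, c, d, e, f, kw[i + 2]);
        ROUND(f, g, h, a, b, c, d, e, kw[i + 3]);
        ROUND(e, f, g, h, a, b, c, d, kw[i + 4]);
        ROUND(d, e, f, g, h, a, b, c, kw[i + 5]);
        ROUND(c, d, e, f, g, h, a, b, kw[i + 6]);
        ROUND(b, c, d, e, f, g, h, a, kw[i + 7]);
    }
    s[0] = a; s[1] = b; s[2] = c; s[3] = d;
    s[4] = e; s[5] = f; s[6] = g; s[7] = h;
}

/* Returns 1 if both forms agree on arbitrary (non-SHA) test input. */
int unroll_matches_rolled(void)
{
    uint32_t kw[64], s1[8], s2[8];
    uint32_t x = 0x12345678u;
    for (int i = 0; i < 64; i++) { x = x * 1664525u + 1013904223u; kw[i] = x; }
    for (int i = 0; i < 8; i++) { s1[i] = s2[i] = 0x9e3779b9u * (uint32_t)(i + 1); }
    rounds_rolled(s1, kw);
    rounds_unrolled8(s2, kw);
    return memcmp(s1, s2, sizeof s1) == 0;
}
```

Unrolling by 8 already removes the variable shuffle, because the names rotate instead of the data. Going to 64 would additionally remove the loop counter and allow per-round constants to be inlined, at roughly eight times the code size of the loop body.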
Hi Scott,
The entire routine weighs approximately 8 KB (75% from the unrolled
inner loop).
What do you mean by "once you finish the SHA-256"? When the OS switches
the context to a different process? I thought the cache was flushed
anyway on a context switch...
P.S. The Athlon, unlike the P4, has large L1 caches:
64 KB L1 I$
64 KB L1 D$
256 KB L2 I+D$ (512 KB for Barton)
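As a rough check on the figures quoted above, the following C sketch works out what share of the 64 KB L1 instruction cache an 8 KB routine would occupy. The helper name `cache_share_percent` is hypothetical, and the sizes are the approximate ones from this post (8 KB total, 75% of it in the unrolled loop). It only compares sizes and says nothing about conflict misses or about what else competes for the cache.

```c
#include <stdio.h>

/* Percentage of a cache that a code region of the given size would fill. */
double cache_share_percent(unsigned code_bytes, unsigned cache_bytes)
{
    return 100.0 * (double)code_bytes / (double)cache_bytes;
}

/* Prints the shares for the sizes quoted in the post (assumed exact here). */
void print_footprint(void)
{
    const unsigned routine = 8u * 1024u;       /* whole routine, about 8 KB  */
    const unsigned loop    = routine / 4u * 3u; /* 75% of it: unrolled loop  */
    const unsigned l1i     = 64u * 1024u;      /* Athlon L1 I$               */

    printf("routine: %.3f%% of L1 I$\n", cache_share_percent(routine, l1i));
    printf("loop:    %.3f%% of L1 I$\n", cache_share_percent(loop, l1i));
}
```

On these numbers the whole routine takes one eighth of the L1 I$, with the unrolled loop accounting for a little under a tenth.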
Yes, it'll fit, but it'll push most everything else out. That
means that once you finish the SHA-256, you'll start generating
lots of cache misses :-(