On 04/01/12 18:59, Stephen Sprunk wrote:
On 26-Dec-11 14:36, jacob navia wrote:
On 26/12/11 21:24, Stephen Sprunk wrote:
On 25-Dec-11 14:35, jacob navia wrote:
Assembly will always do better than standard C in the above specific
code. You can then use ADDPD and add 2 doubles at the same time
(under specific loop conditions).
... unless you have a vectorizing compiler, which will do the exact same
thing for you without the need for inline assembly.
Sure. That was a simple example, and no, gcc doesn't do it, at least
in the version I have.
Like many GCC sub-projects, the work isn't complete, and probably isn't
on by default:
http://gcc.gnu.org/projects/tree-ssa/vectorization.html
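For concreteness, the kind of loop being argued about might look like the sketch below. The function name and signature are hypothetical, not taken from the code discussed earlier in the thread. The `restrict` qualifiers tell the compiler the arrays do not overlap, which is usually what an auto-vectorizer needs before it can turn the loop body into packed adds such as ADDPD. With GCC, the vectorizer is enabled by `-ftree-vectorize` (implied by `-O3`); whether a given version actually vectorizes this loop is something to check in the generated assembly.

```c
#include <stddef.h>

/* Hypothetical example: element-wise sum of two arrays of doubles.
 * Written as a plain scalar loop; a vectorizing compiler may process
 * two (SSE2) or four (AVX) doubles per iteration on its own. */
void add_arrays(double *restrict dst, const double *restrict a,
                const double *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```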
Note that those instructions have been available for at least 10 YEARS
and still mainstream compilers like MSVC or GCC do not perform any of
the magic you invoke...
Sure, maybe ONE DAY IN THE FAR FUTURE they will do it, but until then I
am using those instructions in my programs now...
GCC isn't the only compiler out there, either; I've heard ICC is very
good at this, for instance.
Yes, you need ALL THE RESOURCES of Intel Corp to build such a compiler,
no wonder nobody has done that besides Intel.
Also, when AVX support comes out, that vectorizing compiler will start
using AVX instructions while your assembly is stuck using SSE.
Sure, you can recompile your program but I can't change a few loop
conditions since I am "stuck" with my old program, poor me.
AVX will be different instructions, different registers, etc.
What?
Look Stephen, you are surely knowledgeable in C but in assembly...
Take for instance ANDNPS (AND NOT Packed Single precision)
The corresponding AVX instruction is...
VANDNPS, with the same syntax, same semantics. You just
add a "V" before the instruction mnemonic.
All you have to do is use the ymm registers with 256 bits
instead of the xmm registers with 128 bits. Your loop
counters must be adjusted (using only half as many loops)
and that was it, maybe an hour of work for a big program.
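As a rough sketch of that kind of port, here is the same packed add written once for xmm and once for ymm. It uses C intrinsics instead of hand-written assembly, so it is not the code under discussion; the helper name `add_packed` and the `LANES` macro are made up for illustration, and `n` is assumed to be a multiple of `LANES`. The only differences between the two paths are the register width and the loop step.

```c
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>
#define LANES 4   /* 256-bit ymm register holds 4 doubles */
#elif defined(__SSE2__)
#include <emmintrin.h>
#define LANES 2   /* 128-bit xmm register holds 2 doubles */
#else
#define LANES 1   /* scalar fallback for non-x86 targets */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i].
 * Assumption: n is a multiple of LANES (no tail handling here). */
void add_packed(double *dst, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i += LANES) {
#if defined(__AVX__)
        __m256d x = _mm256_loadu_pd(a + i);
        __m256d y = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(dst + i, _mm256_add_pd(x, y)); /* VADDPD on ymm */
#elif defined(__SSE2__)
        __m128d x = _mm_loadu_pd(a + i);
        __m128d y = _mm_loadu_pd(b + i);
        _mm_storeu_pd(dst + i, _mm_add_pd(x, y));       /* ADDPD on xmm */
#else
        dst[i] = a[i] + b[i];
#endif
    }
}
```

Built with `gcc -mavx` the loop takes the ymm path and runs half as many iterations as the SSE2 path for the same `n`.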
It won't be a complete rewrite from scratch, but it'll be a lot of work, whereas
someone with a vectorizing compiler can just change one option and
recompile.
Yes, but it will be at least 10 years before gcc or MSVC offer that.
gcc doesn't even use SSE2 or SSE3 automatically TODAY, 10 years
after they were announced by Intel...
Furthermore, there is no guaranteed win for using SSE; early SSE chips
simply decoded that into two FADDs, so there was no performance benefit
to all that work you invested in writing SSE assembly.
That was in 2000, when it first appeared. Pleeeeeze, we are in 2012 now,
that is no longer relevant at all.
But let's forget about that. Look at this.
Problem:
You have a vector of 8 long long integers and you want to shift that
vector left by a given amount between 1 and 63 bits.
Interface:
void ShiftVector(long long *vector, int AmountToShift);
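For comparison, a plain C version of that interface might look like the sketch below. It rests on assumptions the problem statement does not spell out: that the 8 elements form one 512-bit value, so bits shifted out of one element carry into the next, and that vector[0] is the least significant word. If each element were meant to be shifted independently, the carry term would simply be dropped. It also assumes a 64-bit long long; `VECTOR_LEN` is a name introduced here.

```c
#include <assert.h>

#define VECTOR_LEN 8   /* hypothetical name for the fixed length of 8 */

/* Sketch only. Assumptions: vector[0] is the least significant word,
 * bits carry from vector[i-1] into vector[i], long long is 64 bits. */
void ShiftVector(long long *vector, int AmountToShift)
{
    /* Work on unsigned words so the shifts are well defined. */
    unsigned long long *v = (unsigned long long *)vector;

    assert(AmountToShift >= 1 && AmountToShift <= 63);

    /* Go from the most significant word down, so each word still sees
     * the unshifted value of the word below it. */
    for (int i = VECTOR_LEN - 1; i > 0; i--)
        v[i] = (v[i] << AmountToShift) | (v[i - 1] >> (64 - AmountToShift));
    v[0] <<= AmountToShift;
}
```

Restricting the amount to 1..63 keeps `64 - AmountToShift` in the range 1..63 as well, so neither shift count ever reaches the word width.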
; Pointer to vector in %rdi
; Amount to shift in %rsi
; gcc calling conventions for x86-64
shiftvector:
movq %rsi,%rcx