How to inline assembly in a C program?

  • Thread starter swept.along.by.events

swept.along.by.events

Hi everyone,
I've been reading about this for a few days but haven't found anything relevant or clear enough.

I'm trying to learn how to write inline x86 assembly for gcc on Linux. My problem is not writing the assembly, but making it work inside C. I'm starting with this tiny function that multiplies two 64-bit integers, putting the high 64 bits in *rh and the low 64 bits in *rl:

#include <stdint.h>

void Mul64c( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
    __uint128_t r = (__uint128_t)a * (__uint128_t)b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)(r);
}

After reading various manuals, I wrote this:

void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
    __asm__( "mov %2, %%rax;"
             "mul %3;"
             "mov %%rdx,(%0);"
             "mov %%rax,(%1);"
             : "=D" (rh),
               "=S" (rl)
             : "d" (a),
               "c" (b)
             : "%rax"
    );
}

From what I read, integer and pointer arguments are passed in registers %rdi, %rsi, %rdx, %rcx, so I put "=D", "=S", "d", "c" in the output/input constraints. But when I build the file with

gcc -O2 -c mul64asm.c

and analyze the result with objdump, I see this:

0000000000000000 <Mul64asm>:
0: f3 c3 repz retq

So basically it's thinking that my code is a NOP? Why is that?

Thanks.

Johannes Bauer

You've passed two pointers to the assembly part, but didn't tell the
compiler that you actually dereference them (as far as gcc knows, the
asm only computes new values for rh and rl, which are never used
afterwards), so your code is optimized out. You may want to clobber
memory (you only clobber rax at the moment) or use __asm__ __volatile__.

Best regards,
Johannes
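A minimal sketch of Johannes's suggestion, assuming gcc's extended asm on x86-64. Two details differ from the question's code: the pointers become inputs (the asm reads the pointer values and writes through them, it never changes them), and rdx is clobbered too, because mul writes both rax and rdx. The name Mul64_volatile is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical variant of Mul64asm following the "volatile + memory
   clobber" advice. rh and rl are inputs; the stores through them are
   covered by the "memory" clobber. */
void Mul64_volatile(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
    __asm__ __volatile__(
        "movq %2, %%rax\n\t"     /* rax = a */
        "mulq %3\n\t"            /* rdx:rax = rax * b */
        "movq %%rdx, (%0)\n\t"   /* *rh = high 64 bits */
        "movq %%rax, (%1)"       /* *rl = low 64 bits */
        : /* no register outputs */
        : "r"(rh), "r"(rl), "r"(a), "r"(b)
        : "rax", "rdx", "memory");  /* inputs can't be placed in clobbered regs */
}
```

The "memory" clobber is a blunt tool: it tells gcc that any memory may have changed, which limits how much it can optimise around the statement.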


Philip Lantz

Johannes said:
You've passed two pointers to the assembly part, but didn't tell the
compiler that you actually dereference them (as far as gcc knows, the
asm only computes new values for rh and rl, which are never used
afterwards), so your code is optimized out. You may want to clobber
memory (you only clobber rax at the moment) or use __asm__ __volatile__.

I recommend letting gcc know that you are using a memory operand:

void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
    __asm__( "mov %2, %%rax;"
             "mul %3;"
             "mov %%rdx,%0;"
             "mov %%rax,%1;"
             : "=m" (*rh),
               "=m" (*rl)
             : "d" (a),
               "c" (b)
             : "%rax"
    );
}

It's also preferable to let the compiler choose the operand locations
rather than specifying them (except where a specific register is
required), and to let gcc generate the loads and stores.

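void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
    /* mulq rather than mul: if gcc puts b in memory, the assembler
       needs the explicit 64-bit operand size */
    __asm__("mulq %3"
            : "=d" (*rh),
              "=a" (*rl)
            : "a" (a),
              "rm" (b)
    );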
}
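One way to check the constraint-only version is to compare it against the __uint128_t version from the question. A sketch assuming gcc on x86-64; mul64_agree is a hypothetical helper, and it uses mulq so the operand size stays explicit even when b lands in memory.

```c
#include <stdint.h>

/* The question's portable version. */
static void Mul64c(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
    __uint128_t r = (__uint128_t)a * b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)r;
}

/* Philip's second version: gcc places a in rax, reads results from rdx:rax. */
static void Mul64asm(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
    __asm__("mulq %3" : "=d"(*rh), "=a"(*rl) : "a"(a), "rm"(b));
}

/* Hypothetical helper: returns 1 if both versions agree for (a, b). */
int mul64_agree(uint64_t a, uint64_t b)
{
    uint64_t h1, l1, h2, l2;
    Mul64c(&h1, &l1, a, b);
    Mul64asm(&h2, &l2, a, b);
    return h1 == h2 && l1 == l2;
}
```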

Your original code (and also my first rewrite above) neglects to tell
the compiler that it clobbers rdx. The second version above fixes that.
The compiler assumes that the value it put in rdx (the parameter a) will
still be there. Since a isn't used again, it seems like it wouldn't
matter, but if this function is inlined, the compiler will know what is
in that register and may use it again. I just found a bug a couple days
ago with that exact problem. (Note, you can't just add rdx to the
clobber list in your version, since you specify it as an input operand.)
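Since rdx is an input in that version, the usual way to tell gcc it gets overwritten is to make the operand read-write ("+d") instead of listing it as a clobber. A sketch under that assumption (gcc, x86-64), keeping the explicit mov but returning the results in registers; Mul64_inout and the local lo are illustrative names, not from the thread.

```c
#include <stdint.h>

/* Hypothetical fix for the missing rdx clobber: a enters in rdx and,
   because the operand is "+d", gcc knows rdx comes back changed (it
   holds the high half after mulq). */
void Mul64_inout(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
    uint64_t lo;
    __asm__("movq %%rdx, %%rax\n\t"   /* rax = a */
            "mulq %2"                 /* rdx:rax = rax * b */
            : "+d"(a), "=&a"(lo)      /* %0 = a (in/out), %1 = lo */
            : "c"(b));                /* %2 = b in rcx */
    *rh = a;   /* a now holds the high 64 bits */
    *rl = lo;
}
```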
 

swept.along.by.events

Another poster wrote:

I too recommend this sort of style. (I am not very familiar with inline
assembly on x86, but have used it with other targets.) Let gcc handle
the moves; that lets it optimise the code better. This is particularly
important if "Mul64asm" is made "static inline" so that it is mixed in
directly with other code. gcc will then be able to take advantage of
things like having "a" or "b" already in a register, or using the
results "*rl" or "*rh" without actually storing them out to memory. It
will also be able to overlap the "mov" instructions for one Mul64asm
with other code (assuming your CPU has enough registers) for better
pipelining, and it will mix and match the choice of registers used
(again, if your CPU has that choice). And of course, avoiding general
memory clobbers and "asm volatile" is a big help to optimisation.

Generally speaking, you let gcc do as much as possible, and keep the
assembly code to a minimum. It's not as important in a register-poor,
non-orthogonal architecture like the x86 where so much of the work goes
through the bottleneck of a single "rax" register, but it can make a
very big difference on more modern processor architectures with large
numbers of general-purpose registers, or half-way architectures like
x86-64 with its 16 registers.
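To illustrate the "static inline" point, here is a sketch assuming gcc on x86-64: a hypothetical caller, sum_products, that accumulates a 128-bit sum of products. Because Mul64asm is inlined and hi/lo are locals, gcc is free to keep the results in registers instead of storing them to memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Philip's constraint-only version, made static inline. */
static inline void Mul64asm(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
    __asm__("mulq %3"
            : "=d"(*rh), "=a"(*rl)
            : "a"(a), "rm"(b));
}

/* Hypothetical caller: 128-bit sum of a[i] * b[i] (wraps modulo 2^128). */
void sum_products(const uint64_t *a, const uint64_t *b, size_t n,
                  uint64_t *sum_hi, uint64_t *sum_lo)
{
    uint64_t acc_hi = 0, acc_lo = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t hi, lo;
        Mul64asm(&hi, &lo, a[i], b[i]);
        acc_lo += lo;
        acc_hi += hi + (acc_lo < lo);   /* add carry out of the low word */
    }
    *sum_hi = acc_hi;
    *sum_lo = acc_lo;
}
```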


Thanks a lot to you both! Philip's second version works like a charm, both as a separate function and inlined. Could you tell me if I'm reading it correctly?

: "=d" (*rh), // it's saying that *rh comes from the %rdx register
"=a" (*rl) // same, must take *rl from %rax
: "a" (a), // I want parameter 'a' in %rax before the mul
"rm" (b) // can store 'b' anywhere you want (register or memory), and wherever it is, that's what you multiply %rax with

Thanks!

Philip Lantz

swept.along.by.events said:
Thanks a lot to both, Philip's second version works like a charm both
as a separate function and inlined. Could you tell me if I'm reading
it correctly?

: "=d" (*rh), // it's saying that *rh comes from the %rdx register
"=a" (*rl) // same, must take *rl from %rax
: "a" (a), // I want parameter 'a' in %rax before the mul
"rm" (b) // can store 'b' anywhere you want (register or memory), and wherever it is, that's what you multiply %rax with

Yes, I think you are understanding it correctly.

Another way of saying it: "=d" (*rh) means that the assembly code
generates a result in rdx, which should be stored into *rh; "rm" (b)
means that the assembly code uses b as an operand, and the operand can
be in either register or memory.
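gcc also accepts symbolic operand names ([name] before a constraint, %[name] in the template), which makes the mapping described above explicit. A sketch of the second version written that way, assuming gcc on x86-64; the names hi, lo, x and y are arbitrary, and operands that the template never mentions are allowed.

```c
#include <stdint.h>

void Mul64asm_named(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
    __asm__("mulq %[y]"              /* rdx:rax = rax * y */
            : [hi] "=d"(*rh),        /* high half produced in rdx, stored to *rh */
              [lo] "=a"(*rl)         /* low half produced in rax, stored to *rl */
            : [x] "a"(a),            /* a must be in rax before mulq */
              [y] "rm"(b));          /* b may be in any register or in memory */
}
```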
 
