Velocity Reviews - Computer Hardware Reviews

How to inline assembly in a C program?

 
 
swept.along.by.events@gmail.com
      03-03-2013
Hi everyone,
I've been reading about this for a few days but haven't found anything relevant or clear enough.

I'm trying to learn how to write inline x86 assembly for gcc on Linux. My problem is not writing the assembly itself, but making it work inside C. I'm starting with this tiny function that multiplies two 64-bit integers, putting the high 64 bits of the product in *rh and the low 64 bits in *rl:

#include <stdint.h>

void Mul64c( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
    __uint128_t r = (__uint128_t)a * (__uint128_t)b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)(r);
}

After reading various manuals, I wrote this:

void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
    __asm__( "mov %2, %%rax;"
             "mul %3;"
             "mov %%rdx,(%0);"
             "mov %%rax,(%1);"
             : "=D" (rh),
               "=S" (rl)
             : "d" (a),
               "c" (b)
             : "%rax"
           );
}

From what I read, integer and pointer arguments are passed in registers %rdi, %rsi, %rdx, %rcx, so I put "=D", "=S", "d", "c" in the output/input constraints. But when I build the file with

gcc -O2 -c mul64asm.c

and analyze the result with objdump, I see this:

0000000000000000 <Mul64asm>:
0: f3 c3 repz retq

So basically it's thinking that my code is a NOP? Why is that?

Thanks.

 
Johannes Bauer
      03-03-2013
On 03.03.2013 21:45, swept.along.by.events wrote:

> void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
> {
> __asm__( "mov %2, %%rax;"
> "mul %3;"
> "mov %%rdx,(%0);"
> "mov %%rax,(%1);"
> : "=D" (rh),
> "=S" (rl)
> : "d" (a),
> "c" (b)
> : "%rax"
> );
> }
>
> and analyze the result with objdump, I see this:
>
> 0000000000000000 <Mul64asm>:
> 0: f3 c3 repz retq
>
> So basically it's thinking that my code is a NOP? Why is that?


You've passed two pointers to the assembly part, but didn't tell the
compiler that you actually dereference them, so your code is
optimized out. You may want to clobber memory (you only clobber rax at
the moment) or use __asm__ __volatile__.

Best regards,
Johannes
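
A minimal sketch of the "clobber memory" route suggested above, not a definitive version. It assumes GCC or Clang on x86-64 (other targets fall back to the compiler's 128-bit multiply). The name mul64_clobber and the helper check_clobber are hypothetical. Unlike the original, the pointers are declared as inputs, and rdx is listed as clobbered because the multiply overwrites it.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: the pointers are inputs, the asm stores through them,
   and the "memory" clobber tells the compiler that memory changes. */
static void mul64_clobber(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
#if defined(__x86_64__)
    __asm__ __volatile__(
        "mov %2, %%rax\n\t"     /* rax = a */
        "mul %3\n\t"            /* rdx:rax = rax * b */
        "mov %%rdx, (%0)\n\t"   /* *rh = high half */
        "mov %%rax, (%1)"       /* *rl = low half */
        :                       /* no outputs: the pointers are only read */
        : "r"(rh), "r"(rl), "r"(a), "r"(b)
        : "rax", "rdx", "memory", "cc");
#else
    /* Assumption: non-x86-64 targets just use the 128-bit type. */
    __uint128_t r = (__uint128_t)a * b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)r;
#endif
}

/* Hypothetical self-check against the compiler's own multiply. */
static void check_clobber(uint64_t a, uint64_t b)
{
    uint64_t hi = 0, lo = 0;
    __uint128_t ref = (__uint128_t)a * b;

    mul64_clobber(&hi, &lo, a, b);
    assert(hi == (uint64_t)(ref >> 64));
    assert(lo == (uint64_t)ref);
}
```

Because rax and rdx are in the clobber list, the compiler will not place any of the "r" operands in them. The price is that the "memory" clobber makes the compiler assume any memory may have changed.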

--
>> Where EXACTLY did you predict the quake again?

> At least not publicly!

Ah, the latest and to this day most ingenious prank of our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$(E-Mail Removed)>
 
Philip Lantz
      03-05-2013
Johannes Bauer wrote:
> swept.along.by.events wrote:
>
> > void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
> > {
> > __asm__( "mov %2, %%rax;"
> > "mul %3;"
> > "mov %%rdx,(%0);"
> > "mov %%rax,(%1);"
> > : "=D" (rh),
> > "=S" (rl)
> > : "d" (a),
> > "c" (b)
> > : "%rax"
> > );
> > }
> >
> > and analyze the result with objdump, I see this:
> >
> > 0000000000000000 <Mul64asm>:
> > 0: f3 c3 repz retq
> >
> > So basically it's thinking that my code is a NOP? Why is that?

>
> You've passed two pointers to the assembly part, but didn't tell the
> assembler that you've actually dereferenced them, so your code is
> optimized out. You may want to clobber memory (you only clobber rax at
> the moment) or use __asm__ __volatile__.


I recommend letting gcc know that you are using a memory operand:

void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
    __asm__( "mov %2, %%rax;"
             "mul %3;"
             "mov %%rdx,%0;"
             "mov %%rax,%1;"
             : "=m" (*rh),
               "=m" (*rl)
             : "d" (a),
               "c" (b)
             : "%rax"
           );
}

It's also preferable to let the compiler choose the operand locations,
instead of specifying them, except where a specific register is
required, and let gcc generate the loads and stores.

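void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
{
    /* "mulq" rather than "mul": the explicit size suffix is needed if
       the compiler picks a memory location for operand 3 */
    __asm__("mulq %3"
            : "=d" (*rh),
              "=a" (*rl)
            : "a" (a),
              "rm" (b)
           );
}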

Your original code (and also my first rewrite above) neglects to tell
the compiler that it clobbers rdx. The second version above fixes that.
The compiler assumes that the value it put in rdx (the parameter a) will
still be there. Since a isn't used again, it seems like it wouldn't
matter, but if this function is inlined, the compiler will know what is
in that register and may use it again. I just found a bug a couple days
ago with that exact problem. (Note, you can't just add rdx to the
clobber list in your version, since you specify it as an input operand.)
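
One way to express "this register is an input and also gets overwritten" is a read-write operand, written with "+" instead of "=". The following is a sketch under assumptions, not the only way to do it. It assumes GCC or Clang on x86-64 (other targets fall back to the 128-bit type). The names mul64_rw and check_rw are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: rax is declared read-write ("+a"), so the compiler knows
   the multiplicand it loaded there is gone after the instruction. */
static inline void mul64_rw(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
#if defined(__x86_64__)
    uint64_t lo = a;  /* enters in rax as the multiplicand, leaves as the low half */
    uint64_t hi;      /* written by the instruction in rdx */

    __asm__("mulq %2"            /* rdx:rax = rax * operand 2 */
            : "+a"(lo), "=d"(hi)
            : "rm"(b)
            : "cc");
    *rh = hi;
    *rl = lo;
#else
    /* Assumption: non-x86-64 targets just use the 128-bit type. */
    __uint128_t r = (__uint128_t)a * b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)r;
#endif
}

/* Hypothetical self-check against the compiler's own multiply. */
static void check_rw(uint64_t a, uint64_t b)
{
    uint64_t hi = 0, lo = 0;
    __uint128_t ref = (__uint128_t)a * b;

    mul64_rw(&hi, &lo, a, b);
    assert(hi == (uint64_t)(ref >> 64));
    assert(lo == (uint64_t)ref);
}
```

The stores to *rh and *rl are ordinary C here, so the compiler is free to drop or reorder them when the function is inlined.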
 
 
swept.along.by.events@gmail.com
      03-05-2013
On Tuesday, March 5, 2013 9:43:30 AM UTC+1, David Brown wrote:
> I too recommend this sort of style. (I am not very familiar with inline
> assembly on x86, but have used it with other targets.) Let gcc handle
> the moves - that lets it optimise the code better. This is particularly
> important if "Mul64asm" is made "static inline" so that it is mixed in
> directly with other code. gcc will then be able to take advantage of
> things like having "a" or "b" already in a register, or using the
> results "*rl" or "*rh" without actually storing them out to memory. It
> will also be able to overlap the "mov" instructions for one Mul64asm
> with other code (assuming your cpu has enough registers) for better
> pipelining, and it will mix and match the choice of registers used
> (again, if your cpu has that choice). And of course, avoiding general
> memory clobbers and "asm volatile" is a big help to optimisation.
>
> Generally speaking, you let gcc do as much as possible, and keep the
> assembly code to a minimum. It's not as important in a register-poor,
> non-orthogonal architecture like the x86 where so much of the work goes
> through the bottleneck of a single "rax" register, but it can make a
> very big difference on more modern processor architectures with large
> numbers of general-purpose registers, or half-way architectures like
> x86-64 with its 16 registers.
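
The "static inline" point quoted above can be sketched as follows, assuming GCC or Clang on x86-64 (other targets fall back to the 128-bit type). The names mul64 and mulhi64 are hypothetical. The caller wants only the high half, so after inlining the compiler has no reason to write the low half to memory at all.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: the single-instruction form, made static inline. */
static inline void mul64(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
#if defined(__x86_64__)
    __asm__("mulq %3"            /* rdx:rax = rax * operand 3 */
            : "=d"(*rh), "=a"(*rl)
            : "a"(a), "rm"(b)
            : "cc");
#else
    /* Assumption: non-x86-64 targets just use the 128-bit type. */
    __uint128_t r = (__uint128_t)a * b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)r;
#endif
}

/* Hypothetical caller that keeps only the high 64 bits of the product. */
static inline uint64_t mulhi64(uint64_t a, uint64_t b)
{
    uint64_t hi, lo;

    mul64(&hi, &lo, a, b);
    (void)lo;                    /* low half deliberately unused */
    return hi;
}

/* Hypothetical self-check against the compiler's own multiply. */
static void check_mulhi(uint64_t a, uint64_t b)
{
    __uint128_t ref = (__uint128_t)a * b;

    assert(mulhi64(a, b) == (uint64_t)(ref >> 64));
}
```

Since the asm is not volatile and has no "memory" clobber, the compiler may also delete the whole multiply if neither result is used.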



Thanks a lot to both of you. Philip's second version works like a charm, both as a separate function and inlined. Could you tell me if I'm reading it correctly?

: "=d" (*rh), // it's saying that *rh comes from the %rdx register
"=a" (*rl) // same, must take *rl from %rax
: "a" (a), // I want parameter 'a' in %rax before the mul
"rm" (b) // can store 'b' anywhere you want (register or memory), and wherever it is, that's what you multiply %rax with

Thanks!

 
 
Philip Lantz
      03-06-2013
swept.along.by.events wrote:
>> Philip Lantz wrote:
>>> void Mul64asm( uint64_t* rh, uint64_t* rl, uint64_t a, uint64_t b )
>>> {
>>> __asm__("mul %3"
>>> : "=d" (*rh),
>>> "=a" (*rl)
>>> : "a" (a),
>>> "rm" (b)
>>> );
>>> }

>
> Thanks a lot to both, Philip's second version works like a charm both
> as a separate function and inlined. Could you tell me if I'm reading
> it correctly?
>
> : "=d" (*rh), // it's saying that *rh comes from the %rdx register
> "=a" (*rl) // same, must take *rl from %rax
> : "a" (a), // I want parameter 'a' in %rax before the mul
> "rm" (b) // can store 'b' anywhere you want (register or memory), and wherever it is, that's what you multiply %rax with


Yes, I think you are understanding it correctly.

Another way of saying it: "=d" (*rh) means that the assembly code
generates a result in rdx, which should be stored into *rh; "rm" (b)
means that the assembly code uses b as an operand, and the operand can
be in either register or memory.
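
The same reading can be written into the code itself with symbolic operand names, which GCC accepts in place of %0, %1 and so on. This is a sketch assuming GCC or Clang on x86-64 (other targets fall back to the 128-bit type). The names mul64_named, multiplier and check_named are hypothetical. The explicit "q" suffix is an addition here so the instruction still assembles if the compiler picks memory for b.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: each constraint carries a comment saying what it asks for. */
static inline void mul64_named(uint64_t *rh, uint64_t *rl, uint64_t a, uint64_t b)
{
#if defined(__x86_64__)
    __asm__("mulq %[multiplier]"
            : "=d"(*rh)               /* asm leaves a result in rdx; store it to *rh */
            , "=a"(*rl)               /* asm leaves a result in rax; store it to *rl */
            : "a"(a)                  /* a must be in rax before the instruction */
            , [multiplier] "rm"(b)    /* b may sit in any register or in memory */
            : "cc");                  /* the flags are modified */
#else
    /* Assumption: non-x86-64 targets just use the 128-bit type. */
    __uint128_t r = (__uint128_t)a * b;
    *rh = (uint64_t)(r >> 64);
    *rl = (uint64_t)r;
#endif
}

/* Hypothetical self-check against the compiler's own multiply. */
static void check_named(uint64_t a, uint64_t b)
{
    uint64_t hi = 0, lo = 0;
    __uint128_t ref = (__uint128_t)a * b;

    mul64_named(&hi, &lo, a, b);
    assert(hi == (uint64_t)(ref >> 64));
    assert(lo == (uint64_t)ref);
}
```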


 