How do I use compiler intrinsic __fmul_?

Posted by Eric Thoma on Stack Overflow
Published on 2012-06-16T21:11:32Z Indexed on 2012/06/16 21:16 UTC

I am writing a massively parallel GPU application and have been optimizing it by hand. I saw a 20% performance increase with __fdividef(x, y), and according to the CUDA C Programming Guide (section C.2.1), using similar functions for multiplication and addition is also beneficial.

The function is listed like this: "__fmul_[rn,rz,ru,rd](x,y)".

__fdividef(x,y), by contrast, is not listed with anything in square brackets. What do those brackets mean?

If I run the simple code:

int t = __fmul_(5,4);

I get a compiler error saying that __fmul_ is undefined. I have the CUDA runtime included, so I don't think it is a setup problem; rather, it has something to do with those square brackets. How do I correctly use this function? Thank you.
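For reference, the bracketed list in the guide's notation is shorthand for alternative rounding-mode suffixes, so the callable names are __fmul_rn, __fmul_rz, __fmul_ru and __fmul_rd; there is no function named __fmul_ on its own. They take and return float, and they are device intrinsics, so they have to be called from device code. Below is a minimal sketch of one way to call the round-to-nearest variant. The kernel name multiply_rn and the single-element launch are illustrative assumptions, not part of the original question, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (name chosen for illustration): each thread
// multiplies one pair of floats using the round-to-nearest intrinsic.
__global__ void multiply_rn(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // __fmul_rn: single-precision multiply, round to nearest even.
        // Siblings: __fmul_rz (toward zero), __fmul_ru (toward +inf),
        // __fmul_rd (toward -inf).
        out[i] = __fmul_rn(a[i], b[i]);
    }
}

int main()
{
    const int n = 1;
    float ha = 5.0f, hb = 4.0f, hout = 0.0f;

    // Allocate device buffers and copy the two operands over.
    float *da = nullptr, *db = nullptr, *dout = nullptr;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dout, n * sizeof(float));
    cudaMemcpy(da, &ha, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, &hb, n * sizeof(float), cudaMemcpyHostToDevice);

    // One block of one thread is enough for a single multiplication.
    multiply_rn<<<1, 1>>>(da, db, dout, n);

    // Copy the product back (this also waits for the kernel to finish).
    cudaMemcpy(&hout, dout, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", hout);

    cudaFree(da);
    cudaFree(db);
    cudaFree(dout);
    return 0;
}
```

Note that the result is a float, so assigning it to an int as in the snippet above would truncate it. Whether these intrinsics actually speed up a given kernel is worth measuring: according to the programming guide, __fmul_rn and __fadd_rn are never merged by the compiler into a fused multiply-add, which can change both timing and rounding behavior compared with a plain `a * b`.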

