Help with Assembly/SSE Multiplication

Posted by Brett on Stack Overflow See other posts from Stack Overflow or by Brett
Published on 2010-06-02T20:52:38Z Indexed on 2010/06/02 20:54 UTC
Read the original article Hit count: 299

Filed under:
|
|

I've been trying to figure out how to gain some improvement in my code at a very crucial couple lines:

float x = a*b;
float y = c*d;
float z = e*f;
float w = g*h;

all a, b, c... are floats.

I decided to look into using SSE, but can't seem to find any improvement, in fact it turns out to be twice as slow. My SSE code is:

Vector4 abcd, efgh, result;
abcd = [float a, float b, float c, float d];
efgh = [float e, float f, float g, float h];
_asm {
movups xmm1, abcd
movups xmm2, efgh
mulps xmm1, xmm2
movups result, xmm1
}

I also attempted using standard inline assembly, but it doesn't appear that I can pack the register with the four floating points like I can with SSE.

Any comments, or help would be greatly appreciated, I mainly need to understand why my calculations using SSE are slower than the serial C++ code?

I'm compiling in Visual Studio 2005, on a Windows XP, using a Pentium 4 with HT if that provides any additional information to assit.

Thanks in advance!

© Stack Overflow or respective owner

Related posts about c++

Related posts about inline-assembly