VFP Unit Matrix Multiply problem on the iPhone

Posted by Ian Copland on Stack Overflow See other posts from Stack Overflow or by Ian Copland
Published on 2010-04-26T19:07:56Z Indexed on 2010/04/26 19:13 UTC
Read the original article Hit count: 621

Filed under:

Hi. I'm trying to write a Matrix3x3 multiply using the Vector Floating Point on the iPhone, however i'm encountering some problems. This is my first attempt at writing any ARM assembly, so it could be a faily simple solution that i'm not seeing.

I've currently got a small application running using a maths library that i've written. I'm investigating into the benifits using the Vector Floating Point Unit would provide so i've taken my matrix multiply and converted it to asm. Previously the application would run without a problem, however now my objects will all randomly disappear. This seems to be caused by the results from my matrix multiply becoming NAN at some point.

Heres the code

IMatrix3x3 operator*(IMatrix3x3 & _A, IMatrix3x3 & _B)
{
    IMatrix3x3 C;

    //C++ code for the simulator
#if TARGET_IPHONE_SIMULATOR == true
    C.A0 = _A.A0 * _B.A0 + _A.A1 * _B.B0 + _A.A2 * _B.C0;
    C.A1 = _A.A0 * _B.A1 + _A.A1 * _B.B1 + _A.A2 * _B.C1;
    C.A2 = _A.A0 * _B.A2 + _A.A1 * _B.B2 + _A.A2 * _B.C2;

    C.B0 = _A.B0 * _B.A0 + _A.B1 * _B.B0 + _A.B2 * _B.C0;
    C.B1 = _A.B0 * _B.A1 + _A.B1 * _B.B1 + _A.B2 * _B.C1;
    C.B2 = _A.B0 * _B.A2 + _A.B1 * _B.B2 + _A.B2 * _B.C2;

    C.C0 = _A.C0 * _B.A0 + _A.C1 * _B.B0 + _A.C2 * _B.C0;
    C.C1 = _A.C0 * _B.A1 + _A.C1 * _B.B1 + _A.C2 * _B.C1;
    C.C2 = _A.C0 * _B.A2 + _A.C1 * _B.B2 + _A.C2 * _B.C2;

//VPU ARM asm for the device
#else   
    //create a pointer to the Matrices
    IMatrix3x3 * pA = &_A;
    IMatrix3x3 * pB = &_B;
    IMatrix3x3 * pC = &C;

//asm code
asm volatile(
             //turn on a vector depth of 3
             "fmrx r0, fpscr \n\t"
             "bic r0, r0, #0x00370000 \n\t"
             "orr r0, r0, #0x00020000 \n\t"
             "fmxr fpscr, r0 \n\t"

             //load matrix B into the vector bank
             "fldmias %1, {s8-s16} \n\t"

             //load the first row of A into the scalar bank
             "fldmias %0!, {s0-s2} \n\t"

             //calulate C.A0, C.A1 and C.A2
             "fmuls s17, s8, s0 \n\t"
             "fmacs s17, s11, s1 \n\t"
             "fmacs s17, s14, s2 \n\t"

             //save this into the output
             "fstmias %2!, {s17-s19} \n\t"

             //load the second row of A into the scalar bank
             "fldmias %0!, {s0-s2} \n\t"

             //calulate C.B0, C.B1 and C.B2
             "fmuls s17, s8, s0 \n\t"
             "fmacs s17, s11, s1 \n\t"
             "fmacs s17, s14, s2 \n\t"

             //save this into the output
             "fstmias %2!, {s17-s19} \n\t"

             //load the third row of A into the scalar bank
             "fldmias %0!, {s0-s2} \n\t"

             //calulate C.C0, C.C1 and C.C2
             "fmuls s17, s8, s0 \n\t"
             "fmacs s17, s11, s1 \n\t"
             "fmacs s17, s14, s2 \n\t"

             //save this into the output
             "fstmias %2!, {s17-s19} \n\t"

             //set the vector depth back to 1
             "fmrx r0, fpscr \n\t"
             "bic r0, r0, #0x00370000 \n\t"
             "orr r0, r0, #0x00000000 \n\t"
             "fmxr fpscr, r0 \n\t"

             //pass  the inputs and set the clobber list
             : "+r"(pA), "+r"(pB), "+r" (pC) :
             :"cc", "memory","s0", "s1", "s2", "s8", "s9", "s10", "s11", "s12", "s13", "s14", "s15", "s16", "s17", "s18", "s19"
             );
#endif
    return C;
}

As far as i can see that makes sence. While debugging i've managed to notice that if i were to say _A = C prior to the return and after the ASM, _A will not necessarily be equal to C which has only increased my confusion. I had thought it was possibly due to the pointers I'm giving to the VFPU being incrimented by lines such as "fldmias %0!, {s0-s2} \n\t" however my understanding of asm is not good enough to properly understand the problem, nor to see an alternative approach to that line of code.

Anyway, I was hoping someone with a greater understanding than me would be able to see a solution, and any help would be greatly appreciated, thank you :-)

Developer IT

VFP Unit Matrix Multiply problem on the iPhone - Developer IT

VFP Unit Matrix Multiply problem on the iPhone

iphone

arm

math

vfp

matrix

Related posts about iphone

iphone - direct link to iPhone review form from inside iphone

Buy iPhone 4 Without Contract: $599 (AT&T) and $699 (Verizon)

Is it possible to download and run IPhone apps on an IPhone emulator

iPhone web app from dashcode rss feed template works deployed on simulator but not on iphone

Problem running iPhone application on iPhone from Xcode (and in Instruments)

Related posts about arm

ARM TechCon 2013: Oracle, ARM expand collaboration on servers, Internet of Things

ARM TechCon 2013: Oracle, ARM expand collaboration on servers, Internet of Things

Xcode warning: application executable contains unsupported architecture(s):arm, arm (-19031)

Files built with a makefile are disapearing (including the binary)

Question about Objective C calling convention and argument passing on ARM

Categories cloud