Add uchar values to a ushort array with SSE2 or SSE3

Posted by pompolus on Stack Overflow
Published on 2012-11-09T18:06:59Z Indexed on 2012/11/10 11:01 UTC

I have an unsigned short dst[16][16] matrix and a larger unsigned char src[m][n] matrix.

Now I need to read a 16x16 submatrix from src and add it to dst, using SSE2 or SSE3.

In an older implementation, I was sure that the summed values never exceeded 255, so I could do this:

for (int row = 0; row < 16; ++row)
{
    __m128i subMat = _mm_lddqu_si128(reinterpret_cast<const __m128i*>(src));
    dst[row] = _mm_add_epi8(dst[row], subMat);
    src += W; // step to the next row to add
}

where W is the offset to the next row I need. This code works, but now the values in src are larger and the sums can exceed 255, so I need to store them as ushort.
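
To make the overflow concrete, here is a minimal sketch (not part of the original code) of what _mm_add_epi8 does once a sum no longer fits in 8 bits. The helper name add_bytes_epi8 is hypothetical and exists only for illustration; it fills a register with each input byte, adds them with the same intrinsic as above, and reads back lane 0.

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: adds two bytes with _mm_add_epi8 and returns
// the 8-bit result found in lane 0.
inline std::uint8_t add_bytes_epi8(std::uint8_t a, std::uint8_t b)
{
    const __m128i va  = _mm_set1_epi8(static_cast<char>(a));
    const __m128i vb  = _mm_set1_epi8(static_cast<char>(b));
    const __m128i sum = _mm_add_epi8(va, vb); // each lane wraps modulo 256
    return static_cast<std::uint8_t>(_mm_cvtsi128_si32(sum) & 0xFF);
}
```

With this helper, add_bytes_epi8(100, 100) gives 200, but add_bytes_epi8(200, 100) gives 44 rather than 300, which is why the accumulator needs 16-bit lanes.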

I've tried this:

for (int row = 0; row < 16; ++row)
{
    __m128i subMat = _mm_lddqu_si128(reinterpret_cast<const __m128i*>(src));
    dst[row] = _mm_add_epi16(dst[row], subMat);
    src += W; // step to the next row to add
}

but it doesn't work. I'm not very experienced with SSE, so any help would be appreciated.
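
A likely reason for the failure is that _mm_add_epi16 treats the 16 loaded bytes as eight 16-bit lanes, so every pair of neighbouring src bytes is fused into a single value before the add. One common approach, sketched here under assumptions rather than taken from the original post, is to zero-extend the bytes first by interleaving them with a zero register, then add the two resulting halves to the two halves of each dst row. The function name add_block_u8_to_u16 and the parameter srcStride (the W above) are hypothetical, and the sketch assumes dst is the plain unsigned short dst[16][16] array from the question with no alignment guarantee. It uses only SSE2, so _mm_loadu_si128 stands in for the SSE3 _mm_lddqu_si128.

```cpp
#include <emmintrin.h> // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adds a 16x16 block of unsigned bytes from src into
// the 16-bit accumulators in dst. srcStride is the distance in bytes
// between consecutive rows of src.
inline void add_block_u8_to_u16(std::uint16_t dst[16][16],
                                const std::uint8_t* src,
                                std::ptrdiff_t srcStride)
{
    const __m128i zero = _mm_setzero_si128();
    for (int row = 0; row < 16; ++row)
    {
        // Load 16 unsigned bytes; the source row may be unaligned.
        const __m128i bytes =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));

        // Interleaving with zero widens each byte to a 16-bit lane.
        const __m128i lo = _mm_unpacklo_epi8(bytes, zero); // columns 0..7
        const __m128i hi = _mm_unpackhi_epi8(bytes, zero); // columns 8..15

        // Each dst row holds 16 ushort values, i.e. two 128-bit registers.
        __m128i* d = reinterpret_cast<__m128i*>(dst[row]);
        const __m128i acc0 = _mm_loadu_si128(d);
        const __m128i acc1 = _mm_loadu_si128(d + 1);
        _mm_storeu_si128(d,     _mm_add_epi16(acc0, lo));
        _mm_storeu_si128(d + 1, _mm_add_epi16(acc1, hi));

        src += srcStride; // step to the next row to add
    }
}
```

A call would look like add_block_u8_to_u16(dst, src, W). Each 16-bit lane can absorb up to 257 additions of the maximum byte value 255 before it wraps (257 * 255 = 65535), so the accumulators are safe for a moderate number of summed blocks. If dst can be guaranteed 16-byte aligned, the unaligned loads and stores on dst could be swapped for _mm_load_si128 and _mm_store_si128.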

© Stack Overflow or respective owner