Float32 to Float16
- by Goz
Can someone explain to me how I convert a 32-bit floating point value to a 16-bit floating point value?
(s = sign, e = exponent, m = mantissa)
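If a 32-bit float is 1s8e23m
and a 16-bit float is 1s5e10m,
then is it as simple as doing this?
#include <stdint.h>
#include <string.h>
uint32_t fltInt32;   /* raw bits of the float flt */
uint16_t fltInt16;
memcpy( &fltInt32, &flt, sizeof( float ) );
fltInt16 = (fltInt32 & 0x007FFFFF) >> 13;           /* top 10 of the 23 mantissa bits */
fltInt16 |= ((fltInt32 & 0x7F800000) >> 26) << 10;  /* top 5 of the 8 exponent bits */
fltInt16 |= ((fltInt32 & 0x80000000) >> 16);        /* sign bit 31 -> bit 15 */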
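A quick way to see where straight bit slicing goes wrong is to encode a few values and decode them again. The sketch below assumes IEEE 754 binary32 `float` and binary16 halves. `naive_half_bits` applies the same slicing idea to the actual binary32 layout (1 sign, 8 exponent, 23 mantissa bits), and `half_bits_to_float` is a hypothetical decoder for the 16-bit pattern. The two formats use different exponent biases (127 for binary32, 15 for binary16), so copying the top 5 exponent bits only happens to give the right value near 1.0.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: naive slicing against the real binary32 layout
 * (1 sign, 8 exponent, 23 mantissa bits). No re-biasing, no rounding. */
static uint16_t naive_half_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t h = (bits & 0x007FFFFFu) >> 13;        /* top 10 mantissa bits */
    h |= ((bits & 0x7F800000u) >> 26) << 10;        /* top 5 exponent bits */
    h |= (bits & 0x80000000u) >> 16;                /* sign bit 31 -> bit 15 */
    return (uint16_t)h;
}

/* Hypothetical decoder: turns a binary16 bit pattern back into a float
 * so the output of an encoder can be inspected. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;              /* biased by 15 */
    uint32_t mant = h & 0x03FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {                             /* infinity or NaN */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {                          /* normal: re-bias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else {                                        /* zero or subnormal: mant * 2^-24 */
        float v = (float)mant * 0x1p-24f;           /* exact, since mant < 2^10 */
        return sign ? -v : v;
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With these, 1.0f and 2.0f survive the round trip, but `half_bits_to_float(naive_half_bits(4.0f))` comes back as 2.0f and -0.5f comes back as -1.0f, because the exponent was truncated instead of re-biased.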
I'm assuming it ISN'T that simple ... so can anyone tell me what you DO need to do?
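Beyond re-biasing the exponent from 127 to 15, a conversion has to deal with values too large for a half (they overflow to infinity; the largest finite half is 65504), values too small for a normal half (they become subnormals or zero), infinities and NaNs, and rounding of the 13 mantissa bits that get dropped. Below is a sketch assuming IEEE 754 binary32 input and round-to-nearest-even; `float_to_half_bits` is a hypothetical name, not a library function.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: float (binary32) -> half (binary16) bit pattern,
 * round to nearest, ties to even. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);

    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);   /* bit 31 -> bit 15 */
    uint32_t exp  = (x >> 23) & 0xFFu;                  /* biased by 127 */
    uint32_t mant = x & 0x007FFFFFu;                    /* 23 stored bits */

    if (exp == 0xFFu) {                                 /* infinity or NaN */
        if (mant == 0)
            return (uint16_t)(sign | 0x7C00u);
        /* quiet NaN: set the top mantissa bit so the result stays a NaN */
        return (uint16_t)(sign | 0x7C00u | 0x0200u | (mant >> 13));
    }

    int32_t e = (int32_t)exp - 127 + 15;                /* re-bias exponent */

    if (e >= 31)                                        /* too large: infinity */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                                       /* subnormal or zero */
        if (e < -10)                                    /* below half of 2^-24 */
            return sign;
        mant |= 0x00800000u;                            /* explicit leading 1 */
        uint32_t shift = (uint32_t)(14 - e);            /* 13 + (1 - e) */
        uint32_t half_mant = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (half_mant & 1u)))
            ++half_mant;                                /* may carry into the smallest normal */
        return (uint16_t)(sign | half_mant);
    }

    /* normal result: keep the top 10 mantissa bits and round on the other 13 */
    uint32_t half = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        ++half;                                         /* carry may bump the exponent */
    return (uint16_t)(sign | half);
}
```

For example, `float_to_half_bits(1.0f / 3.0f)` gives 0x3555, `float_to_half_bits(65504.0f)` gives 0x7BFF, and 65520.0f, which sits exactly halfway between 65504 and 65536, rounds (ties to even) to infinity, 0x7C00. Letting the rounding carry run into the exponent field is deliberate: a mantissa of all ones rolls over to the next power of two, or to infinity at the top of the range, without extra special cases.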