Search Results

Search found 22300 results on 892 pages for 'half bit'.

  • Saturated addition of two signed Java 'long' values

    - by finnw
    How can one add two long values (call them x and y) in Java so that if the result overflows then it is clamped to the range Long.MIN_VALUE..Long.MAX_VALUE?

    For adding ints one can perform the arithmetic in long precision and cast the result back to an int, e.g.:

        int saturatedAdd(int x, int y) {
            long sum = (long) x + (long) y;
            long clampedSum = Math.max((long) Integer.MIN_VALUE, Math.min(sum, (long) Integer.MAX_VALUE));
            return (int) clampedSum;
        }

    or

        import com.google.common.primitives.Ints;

        int saturatedAdd(int x, int y) {
            long sum = (long) x + (long) y;
            return Ints.saturatedCast(sum);
        }

    but in the case of long there is no larger primitive type that can hold the intermediate (unclamped) sum. Since this is Java, I cannot use inline assembly (in particular SSE's saturated add instructions.) It can be implemented using BigInteger, e.g.

        static final BigInteger bigMin = BigInteger.valueOf(Long.MIN_VALUE);
        static final BigInteger bigMax = BigInteger.valueOf(Long.MAX_VALUE);

        long saturatedAdd(long x, long y) {
            BigInteger sum = BigInteger.valueOf(x).add(BigInteger.valueOf(y));
            return bigMin.max(sum).min(bigMax).longValue();
        }

    however performance is important so this method is not ideal (though useful for testing.) I don't know whether avoiding branching can significantly affect performance in Java. I assume it can, but I would like to benchmark methods both with and without branching.

    Related: http://stackoverflow.com/questions/121240/saturating-addition-in-c
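
    One possible branching approach, sketched here in C++ with std::int64_t standing in for Java's long: test for overflow before adding, so the intermediate sum is never computed out of range. The name SaturatedAdd is hypothetical, and the same comparisons should carry over to Java with Long.MAX_VALUE and Long.MIN_VALUE. This is only a sketch and says nothing about how it benchmarks against a branch-free variant.

```cpp
#include <cstdint>
#include <limits>

// Saturating 64-bit signed addition (hypothetical helper, not from the question).
std::int64_t SaturatedAdd(std::int64_t x, std::int64_t y) {
    constexpr std::int64_t kMax = std::numeric_limits<std::int64_t>::max();
    constexpr std::int64_t kMin = std::numeric_limits<std::int64_t>::min();
    // kMax - y cannot overflow when y > 0; kMin - y cannot overflow when y < 0.
    if (y > 0 && x > kMax - y) return kMax;  // sum would exceed the maximum
    if (y < 0 && x < kMin - y) return kMin;  // sum would fall below the minimum
    return x + y;                            // safe: the sum is in range
}
```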

  • Shift count negative or too big error - correct solution?

    - by PeterK
    I have the following function for reading a big-endian quadword (in an abstract base file I/O class):

        unsigned long long CGenFile::readBEq() {
            unsigned long long qT = 0;
            qT |= readb() << 56;
            qT |= readb() << 48;
            qT |= readb() << 40;
            qT |= readb() << 32;
            qT |= readb() << 24;
            qT |= readb() << 16;
            qT |= readb() << 8;
            qT |= readb() << 0;
            return qT;
        }

    The readb() function reads a BYTE. Here are the typedefs used:

        typedef unsigned char BYTE;
        typedef unsigned short WORD;
        typedef unsigned long DWORD;

    The thing is that I get 4 compiler warnings on the first four lines with the shift operation:

        warning C4293: '<<' : shift count negative or too big, undefined behavior

    I understand why this warning occurs, but I can't seem to figure out how to get rid of it correctly. I could do something like:

        qT |= (unsigned long long)readb() << 56;

    This removes the warning, but isn't there any other problem? Will the BYTE be correctly extended all the time? Maybe I'm just thinking about it too much and the solution is that simple. Can you guys help me out here? Thanks.
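
    A minimal sketch of the cast-first approach, assuming the eight bytes are already in a buffer rather than coming from readb() (ReadBigEndian64 is a hypothetical free function, not part of CGenFile). Converting an unsigned char to an unsigned 64-bit type always zero-extends, so the widened value can be shifted by up to 63 bits; without the cast the byte is promoted to int and a shift of 32 or more is undefined.

```cpp
#include <cstdint>

// Assemble a big-endian 64-bit value from 8 bytes (hypothetical helper).
std::uint64_t ReadBigEndian64(const unsigned char* bytes) {
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        // Widen to 64 bits before combining; the most significant byte comes first.
        value = (value << 8) | static_cast<std::uint64_t>(bytes[i]);
    }
    return value;
}
```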

  • Bitshift in javascript

    - by pingvinus
    I've got a really big number: 5799218898, and I want to shift it right by 13 bits. The Windows calculator or Python gives me:

        5799218898 >> 13 | 100010100100001110011111100001 >> 13
        70791            | 10001010010000111

    As expected. But JavaScript gives:

        5799218898 >> 13 | 100010100100001110011111100001 >> 13
        183624           | 101100110101001000

    I think it is because of the internal integer representation in JavaScript, but I cannot find anything about that.
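
    JavaScript's bitwise operators convert their operands to 32-bit signed integers before shifting, and 5799218898 does not fit in 32 bits, so its upper bits are dropped first. The C++ sketch below imitates that behaviour; JsLikeShiftRight is a hypothetical name, and the signed conversion and signed right shift it relies on are only guaranteed to behave this way from C++20 onward (mainstream compilers already behaved this way before that).

```cpp
#include <cstdint>

// Imitates JavaScript's `value >> count` for integer inputs (hypothetical helper).
std::int32_t JsLikeShiftRight(std::int64_t value, unsigned count) {
    // Keep only the low 32 bits (conversion to unsigned is modulo 2^32).
    std::uint32_t low32 = static_cast<std::uint32_t>(value);
    // Reinterpret as signed; modular since C++20, implementation-defined before.
    std::int32_t as_signed = static_cast<std::int32_t>(low32);
    // JavaScript also masks the shift count to its low 5 bits.
    return as_signed >> (count & 31u);
}
```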

  • Interface Builder error: IBXMLDecoder: The value for key is too large to fit into a 32 bit integer

    - by stdout
    I'm working with Robert Payne's fork of PSMTabBarControl that works with IB 3.2 (thanks BTW Robert!): http://codaset.com/robertjpayne/psmtabbarcontrol/. The demo application works fine on 64-bit systems, but when I try to open the XIB file in Interface Builder on a 32-bit system I get:

        IBXMLDecoder: The value (4654500848) for key (myTrackingRectTag) is too large to fit into a 32 bit integer

    Building the app as 32 bit works, but then running it gives:

        PSMTabBarControlDemo[9073:80f] * -[NSKeyedUnarchiver decodeInt32ForKey:]: value (4654500848) for key (myTrackingRectTag) too large to fit in 32-bit integer

    Not sure if this is a generic IB issue that can occur when moving between 64 and 32 bit systems, or if this is a more specific issue with this code. Has anyone else run into this?

  • Negative logical shift

    - by user320862
    In Java, why does -32 >>> -1 = 1? It's not specific to just -32. It works for all negative numbers as long as they're not too big. I've found that

        x >>> -1 = 1
        x >>> -2 = 3
        x >>> -3 = 7
        x >>> -4 = 15

    given 0 > x > some large negative number. Isn't >>> -1 the same as << 1? But -32 << 1 = -64. I've read up on two's complement, but still don't understand the reasoning.
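
    For an int operand, Java uses only the low 5 bits of the shift count, so a count of -1 acts as 31, -2 as 30, and so on; a negative count never reverses the direction of the shift. The C++ sketch below imitates that rule (JavaLikeUnsignedShiftRight is a hypothetical name, and the result is returned as an unsigned value for simplicity).

```cpp
#include <cstdint>

// Imitates Java's `x >>> count` on int operands (hypothetical helper).
std::uint32_t JavaLikeUnsignedShiftRight(std::int32_t x, std::int32_t count) {
    // Only the low 5 bits of the count are used: -1 becomes 31, -2 becomes 30, ...
    std::uint32_t effective = static_cast<std::uint32_t>(count) & 31u;
    // Shift the two's complement bit pattern right, filling with zeros.
    return static_cast<std::uint32_t>(x) >> effective;
}
```

    With x = -32 the bit pattern is 0xFFFFFFE0, so shifting right by 31, 30, 29 and 28 leaves the top 1, 2, 3 and 4 one-bits, which matches the 1, 3, 7, 15 sequence in the question.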

  • Better name for CHAR_BIT?

    - by Potatoswatter
    I was just checking an answer and realized that CHAR_BIT isn't defined by headers as I'd expect, not even by #include <bitset>, on newer GCC. Do I really have to #include <climits> just to get the "functionality" of CHAR_BIT?
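
    CHAR_BIT is a macro from <climits>, so that header is the direct way to get it. As a hedged alternative, the same number is available without the macro through std::numeric_limits from <limits>; the constant name kBitsPerByte below is hypothetical.

```cpp
#include <climits>
#include <limits>

// Number of bits in a byte, obtained without the CHAR_BIT macro (hypothetical name).
constexpr int kBitsPerByte = std::numeric_limits<unsigned char>::digits;

// <climits> is included here only to check that the two values agree.
static_assert(kBitsPerByte == CHAR_BIT, "digits of unsigned char should equal CHAR_BIT");
```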

  • ARM assembly puzzle

    - by ivant
    First of all, I'm not sure if a solution even exists. I spent more than a couple of hours trying to come up with one, so beware. The problem: r1 contains an arbitrary integer, and the flags are not set according to its value. Set r0 to 1 if r1 is 0x80000000, and to 0 otherwise, using only two instructions. It's easy to do that in 3 instructions (there are many ways), however doing it in 2 seems very hard, and may very well be impossible.

  • How to calculate 2^n-1 efficiently without overflow?

    - by Ludwig Weinzierl
    I want to calculate 2^n-1 for a 64-bit integer value. What I currently do is this:

        for(i=0; i<n; i++) r|=1<<i;

    and I wonder if there is a more elegant way to do it. The line is in an inner loop, so I need it to be fast. I thought of

        r=(1ULL<<n)-1;

    but it doesn't work for n=64, because << is only defined for values of n up to 63.
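
    One way to sidestep the n=64 case, sketched under the assumption that n is always in the range 0..64: start from all ones and shift right by 64 - n, which keeps the shift count at 63 or less for every n >= 1, and handle n = 0 separately. LowBitsMask is a hypothetical name. Note also that the loop in the question shifts the int literal 1, which is itself out of range once i reaches the width of int.

```cpp
#include <cstdint>

// Returns 2^n - 1 as a 64-bit value. Precondition: 0 <= n <= 64 (hypothetical helper).
std::uint64_t LowBitsMask(unsigned n) {
    // For n >= 1 the shift count 64 - n lies in 0..63, so the shift is well defined.
    return n == 0 ? 0 : (~std::uint64_t{0} >> (64 - n));
}
```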

  • Macros to set and clear bits

    - by volting
    I'm trying to write a few simple macros to simplify the task of setting and clearing bits, which should be a simple task; however, I can't seem to get them to work correctly.

        #define SET_BIT(p,n) ((p) |= (1 << (n)))
        #define CLR_BIT(p,n) ((p) &= (~(1) << (n)))
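
    In the CLR_BIT macro as posted, the complement is applied to 1 before the shift, so the mask is (~1) << n rather than ~(1 << n). A hedged sketch of a corrected pair follows; using the unsigned literal 1u is an extra assumption on my part, made to avoid shifting into the sign bit of an int.

```cpp
// Set bit n of p: OR with a mask that has only bit n set.
#define SET_BIT(p, n) ((p) |= (1u << (n)))

// Clear bit n of p: build the single-bit mask first, then complement the whole mask.
#define CLR_BIT(p, n) ((p) &= ~(1u << (n)))
```

    For example, clearing bit 3 of 0xFF yields 0xF7 with this version, whereas the original macro would yield 0xF0.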

  • Setting last N bits in an array

    - by Martin
    I'm sure this is fairly simple, however I have a major mental block on it, so I need a little help here! I have an array of 5 integers, and the array is already filled with some data. I want to set the last N bits of the array to be random noise.

        [int][int][int][int][int]

    Set the last 40 bits:

        [unchanged][unchanged][unchanged][24 bits of old data followed by 8 bits of randomness][all random]

    This is largely language agnostic, but I'm working in C#, so bonus points for answers in C#.
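
    Since the question is described as largely language agnostic, here is a rough C++ sketch of one way to do it. It assumes 32-bit unsigned words, takes "last bits" to mean the least significant bits of the trailing words, and uses std::mt19937 as the noise source; RandomizeLastBits is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <random>

// Overwrite the last `nbits` bits of the array with random noise (hypothetical helper).
template <std::size_t Count>
void RandomizeLastBits(std::array<std::uint32_t, Count>& words, unsigned nbits,
                       std::mt19937& rng) {
    // Walk from the last word toward the first until the bit budget is used up.
    for (std::size_t i = Count; i > 0 && nbits > 0; --i) {
        unsigned take = nbits < 32 ? nbits : 32;  // bits to replace in this word
        std::uint32_t mask = (take == 32) ? 0xFFFFFFFFu : ((1u << take) - 1u);
        std::uint32_t noise = static_cast<std::uint32_t>(rng());
        // Keep the old bits outside the mask, take random bits inside it.
        words[i - 1] = (words[i - 1] & ~mask) | (noise & mask);
        nbits -= take;
    }
}
```

    With 5 words and nbits = 40, the last word becomes entirely random, the low 8 bits of the fourth word are replaced, and the first three words are left unchanged.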

  • Why does the right-shift operator produce a zero instead of a one?

    - by mrt181
    Hi, I am teaching myself Java and working through the exercises in Thinking in Java. On page 116, exercise 11, you should right-shift an integer through all its binary positions and display each position with Integer.toBinaryString.

        public static void main(String[] args) {
            int i = 8;
            System.out.println(Integer.toBinaryString(i));
            int maxIterations = Integer.toBinaryString(i).length();
            int j;
            for (j = 1; j < maxIterations; j++) {
                i >>= 1;
                System.out.println(Integer.toBinaryString(i));
            }
        }

    In the solution guide the output looks like this:

        1000
        1100
        1110
        1111

    When I run this code I get this:

        1000
        100
        10
        1

    What is going on here? Are the digits cut off? I am using jdk1.6.0_20 64bit. The book uses jdk1.5 32bit.

  • Proper way to handle issue when porting 32 to 64 bit. Conversion from DT1 to DT2 of greater size

    - by grobartn
    So I am trying to port 32-bit code to 64 bit. I have turned on the VS2008 flag for detecting problems with 64 bit. I am trying the following:

        char * pList = (char *)uiTmp;

        warning C4312: 'type cast' : conversion from 'unsigned int' to 'char *' of greater size

    Disregard the code itself. This is also true for any pointer, because a 64-bit pointer is larger than a 32-bit unsigned int or int. Given that you have to cast a smaller type to a larger one, how would you go about doing it so that it works correctly on both 32-bit and 64-bit systems?
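
    A common way to avoid the size mismatch is to keep the integer in a pointer-sized type rather than unsigned int. The sketch below assumes a toolchain that provides std::uintptr_t in <cstdint> (C++11 or later), which may not match the compiler in the question; both function names are hypothetical.

```cpp
#include <cstdint>

// Store a pointer in an integer wide enough to hold it on 32-bit and 64-bit targets.
std::uintptr_t IntegerFromPointer(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Convert the pointer-sized integer back into a pointer.
char* PointerFromInteger(std::uintptr_t value) {
    return reinterpret_cast<char*>(value);
}
```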

  • Overwriting a range of bits in an integer in a generic way

    - by porgarmingduod
    Given two integers X and Y, I want to overwrite bits at position P to P+N. Example:

        int x      = 0xAAAA; // 0b1010101010101010
        int y      = 0x0C30; // 0b0000110000110000
        int result = 0xAC3A; // 0b1010110000111010

    Does this procedure have a name? If I have masks, the operation is easy enough:

        int mask_x = 0xF00F; // 0b1111000000001111
        int mask_y = 0x0FF0; // 0b0000111111110000
        int result = (x & mask_x) | (y & mask_y);

    What I can't quite figure out is how to write it in a generic way, such as in the following generic C++ function:

        template<typename IntType>
        IntType OverwriteBits(IntType dst, IntType src, int pos, int len) {
        // If:
        // dst    = 0xAAAA; // 0b1010101010101010
        // src    = 0x0C30; // 0b0000110000110000
        // pos    = 4                       ^
        // len    = 8                   ^-------
        // Then:
        // result = 0xAC3A; // 0b1010110000111010
        }

    The problem is that I cannot figure out how to make the masks properly when all the variables, including the width of the integer, are variable. Does anyone know how to write the above function properly?
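
    One possible way to fill in the function body, offered as a sketch rather than a definitive answer: build a mask of len ones, shift it to pos, and do the arithmetic in the unsigned counterpart of IntType. It assumes the caller guarantees 0 <= pos < width, len >= 0 and pos + len <= width, and that src already has its bits in position, as in the question's example.

```cpp
#include <climits>
#include <type_traits>

template <typename IntType>
IntType OverwriteBits(IntType dst, IntType src, int pos, int len) {
    using U = typename std::make_unsigned<IntType>::type;
    const int width = static_cast<int>(sizeof(U) * CHAR_BIT);
    // `len` ones in the low bits; the full-width case avoids an out-of-range shift.
    const U ones = (len >= width) ? static_cast<U>(~U(0))
                                  : static_cast<U>((U(1) << len) - U(1));
    // Move the run of ones so it covers bits pos .. pos+len-1.
    const U mask = static_cast<U>(ones << pos);
    // Take src inside the mask and dst everywhere else.
    const U merged = static_cast<U>((static_cast<U>(dst) & static_cast<U>(~mask)) |
                                    (static_cast<U>(src) & mask));
    return static_cast<IntType>(merged);
}
```

    With dst = 0xAAAA, src = 0x0C30, pos = 4 and len = 8, the mask is 0x0FF0 and the result is 0xAC3A, matching the example in the question.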
