Search Results

Search found 1348 results on 54 pages for 'floating accuracy'.

Page 2/54 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • floating point hex octal binary

    - by workinprogress
    Hi, I am working on a calculator that allows you to perform calculations past the decimal point in octal, hexadecimal, binary, and of course decimal. I am having trouble though finding a way to convert floating point decimal numbers to floating point hexadecimal, octal, binary and vice versa. The plan is to do all the math in decimal and then convert the result into the appropriate number system. Any help, ideas or examples would be appreciated. Thanks!

    Read the article

  • C99 and floating point environment

    - by yCalleecharan
    Hi, I was looking at new features of C99 and saw the floating point environment: #include <fenv.h> My question is simple. If I'm performing floating point number computations, do I have to include the above preprocessor directive in my code? If no, then what does this directive do and when does it become important to include? Thanks a lot...

    Read the article

  • Ruby BigDecimal sanity check (floating point newb)

    - by Andy
    Hello, Hoping to get some feedback from someone more experienced here. I haven't dealt with the dreaded floating-point calculation before... Is my understanding correct that with Ruby BigDecimal types (even with varying precision and scale lengths) should calculate accurately or should I anticipate floating point shenanigans? All my values within a Rails application are BigDecimal type and I'm seeing some errors (they do have different decimal lengths), hoping it's just my methods and not my object types... Thanks!

    Read the article

  • floating exception using icc compiler

    - by Hristo
    I'm compiling my code via the following command: icc -ltbb test.cxx -o test Then when I run the program: time ./mp6 100 > output.modified Floating exception 4.871u 0.405s 0:05.28 99.8% 0+0k 0+0io 0pf+0w I get a "Floating exception". This following is code in C++ that I had before the exception and after: // before if (j < E[i]) { temp += foo(0, trr[i], ex[i+j*N]); } // after temp += (j < E[i])*foo(0, trr[i], ex[i+j*N]); This is boolean algebra... so (j < E[i]) is either going to be a 0 or a 1 so the multiplication would result either in 0 or the foo() result. I don't see why this would cause a floating exception. This is what foo() does: int foo(int s, int t, int e) { switch(s % 4) { case 0: return abs(t - e)/e; case 1: return (t == e) ? 0 : 1; case 2: return (t < e) ? 5 : (t - e)/t; case 3: return abs(t - e)/t; } return 0; } foo() isn't a function I wrote so I'm not too sure as to what it does... but I don't think the problem is with the function foo(). Is there something about boolean algebra that I don't understand or something that works differently in C++ than I know of? Any ideas why this causes an exception? Thanks, Hristo

    Read the article

  • What is a Bias Value of Floating Point Numbers?

    - by mudge
    In learning how floating point numbers are represented in computers I have come across the term "bias value" that I do not quite understand. The bias value in floating point numbers has to do with the negative and positiveness of the exponent part of a floating point number. The bias value of a floating point number is 127 which means that 127 is always added to the exponent part of a floating point number. How does doing this help determine if the exponent is negative or positive or not?

    Read the article

  • Floating point conversion from Fixed point algorithm

    - by Viks
    Hi, I have an application which is using 24 bit fixed point calculation.I am porting it to a hardware which does support floating point, so for speed optimization I need to convert all fixed point based calculation to floating point based calculation. For this code snippet, It is calculating mantissa for(i=0;i<8207;i++) { // Do n^8/7 calculation and store // it in mantissa and exponent, scaled to // fixed point precision. } So since this calculation, does convert an integer to mantissa and exponent scaled to fixed point precision(23 bit). When I tried converting it to float, by dividing the mantissa part by precision bits and subtracting the exponent part by precision bit, it really does' t work. Please help suggesting a better way of doing it.

    Read the article

  • Floating point arithmetic is too reliable.

    - by mcoolbeth
    I understand that floating point arithmetic as performed in modern computer systems is not always consistent with real arithmetic. I am trying to contrive a small C# program to demonstrate this. eg: static void Main(string[] args) { double x = 0, y = 0; x += 20013.8; x += 20012.7; y += 10016.4; y += 30010.1; Console.WriteLine("Result: "+ x + " " + y + " " + (x==y)); Console.Write("Press any key to continue . . . "); Console.ReadKey(true); } However, in this case, x and y are equal in the end. Is it possible for me to demonstrate the inconsistency of floating point arithmetic using a program of similar complexity, and without using any really crazy numbers? I would like, if possible, to avoid mathematically correct values that go more than a few places beyond the decimal point.

    Read the article

  • Floating point vs integer calculations on modern hardware

    - by maxpenguin
    I am doing some performance critical work in C++, and we are currently using integer calculations for problems that are inherently floating point because "its faster". This causes a whole lot of annoying problems and adds a lot of annoying code. Now, I remember reading about how floating point calculations were so slow approximately circa the 386 days, where I believe (IIRC) that there was an optional co-proccessor. But surely nowadays with exponentially more complex and powerful CPUs it makes no difference in "speed" if doing floating point or integer calculation? Especially since the actual calculation time is tiny compared to something like causing a pipeline stall or fetching something from main memory? I know the correct answer is to benchmark on the target hardware, what would be a good way to test this? I wrote two tiny C++ programs and compared their run time with "time" on Linux, but the actual run time is too variable (doesn't help I am running on a virtual server). Short of spending my entire day running hundreds of benchmarks, making graphs etc. is there something I can do to get a reasonable test of the relative speed? Any ideas or thoughts? Am I completely wrong? The programs I used as follows, they are not identical by any means: #include <iostream> #include <cmath> #include <cstdlib> #include <time.h> int main( int argc, char** argv ) { int accum = 0; srand( time( NULL ) ); for( unsigned int i = 0; i < 100000000; ++i ) { accum += rand( ) % 365; } std::cout << accum << std::endl; return 0; } Program 2: #include <iostream> #include <cmath> #include <cstdlib> #include <time.h> int main( int argc, char** argv ) { float accum = 0; srand( time( NULL ) ); for( unsigned int i = 0; i < 100000000; ++i ) { accum += (float)( rand( ) % 365 ); } std::cout << accum << std::endl; return 0; } Thanks in advance!

    Read the article

  • liquid CSS issues - rtl, floating and scrollers

    - by Rani
    hi I want to build a site that will have these restrictions: RTL direction vertical scroll on the right side whole page is floating to the right page has 2 columns the right (main) column has min width the right (main) column has table inside it that can expend in its data and get wider making all other data in the column expend to the same width as well the sidebar should be on the left side but still floating to the right of the main div it should fit low resolution so the page will be able to add horizontal scroll if needed should work in all major browsers don't use table for constructing the page Can someone help or direct me? Thanks Rani

    Read the article

  • Fixing Floating Point Error

    - by HannesNZ
    I have some code that gets the leading value (non-zero) of a Double using normal math instead of String Math... For Example: 0.020 would return 2 3.12 would return 3 1000 should return 1 The code I have at the moment is: LeadingValue := Trunc(ResultValue * Power(10, -(Floor(Log10(ResultValue))))) However when ResultValue is 1000 then LeadingValue ends up as 0. What can I do to fix this problem I'm assuming is being caused by floating point errors? Thanks.

    Read the article

  • expand floating object when floating object within expands

    - by Scarface
    Hey guys, quick question, I have a link when clicked drops down a list. This list is floated to the right to position it properly. This list is in another box that has been floated. My problem is that when the list expands, the box does not and the list comes out of the container box, unless the list is not floated. However floating it seems like the only way to get it to the position I want. If anyone has any ideas on how to solve this problem I would appreciate it. .container-box { margin-top:0px; float:left; padding-left:5px; position:relative; } #box-within { float:right; font-weight:bold; max-height:250px; display: none; background-color:#fff; overflow: auto; width:325px; padding:5px; position:relative; }

    Read the article

  • Understanding floating point problems

    - by Maxim Gershkovich
    Could someone here please help me understand how to determine when floating point limitations will cause errors in your calculations. For example the following code. CalculateTotalTax = function (TaxRate, TaxFreePrice) { return ((parseFloat(TaxFreePrice) / 100) * parseFloat(TaxRate)).toFixed(4); }; I have been unable to input any two values that have caused for me an incorrect result for this method. If I remove the toFixed(4) I can infact see where the calculations start to lose accuracy (somewhere around the 6th decimal place). Having said that though, my understanding of floats is that even small numbers can sometimes fail to be represented or have I misunderstood and can 4 decimal places (for example) always be represented accurately. MSDN explains floats as such... This means they cannot hold an exact representation of any quantity that is not a binary fraction (of the form k / (2 ^ n) where k and n are integers) Now I assume this applies to all floats (inlcuding those used in javascript). Fundamentally my question boils down to this. How can one determine if any specific method will be vulnerable to errors in floating point operations, at what precision will those errors materialize and what inputs will be required to produce those errors? Hopefully what I am asking makes sense.

    Read the article

  • Nicely representing a floating-point number in python

    - by dln385
    I want to represent a floating-point number as a string rounded to some number of significant digits, and never using the exponential format. Essentially, I want to display any floating-point number and make sure it “looks nice”. There are several parts to this problem: I need to be able to specify the number of significant digits. The number of significant digits needs to be variable, which can't be done with with the string formatting operator. I need it to be rounded the way a person would expect, not something like 1.999999999999 I've figured out one way of doing this, though it looks like a work-round and it's not quite perfect. (The maximum precision is 15 significant digits.) >>> def f(number, sigfig): return ("%.15f" % (round(number, int(-1 * floor(log10(number)) + (sigfig - 1))))).rstrip("0").rstrip(".") >>> print f(0.1, 1) 0.1 >>> print f(0.0000000000368568, 2) 0.000000000037 >>> print f(756867, 3) 757000 Is there a better way to do this? Why doesn't Python have a built-in function for this?

    Read the article

  • Convert pre-IEEE-574 C++ floating-point numbers to/from C#

    - by Richard Kucia
    Before .Net, before math coprocessors, before IEEE-574, Microsoft defined a bit pattern for floating-point numbers. Old versions of the C++ compiler happily used that definition. I am writing a C# app that needs to read/write such floating-point numbers in a file. How can I do the conversions between the 2 bit formats? I need conversion methods in both directions. This app is going to run in a PocketPC/WinCE environment. Changing the structure of the file is out-of-scope for this project. Is there a C++ compiler option that instructs it to use the old FP format? That would be ideal. I could then exchange data between the C# code and C++ code by using a null-terminated text string, and the C++ methods would be simple wrappers around sprintf and atof functions. At the very least, I'm hoping someone can reply with the bit definitions for the old FP format, so I can put together a low-level bit manipulation algorithm if necessary. Thanks.

    Read the article

  • How to efficiently compare the sign of two floating-point values while handling negative zeros

    - by François Beaune
    Given two floating-point numbers, I'm looking for an efficient way to check if they have the same sign, given that if any of the two values is zero (+0.0 or -0.0), they should be considered to have the same sign. For instance, SameSign(1.0, 2.0) should return true SameSign(-1.0, -2.0) should return true SameSign(-1.0, 2.0) should return false SameSign(0.0, 1.0) should return true SameSign(0.0, -1.0) should return true SameSign(-0.0, 1.0) should return true SameSign(-0.0, -1.0) should return true A naive but correct implementation of SameSign in C++ would be: bool SameSign(float a, float b) { if (fabs(a) == 0.0f || fabs(b) == 0.0f) return true; return (a >= 0.0f) == (b >= 0.0f); } Assuming the IEEE floating-point model, here's a variant of SameSign that compiles to branchless code (at least with with Visual C++ 2008): bool SameSign(float a, float b) { int ia = binary_cast<int>(a); int ib = binary_cast<int>(b); int az = (ia & 0x7FFFFFFF) == 0; int bz = (ib & 0x7FFFFFFF) == 0; int ab = (ia ^ ib) >= 0; return (az | bz | ab) != 0; } with binary_cast defined as follow: template <typename Target, typename Source> inline Target binary_cast(Source s) { union { Source m_source; Target m_target; } u; u.m_source = s; return u.m_target; } I'm looking for two things: A faster, more efficient implementation of SameSign, using bit tricks, FPU tricks or even SSE intrinsics. An efficient extension of SameSign to three values.

    Read the article

  • c++ floating point precision loss: 3015/0.00025298219406977296

    - by SigTerm
    The problem. Microsoft Visual C++ 2005 compiler, 32bit windows xp sp3, amd 64 x2 cpu. Code: double a = 3015.0; double b = 0.00025298219406977296; //*((unsigned __int64*)(&a)) == 0x40a78e0000000000 //*((unsigned __int64*)(&b)) == 0x3f30945640000000 double f = a/b;//3015/0.00025298219406977296; the result of calculation (i.e. "f") is 11917835.000000000 (*((unsigned __int64*)(&f)) == 0x4166bb4160000000) although it should be 11917834.814763514 (i.e. *((unsigned __int64*)(&f)) == 0x4166bb415a128aef). I.e. fractional part is lost. Unfortunately, I need fractional part to be correct. Questions: 1) Why does this happen? 2) How can I fix the problem? Additional info: 0) The result is taken directly from "watch" window (it wasn't printed, and I didn't forget to set printing precision). I also provided hex dump of floating point variable, so I'm absolutely sure about calculation result. 1) The disassembly of f = a/b is: fld qword ptr [a] fdiv qword ptr [b] fstp qword ptr [f] 2) f = 3015/0.00025298219406977296; yields correct result (f == 11917834.814763514 , *((unsigned __int64*)(&f)) == 0x4166bb415a128aef ), but it looks like in this case result is simply calculated during compile-time: fld qword ptr [__real@4166bb415a128aef (828EA0h)] fstp qword ptr [f] So, how can I fix this problem? P.S. I've found a temporary workaround (i need only fractional part of division, so I simply use f = fmod(a/b)/b at the moment), but I still would like to know how to fix this problem properly - double precision is supposed to be 16 decimal digits, so such calculation isn't supposed to cause problems.

    Read the article

  • Why differs floating-point precision in C# when separated by parantheses and when separated by state

    - by Andreas Larsen
    I am aware of how floating point precision works in the regular cases, but I stumbled on an odd situation in my C# code. Why aren't result1 and result2 the exact same floating point value here? const float A; // Arbitrary value const float B; // Arbitrary value float result1 = (A*B)*dt; float result2 = (A*B); result2 *= dt; From this page I figured float arithmetic was left-associative and that this means values are evaluated and calculated in a left-to-right manner. The full source code involves XNA's Quaternions. I don't think it's relevant what my constants are and what the VectorHelper.AddPitchRollYaw() does. The test passes just fine if I calculate the delta pitch/roll/yaw angles in the same manner, but as the code is below it does not pass: X Expected: 0.275153548f But was: 0.275153786f [TestFixture] internal class QuaternionPrecisionTest { [Test] public void Test() { JoystickInput input; input.Pitch = 0.312312432f; input.Roll = 0.512312432f; input.Yaw = 0.912312432f; const float dt = 0.017001f; float pitchRate = input.Pitch * PhysicsConstants.MaxPitchRate; float rollRate = input.Roll * PhysicsConstants.MaxRollRate; float yawRate = input.Yaw * PhysicsConstants.MaxYawRate; Quaternion orient1 = Quaternion.Identity; Quaternion orient2 = Quaternion.Identity; for (int i = 0; i < 10000; i++) { float deltaPitch = (input.Pitch * PhysicsConstants.MaxPitchRate) * dt; float deltaRoll = (input.Roll * PhysicsConstants.MaxRollRate) * dt; float deltaYaw = (input.Yaw * PhysicsConstants.MaxYawRate) * dt; // Add deltas of pitch, roll and yaw to the rotation matrix orient1 = VectorHelper.AddPitchRollYaw( orient1, deltaPitch, deltaRoll, deltaYaw); deltaPitch = pitchRate * dt; deltaRoll = rollRate * dt; deltaYaw = yawRate * dt; orient2 = VectorHelper.AddPitchRollYaw( orient2, deltaPitch, deltaRoll, deltaYaw); } Assert.AreEqual(orient1.X, orient2.X, "X"); Assert.AreEqual(orient1.Y, orient2.Y, "Y"); Assert.AreEqual(orient1.Z, orient2.Z, "Z"); Assert.AreEqual(orient1.W, orient2.W, "W"); } } Granted, the error is small and only presents itself after a large number of iterations, but it has caused me some great headackes.

    Read the article

  • Floating point mantissa bias

    - by user69514
    Does anybody know how to go out solving this problem? * a = 1.0 × 2^9 * b = -1.0 × 2^9 * c = 1.0 × 2^1 Using the floating-point (the representation uses a 14-bit format, 5 bits for the exponent with a bias of 16, a normalized mantissa of 8 bits, and a single sign bit for the number), perform the following two calculations, paying close attention to the order of operations. * b + (a + c) = ? * (b + a) + c = ?

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >