Search Results

Search found 4580 results on 184 pages for 'faster'.

Page 21/184 | < Previous Page | 17 18 19 20 21 22 23 24 25 26 27 28  | Next Page >

  • Should not a tail-recursive function also be faster?

    - by Balint Erdi
    I have the following Clojure code to calculate a number with a certain "factorable" property. (what exactly the code does is secondary). (defn factor-9 ([] (let [digits (take 9 (iterate #(inc %) 1)) nums (map (fn [x] ,(Integer. (apply str x))) (permutations digits))] (some (fn [x] (and (factor-9 x) x)) nums))) ([n] (or (= 1 (count (str n))) (and (divisible-by-length n) (factor-9 (quot n 10)))))) Now, I'm into TCO and realize that Clojure can only provide tail-recursion if explicitly told so using the recur keyword. So I've rewritten the code to do that (replacing factor-9 with recur being the only difference): (defn factor-9 ([] (let [digits (take 9 (iterate #(inc %) 1)) nums (map (fn [x] ,(Integer. (apply str x))) (permutations digits))] (some (fn [x] (and (factor-9 x) x)) nums))) ([n] (or (= 1 (count (str n))) (and (divisible-by-length n) (recur (quot n 10)))))) To my knowledge, TCO has a double benefit. The first one is that it does not use the stack as heavily as a non tail-recursive call and thus does not blow it on larger recursions. The second, I think is that consequently it's faster since it can be converted to a loop. Now, I've made a very rough benchmark and have not seen any difference between the two implementations although. Am I wrong in my second assumption or does this have something to do with running on the JVM (which does not have automatic TCO) and recur using a trick to achieve it? Thank you.

    Read the article

  • PHP Magic faster than simply setting the class attribute?

    - by Marc Trudel
    Well, not exactly that, but here is an example. Can anyone explain the difference between B and C? How can it be faster to use a magic function to dynamically set a value instead of simply setting the value in the attribute definition? Here is some code: [root@vm-202-167-238-17 ~]# cat test.php; for d in A B C; do echo "------"; ./test.php $d; done; #!/usr/bin/php <?php $className = $argv[1]; class A { public function __get($a) { return 5; } } class B { public $a = 5; } class C { public function __get($a) { $this->a = 5; return 5; } } $a = new $className; $start = microtime(true); for ($i=0; $i < 1000000; $i++) $b = $a->a; $end = microtime(true); echo (($end - $start) * 1000) ." msec\n"; ------ 598.90794754028 msec ------ 205.48391342163 msec ------ 189.7759437561 msec

    Read the article

  • convert an int to list of individual digitals more faster?

    - by user478514
    All, I want define an int(987654321) <= [9, 8, 7, 6, 5, 4, 3, 2, 1] convertor, if the length of int number < 9, for example 10 the list will be [0,0,0,0,0,0,0,1,0] , and if the length 9, for example 9987654321 , the list will be [9, 9, 8, 7, 6, 5, 4, 3, 2, 1] >>> i 987654321 >>> l [9, 8, 7, 6, 5, 4, 3, 2, 1] >>> z = [0]*(len(unit) - len(str(l))) >>> z.extend(l) >>> l = z >>> unit [100000000, 10000000, 1000000, 100000, 10000, 1000, 100, 10, 1] >>> sum([x*y for x,y in zip(l, unit)]) 987654321 >>> int("".join([str(x) for x in l])) 987654321 >>> l1 = [int(x) for x in str(i)] >>> z = [0]*(len(unit) - len(str(l1))) >>> z.extend(l1) >>> l1 = z >>> l1 [9, 8, 7, 6, 5, 4, 3, 2, 1] >>> a = [i//x for x in unit] >>> b = [a[x] - a[x-1]*10 for x in range(9)] >>> if len(b) = len(a): b[0] = a[0] # fix the a[-1] issue >>> b [9, 8, 7, 6, 5, 4, 3, 2, 1] I tested above solutions but found those may not faster/simple enough than I want and may have a length related bug inside, anyone may share me a better solution for this kinds convertion? Thanks!

    Read the article

  • What runs faster? Wordpress or Drupal 6.x?

    - by electblake
    So... I run a pretty large Wordpress blog. Currently it gets around 20k+ pageviews a day, and its always a struggle to keep the bad boy running quickly - I currently run a vps.net with CentOS 5.3 I am also Drupal developer by trade so I love the CMS Framework for its versatility and the portability (I can take work from one site and implement on another with great ease) MY QUESTION IS: What is faster then? Wordpress 3.x & Drupal 6.x I'd love to migrate my site to Drupal to be able to roll out new features etc (which I find awkward to do in Wordpress) but I am scared that Drupal may not be able to handle the traffic. Any opinions? I know that some major players use Drupal - as Dries documents well on his blog but I'm not under any illusions that Drupal can be a real hog. Thanks for any/all help! Please try to avoid server optimization talk unless it pertains to Wordpress or Drupal 6.x specifically, I love to learn more about optimizations but I do want to sort out which platform is quicker :) p.s - I realize the fastest option is to use a lower-level framework (with less overhead) like CakePHP etc but assume that isn't an option ;)

    Read the article

  • Which is faster: Appropriate data input or appropriate data structure?

    - by Anon
    I have a dataset whose columns look like this: Consumer ID | Product ID | Time Period | Product Score 1 | 1 | 1 | 2 2 | 1 | 2 | 3 and so on. As part of a program (written in C) I need to process the product scores given by all consumers for a particular product and time period combination for all possible combinations. Suppose that there are 3 products and 2 time periods. Then I need to process the product scores for all possible combinations as shown below: Product ID | Time Period 1 | 1 1 | 2 2 | 1 2 | 2 3 | 1 3 | 2 I will need to process the data along the above lines lots of times ( 10k) and the dataset is fairly large (e.g., 48k consumers, 100 products, 24 time periods etc). So speed is an issue. I came up with two ways to process the data and am wondering which is the faster approach or perhaps it does not matter much? (speed matters but not at the cost of undue maintenance/readability): Sort the data on product id and time period and then loop through the data to extract data for all possible combinations. Store the consumer ids of all consumers who provided product scores for a particular combination of product id and time period and process the data accordingly. Any thoughts? Any other way to speed up the processing? Thanks

    Read the article

  • Phonegap: Will my mobile app 'feel' faster or slower once ported to phonegap?

    - by user15872
    So I'm designing everything in mobile Safari and I know that phonegap is essentially a stripped webview but... Question: Will my application will run better in phonegap? (revised below) a)I imagine my navigation and core app will load faster as the scripts and images are on the hard drive. Is this True? b)I assume since they've been working on it for 2 years now that they may have made some optimizations to make it quicker than just an average safari window. Is this true? (Assuming both html5/js/css code bases are pretty much the same and app is running on iOS.) Update: Sorry, I meant to compare apples to slightly different apples. Question 1 revised: Will my app see any performance benefits running with in a phonegap environment vs standard mobile safari? (compare mobile - to mobile) 1b) In what ways, other than loading time has phonegap optimized performance over standard mobile safari? Follow ups: 1) Are there any pitfalls, other than large libraries, that may cause phonegap to suffer a serious performance hit vs stand mobile safari? 2) Can I mix native and webview rendering? (i.e the top half of my app is rendered in with html/css/js and the bottom half native)

    Read the article

  • F# why my recursion is faster than Seq.exists?

    - by user38397
    I am pretty new to F#. I'm trying to understand how I can get a fast code in F#. For this, I tried to write two methods (IsPrime1 and IsPrime2) for benchmarking. My code is: // Learn more about F# at http://fsharp.net open System open System.Diagnostics #light let isDivisible n d = n % d = 0 let IsPrime1 n = Array.init (n-2) ((+) 2) |> Array.exists (isDivisible n) |> not let rec hasDivisor n d = match d with | x when x < n -> (n % x = 0) || (hasDivisor n (d+1)) | _ -> false let IsPrime2 n = hasDivisor n 2 |> not let SumOfPrimes max = [|2..max|] |> Array.filter IsPrime1 |> Array.sum let maxVal = 20000 let s = new Stopwatch() s.Start() let valOfSum = SumOfPrimes maxVal s.Stop() Console.WriteLine valOfSum Console.WriteLine("IsPrime1: {0}", s.ElapsedMilliseconds) ////////////////////////////////// s.Reset() s.Start() let SumOfPrimes2 max = [|2..max|] |> Array.filter IsPrime2 |> Array.sum let valOfSum2 = SumOfPrimes2 maxVal s.Stop() Console.WriteLine valOfSum2 Console.WriteLine("IsPrime2: {0}", s.ElapsedMilliseconds) Console.ReadKey() IsPrime1 takes 760 ms while IsPrime2 takes 260ms for the same result. What's going on here and how I can make my code even faster?

    Read the article

  • Why is drawing to OnPaint graphics faster than image graphics?

    - by Tesserex
    I'm looking for a way to speed up the drawing of my game engine, which is currently the significant bottleneck, and is causing slowdowns. I'm on the verge of converting it over to XNA, but I just noticed something. Say I have a small image that I've loaded. Image img = Image.FromFile("mypict.png"); We have a picturebox on the screen we want to draw on. So we have a handler. pictureBox1.Paint += new PaintEventHandler(pictureBox1_Paint); I want our loaded image to be tiled on the picturebox (this is for a game, after all). Why on earth is this code: void pictureBox1_Paint(object sender, PaintEventArgs e) { for (int y = 0; y < 16; y++) for (int x = 0; x < 16; x++) e.Graphics.DrawImage(image, x * 16, y * 16, 16, 16); } over 25 TIMES FASTER than this code: Image buff = new Bitmap(256, 256, PixelFormat.Format32bppPArgb); // actually a form member void pictureBox1_Paint(object sender, PaintEventArgs e) { using (Graphics g = Graphics.FromImage(buff)) { for (int y = 0; y < 16; y++) for (int x = 0; x < 16; x++) g.DrawImage(image, x * 16, y * 16, 16, 16); } e.Graphics.DrawImage(buff, 0, 0, 256, 256); } To eliminate the obvious, I've tried commenting out the last e.Graphics.DrawImage (which means I don't see anything, but it gets rid a call that isn't in the first example). I've also left in the using block (needlessly) in the first example, but it's still just as blazingly fast. I've set properties of g to match e.Graphics - things like InterpolationMode, CompositingQuality, etc, but nothing I do bridges this incredible gap in performance. I can't find any difference between the two Graphics objects. What gives? My test with a System.Diagnostics.Stopwatch says that the first code snippet runs at about 7100 fps, while the second runs at a measly 280 fps. My reference image is VS2010ImageLibrary\Objects\png_format\WinVista\SecurityLock.png, which is 48x48 px, and which I modified to be 72 dpi instead of 96, but those made no difference either.

    Read the article

  • Why is numpy's einsum faster than numpy's built in functions?

    - by Ophion
    Lets start with three arrays of dtype=np.double. Timings are performed on a intel CPU using numpy 1.7.1 compiled with icc and linked to intel's mkl. A AMD cpu with numpy 1.6.1 compiled with gcc without mkl was also used to verify the timings. Please note the timings scale nearly linearly with system size and are not due to the small overhead incurred in the numpy functions if statements these difference will show up in microseconds not milliseconds: arr_1D=np.arange(500,dtype=np.double) large_arr_1D=np.arange(100000,dtype=np.double) arr_2D=np.arange(500**2,dtype=np.double).reshape(500,500) arr_3D=np.arange(500**3,dtype=np.double).reshape(500,500,500) First lets look at the np.sum function: np.all(np.sum(arr_3D)==np.einsum('ijk->',arr_3D)) True %timeit np.sum(arr_3D) 10 loops, best of 3: 142 ms per loop %timeit np.einsum('ijk->', arr_3D) 10 loops, best of 3: 70.2 ms per loop Powers: np.allclose(arr_3D*arr_3D*arr_3D,np.einsum('ijk,ijk,ijk->ijk',arr_3D,arr_3D,arr_3D)) True %timeit arr_3D*arr_3D*arr_3D 1 loops, best of 3: 1.32 s per loop %timeit np.einsum('ijk,ijk,ijk->ijk', arr_3D, arr_3D, arr_3D) 1 loops, best of 3: 694 ms per loop Outer product: np.all(np.outer(arr_1D,arr_1D)==np.einsum('i,k->ik',arr_1D,arr_1D)) True %timeit np.outer(arr_1D, arr_1D) 1000 loops, best of 3: 411 us per loop %timeit np.einsum('i,k->ik', arr_1D, arr_1D) 1000 loops, best of 3: 245 us per loop All of the above are twice as fast with np.einsum. These should be apples to apples comparisons as everything is specifically of dtype=np.double. I would expect the speed up in an operation like this: np.allclose(np.sum(arr_2D*arr_3D),np.einsum('ij,oij->',arr_2D,arr_3D)) True %timeit np.sum(arr_2D*arr_3D) 1 loops, best of 3: 813 ms per loop %timeit np.einsum('ij,oij->', arr_2D, arr_3D) 10 loops, best of 3: 85.1 ms per loop Einsum seems to be at least twice as fast for np.inner, np.outer, np.kron, and np.sum regardless of axes selection. The primary exception being np.dot as it calls DGEMM from a BLAS library. So why is np.einsum faster that other numpy functions that are equivalent? The DGEMM case for completeness: np.allclose(np.dot(arr_2D,arr_2D),np.einsum('ij,jk',arr_2D,arr_2D)) True %timeit np.einsum('ij,jk',arr_2D,arr_2D) 10 loops, best of 3: 56.1 ms per loop %timeit np.dot(arr_2D,arr_2D) 100 loops, best of 3: 5.17 ms per loop The leading theory is from @sebergs comment that np.einsum can make use of SSE2, but numpy's ufuncs will not until numpy 1.8 (see the change log). I believe this is the correct answer, but have not been able to confirm it. Some limited proof can be found by changing the dtype of input array and observing speed difference and the fact that not everyone observes the same trends in timings.

    Read the article

  • Is there a faster way to launch Activity on Android when using maven?

    - by Kamilski81
    Within Eclipse, everything works perfectly when I run 'mvn install android:deploy'...however, this takes about 18 seconds to complete. Is there a faster way to launch my android application. When I try to run my main Activity via 'Android Application' I get a huge stack trace: 09-05 14:03:09.915: E/AndroidRuntime(689): FATAL EXCEPTION: main 09-05 14:03:09.915: E/AndroidRuntime(689): java.lang.RuntimeException: Unable to instantiate activity ComponentInfo{com.soraapps.android.purseprideapp/com.soraapps.android.purseprideapp.PursePrideActivity}: java.lang.ClassNotFoundException: com.soraapps.android.purseprideapp.PursePrideActivity in loader dalvik.system.PathClassLoader[/data/app/com.soraapps.android.purseprideapp-2.apk] 09-05 14:03:09.915: E/AndroidRuntime(689): at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:1569) 09-05 14:03:09.915: E/AndroidRuntime(689): at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:1663) 09-05 14:03:09.915: E/AndroidRuntime(689): at android.app.ActivityThread.access$1500(ActivityThread.java:117) 09-05 14:03:09.915: E/AndroidRuntime(689): at android.app.ActivityThread$H.handleMessage(ActivityThread.java:931) 09-05 14:03:09.915: E/AndroidRuntime(689): at android.os.Handler.dispatchMessage(Handler.java:99) 09-05 14:03:09.915: E/AndroidRuntime(689): at android.os.Looper.loop(Looper.java:123) 09-05 14:03:09.915: E/AndroidRuntime(689): at android.app.ActivityThread.main(ActivityThread.java:3683) 09-05 14:03:09.915: E/AndroidRuntime(689): at java.lang.reflect.Method.invokeNative(Native Method) 09-05 14:03:09.915: E/AndroidRuntime(689): at java.lang.reflect.Method.invoke(Method.java:507) 09-05 14:03:09.915: E/AndroidRuntime(689): at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839) 09-05 14:03:09.915: E/AndroidRuntime(689): at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597) 09-05 14:03:09.915: E/AndroidRuntime(689): at dalvik.system.NativeStart.main(Native Method) 09-05 14:03:09.915: E/AndroidRuntime(689): Caused by: java.lang.ClassNotFoundException: com.soraapps.android.purseprideapp.PursePrideActivity in loader dalvik.system.PathClassLoader[/data/app/com.soraapps.android.purseprideapp-2.apk] 09-05 14:03:09.915: E/AndroidRuntime(689): at dalvik.system.PathClassLoader.findClass(PathClassLoader.java:240) 09-05 14:03:09.915: E/AndroidRuntime(689): at java.lang.ClassLoader.loadClass(ClassLoader.java:551) 09-05 14:03:09.915: E/AndroidRuntime(689): at java.lang.ClassLoader.loadClass(ClassLoader.java:511) 09-05 14:03:09.915: E/AndroidRuntime(689): at android.app.Instrumentation.newActivity(Instrumentation.java:1021) 09-05 14:03:09.915: E/AndroidRuntime(689): at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:1561) 09-05 14:03:09.915: E/AndroidRuntime(689): ... 11 more Here is my pom.xml: https://gist.github.com/3656482 And here is what my files look like after I try building and running the project. (see gen and bin folders) http://cl.ly/image/3Q0x052S2Z3Q

    Read the article

  • Why do UInt16 arrays seem to add faster than int arrays?

    - by scraimer
    It seems that C# is faster at adding two arrays of UInt16[] than it is at adding two arrays of int[]. This makes no sense to me, since I would have assumed the arrays would be word-aligned, and thus int[] would require less work from the CPU, no? I ran the test-code below, and got the following results: Int for 1000 took 9896625613 tick (4227 msec) UInt16 for 1000 took 6297688551 tick (2689 msec) The test code does the following: Creates two arrays named a and b, once. Fills them with random data, once. Starts a stopwatch. Adds a and b, item-by-item. This is done 1000 times. Stops the stopwatch. Reports how long it took. This is done for int[] a, b and for UInt16 a,b. And every time I run the code, the tests for the UInt16 arrays take 30%-50% less time than the int arrays. Can you explain this to me? Here's the code, if you want to try if for yourself: public static UInt16[] GenerateRandomDataUInt16(int length) { UInt16[] noise = new UInt16[length]; Random random = new Random((int)DateTime.Now.Ticks); for (int i = 0; i < length; ++i) { noise[i] = (UInt16)random.Next(); } return noise; } public static int[] GenerateRandomDataInt(int length) { int[] noise = new int[length]; Random random = new Random((int)DateTime.Now.Ticks); for (int i = 0; i < length; ++i) { noise[i] = (int)random.Next(); } return noise; } public static int[] AddInt(int[] a, int[] b) { int len = a.Length; int[] result = new int[len]; for (int i = 0; i < len; ++i) { result[i] = (int)(a[i] + b[i]); } return result; } public static UInt16[] AddUInt16(UInt16[] a, UInt16[] b) { int len = a.Length; UInt16[] result = new UInt16[len]; for (int i = 0; i < len; ++i) { result[i] = (ushort)(a[i] + b[i]); } return result; } public static void Main() { int count = 1000; int len = 128 * 6000; int[] aInt = GenerateRandomDataInt(len); int[] bInt = GenerateRandomDataInt(len); Stopwatch s = new Stopwatch(); s.Start(); for (int i=0; i<count; ++i) { int[] resultInt = AddInt(aInt, bInt); } s.Stop(); Console.WriteLine("Int for " + count + " took " + s.ElapsedTicks + " tick (" + s.ElapsedMilliseconds + " msec)"); UInt16[] aUInt16 = GenerateRandomDataUInt16(len); UInt16[] bUInt16 = GenerateRandomDataUInt16(len); s = new Stopwatch(); s.Start(); for (int i=0; i<count; ++i) { UInt16[] resultUInt16 = AddUInt16(aUInt16, bUInt16); } s.Stop(); Console.WriteLine("UInt16 for " + count + " took " + s.ElapsedTicks + " tick (" + s.ElapsedMilliseconds + " msec)"); }

    Read the article

  • Matrix Multiplication with Threads: Why is it not faster?

    - by prelic
    Hey all, So I've been playing around with pthreads, specifically trying to calculate the product of two matrices. My code is extremely messy because it was just supposed to be a quick little fun project for myself, but the thread theory I used was very similar to: #include <pthread.h> #include <stdio.h> #include <stdlib.h> #define M 3 #define K 2 #define N 3 #define NUM_THREADS 10 int A [M][K] = { {1,4}, {2,5}, {3,6} }; int B [K][N] = { {8,7,6}, {5,4,3} }; int C [M][N]; struct v { int i; /* row */ int j; /* column */ }; void *runner(void *param); /* the thread */ int main(int argc, char *argv[]) { int i,j, count = 0; for(i = 0; i < M; i++) { for(j = 0; j < N; j++) { //Assign a row and column for each thread struct v *data = (struct v *) malloc(sizeof(struct v)); data->i = i; data->j = j; /* Now create the thread passing it data as a parameter */ pthread_t tid; //Thread ID pthread_attr_t attr; //Set of thread attributes //Get the default attributes pthread_attr_init(&attr); //Create the thread pthread_create(&tid,&attr,runner,data); //Make sure the parent waits for all thread to complete pthread_join(tid, NULL); count++; } } //Print out the resulting matrix for(i = 0; i < M; i++) { for(j = 0; j < N; j++) { printf("%d ", C[i][j]); } printf("\n"); } } //The thread will begin control in this function void *runner(void *param) { struct v *data = param; // the structure that holds our data int n, sum = 0; //the counter and sum //Row multiplied by column for(n = 0; n< K; n++){ sum += A[data->i][n] * B[n][data->j]; } //assign the sum to its coordinate C[data->i][data->j] = sum; //Exit the thread pthread_exit(0); } source: http://macboypro.com/blog/2009/06/29/matrix-multiplication-in-c-using-pthreads-on-linux/ For the non-threaded version, I used the same setup (3 2-d matrices, dynamically allocated structs to hold r/c), and added a timer. First trials indicated that the non-threaded version was faster. My first thought was that the dimensions were too small to notice a difference, and it was taking longer to create the threads. So I upped the dimensions to about 50x50, randomly filled, and ran it, and I'm still not seeing any performance upgrade with the threaded version. What am I missing here?

    Read the article

  • How to save the content in UIWebView for faster loading on next launch?

    - by erotsppa
    I know that there are some caching classes introduced in the iphone sdk recently, and there is also a TTURLRequest from three20's library that allows you to cache a request to a URL. However, because I am loading the web page in UIWebView by calling UIWebView's loadRequest, those techniques are not really applicable. Any ideas how I can save a web page so that on next app launch, I don't have to fetch from the web again for the full page? The page itself already have some ajax mechanism that updates parts of itself automatically.

    Read the article

  • Why is setting HTML5's CanvasPixelArray values ridiculously slow and how can I do it faster?

    - by Nixuz
    I am trying to do some dynamic visual effects using the HTML 5 canvas' pixel manipulation, but I am running into a problem where setting pixels in the CanvasPixelArray is ridiculously slow. For example if I have code like: imageData = ctx.getImageData(0, 0, 500, 500); for (var i = 0; i < imageData.length; i += 4){ imageData.data[i] = buffer[i]; imageData.data[i + 1] = buffer[i + 1]; imageData.data[i + 2] = buffer[i + 2]; } ctx.putImageData(imageData, 0, 0); Profiling with Chrome reveals, it runs 44% slower than the following code where CanvasPixelArray is not used. tempArray = new Array(500 * 500 * 4); imageData = ctx.getImageData(0, 0, 500, 500); for (var i = 0; i < imageData.length; i += 4){ tempArray[i] = buffer[i]; tempArray[i + 1] = buffer[i + 1]; tempArray[i + 2] = buffer[i + 2]; } ctx.putImageData(imageData, 0, 0); My guess is that the reason for this slowdown is due to the conversion between the Javascript doubles and the internal unsigned 8bit integers, used by the CanvasPixelArray. Is this guess correct? Is there anyway to reduce the time spent setting values in the CanvasPixelArray?

    Read the article

  • How to open DataSet in Visual Studio 2008 faster?

    - by Ekkapop
    When I open DataSet in Visual Studio 2008 to design or modify it, it always take a very long time (more than five minutes) before I can continue to do my job. While I'm waiting I can't do anything on Visual Studio, moreover CPU and memory usage is growth dramatically. I want to know, Is it has anyway to reduce this waiting time? Hardware - Desktop CPU: Intel Q6600 Memory: 4 GB HDD: 320 GB 7200 rpm OS: Windows XP 32 bit with Service Pack 3

    Read the article

  • My OpenCL kernel is slower on faster hardware.. But why?

    - by matdumsa
    Hi folks, As I was finishing coding my project for a multicore programming class I came up upon something really weird I wanted to discuss with you. We were asked to create any program that would show significant improvement in being programmed for a multi-core platform. I’ve decided to try and code something on the GPU to try out OpenCL. I’ve chosen the matrix convolution problem since I’m quite familiar with it (I’ve parallelized it before with open_mpi with great speedup for large images). So here it is, I select a large GIF file (2.5 MB) [2816X2112] and I run the sequential version (original code) and I get an average of 15.3 seconds. I then run the new OpenCL version I just wrote on my MBP integrated GeForce 9400M and I get timings of 1.26s in average.. So far so good, it’s a speedup of 12X!! But now I go in my energy saver panel to turn on the “Graphic Performance Mode” That mode turns off the GeForce 9400M and turns on the Geforce 9600M GT my system has. Apple says this card is twice as fast as the integrated one. Guess what, my timing using the kick-ass graphic card are 3.2 seconds in average… My 9600M GT seems to be more than two times slower than the 9400M.. For those of you that are OpenCL inclined, I copy all data to remote buffers before starting, so the actual computation doesn’t require roundtrip to main ram. Also, I let OpenCL determine the optimal local-worksize as I’ve read they’ve done a pretty good implementation at figuring that parameter out.. Anyone has a clue? edit: full source code with makefiles here http://www.mathieusavard.info/convolution.zip cd gimage make cd ../clconvolute make put a large input.gif in clconvolute and run it to see results

    Read the article

  • Is LuaJIT really faster than every other JIT-ed dynamic languages?

    - by Gabriel Cuvillier
    According to the computer language benchmark game, the LuaJIT implementation seems to beat every other JIT-ed dynamic language (V8, Tracemonkey, PLT Scheme, Erlang HIPE) by an order of magnitude. I know that these benchmarks are not representative (as they say: "Which programming language implementations have the fastest benchmark programs?"), but this is still really impressive. In practice, is it really the case? Someone have tested that Lua implementation?

    Read the article

  • Is there a faster alternative to using textures in XNA?

    - by Matthew Bowen
    I am writing a map editing program for a 2D game using XNA. To create a Texture2D for all of the tiles that a map requires takes too long. Are there any alternatives to using textures for drawing with XNA? I attempted to create just one texture per tile set instead of a texture for every tile in a tile set, but there is a limit to the size of textures and I could not fit all the tiles of a tile set into one texture. Currently the program contains all the would-be textures in memory as Bitmap objects. Is there a way to simply draw a Bitmap object to the screen in XNA? I have searched but I cannot find any information on this. This approach would avoid having to create textures altogether, however any tinting or effects I would have to do to the bitmap directly. Any help would be very much appreciated. Thanks

    Read the article

  • How can I make this Java code run faster?

    - by Martin Wiboe
    Hello all, I am trying to make a Java port of a simple feed-forward neural network. This obviously involves lots of numeric calculations, so I am trying to optimize my central loop as much as possible. The results should be correct within the limits of the float data type. My current code looks as follows (error handling & initialization removed): /** * Simple implementation of a feedforward neural network. The network supports * including a bias neuron with a constant output of 1.0 and weighted synapses * to hidden and output layers. * * @author Martin Wiboe */ public class FeedForwardNetwork { private final int outputNeurons; // No of neurons in output layer private final int inputNeurons; // No of neurons in input layer private int largestLayerNeurons; // No of neurons in largest layer private final int numberLayers; // No of layers private final int[] neuronCounts; // Neuron count in each layer, 0 is input // layer. private final float[][][] fWeights; // Weights between neurons. // fWeight[fromLayer][fromNeuron][toNeuron] // is the weight from fromNeuron in // fromLayer to toNeuron in layer // fromLayer+1. private float[][] neuronOutput; // Temporary storage of output from previous layer public float[] compute(float[] input) { // Copy input values to input layer output for (int i = 0; i < inputNeurons; i++) { neuronOutput[0][i] = input[i]; } // Loop through layers for (int layer = 1; layer < numberLayers; layer++) { // Loop over neurons in the layer and determine weighted input sum for (int neuron = 0; neuron < neuronCounts[layer]; neuron++) { // Bias neuron is the last neuron in the previous layer int biasNeuron = neuronCounts[layer - 1]; // Get weighted input from bias neuron - output is always 1.0 float activation = 1.0F * fWeights[layer - 1][biasNeuron][neuron]; // Get weighted inputs from rest of neurons in previous layer for (int inputNeuron = 0; inputNeuron < biasNeuron; inputNeuron++) { activation += neuronOutput[layer-1][inputNeuron] * fWeights[layer - 1][inputNeuron][neuron]; } // Store neuron output for next round of computation neuronOutput[layer][neuron] = sigmoid(activation); } } // Return output from network = output from last layer float[] result = new float[outputNeurons]; for (int i = 0; i < outputNeurons; i++) result[i] = neuronOutput[numberLayers - 1][i]; return result; } private final static float sigmoid(final float input) { return (float) (1.0F / (1.0F + Math.exp(-1.0F * input))); } } I am running the JVM with the -server option, and as of now my code is between 25% and 50% slower than similar C code. What can I do to improve this situation? Thank you, Martin Wiboe

    Read the article

  • Why is setting HTML5's CanvasPixelArray values is ridiculously slow and how can I do it faster?

    - by Nixuz
    I am trying to do some dynamic visual effects using the HTML 5 canvas' pixel manipulation, but I am running into a problem where setting pixels in the CanvasPixelArray is ridiculously slow. For example if I have code like: imageData = ctx.getImageData(0, 0, 500, 500); for (var i = 0; i < imageData.length; i += 4){ imageData.data[index] = buffer[i]; imageData.data[index + 1] = buffer[i]; imageData.data[index + 2] = buffer[i]; } ctx.putImageData(imageData, 0, 0); Profiling with Chrome reveals, it runs 44% slower than the following code where CanvasPixelArray is not used. tempArray = new Array(500 * 500 * 4); imageData = ctx.getImageData(0, 0, 500, 500); for (var i = 0; i < imageData.length; i += 4){ tempArray[index] = buffer[i]; tempArray[index + 1] = buffer[i]; tempArray[index + 2] = buffer[i]; } ctx.putImageData(imageData, 0, 0); My guess is that the reason for this slowdown is due to the conversion between the Javascript doubles and the internal unsigned 8bit integers, used by the CanvasPixelArray. Is this guess correct? Is there anyway to reduce the time spent setting values in the CanvasPixelArray?

    Read the article

  • Whats faster in Javascript a bunch of small setInterval loops, or one big one?

    - by RobertWHurst
    Just wondering if its worth it to make a monolithic loop function or just add loops were they're needed. The big loop option would just be a loop of callbacks that are added dynamically with an add function. adding a function would look like this setLoop(function(){ alert('hahaha! I\'m a really annoying loop that bugs you every tenth of a second'); }); setLoop would add the function to the monolithic loop. so is the is worth anything in performance or should I just stick to lots of little loops using setInterval?

    Read the article

  • Java object caching, which is faster, reading from a file or from a remote machine?

    - by Kumar225
    I am at a point where I need to take the decision on what to do when caching of objects reaches the configured threshold. Should I store the objects in a indexed file (like provided by JCS) and read them from the file (file IO) when required or have the object stored in a distributed cache (network, serialization, deserialization) We are using Solaris as OS. ============================ Adding some more information. I have this question so as to determine if I can switch to distributed caching. The remote server which will have cache will have more memory and better disk and this remote server will only be used for caching. One of the problems we cannot increase the locally cached objects is , it stores the cached objects in JVM heap which has limited memory(using 32bit JVM). ======================================================================== Thanks, we finally ended up choosing Coherence as our Cache product. This provides many cache configuration topologies, in process vs remote vs disk ..etc.

    Read the article

< Previous Page | 17 18 19 20 21 22 23 24 25 26 27 28  | Next Page >