Search Results

Search found 5572 results on 223 pages for 'cpu'.

Page 120/223 | < Previous Page | 116 117 118 119 120 121 122 123 124 125 126 127  | Next Page >

  • Why is numpy's einsum faster than numpy's built-in functions?

    - by Ophion
    Let's start with some arrays of dtype=np.double. Timings are performed on an Intel CPU using numpy 1.7.1 compiled with icc and linked to Intel's MKL. An AMD CPU with numpy 1.6.1 compiled with gcc without MKL was also used to verify the timings. Please note that the timings scale nearly linearly with system size and are not due to the small overhead of the if statements inside the numpy functions; those differences would show up in microseconds, not milliseconds:

        arr_1D=np.arange(500,dtype=np.double)
        large_arr_1D=np.arange(100000,dtype=np.double)
        arr_2D=np.arange(500**2,dtype=np.double).reshape(500,500)
        arr_3D=np.arange(500**3,dtype=np.double).reshape(500,500,500)

    First let's look at the np.sum function:

        np.all(np.sum(arr_3D)==np.einsum('ijk->',arr_3D))
        True

        %timeit np.sum(arr_3D)
        10 loops, best of 3: 142 ms per loop

        %timeit np.einsum('ijk->', arr_3D)
        10 loops, best of 3: 70.2 ms per loop

    Powers:

        np.allclose(arr_3D*arr_3D*arr_3D,np.einsum('ijk,ijk,ijk->ijk',arr_3D,arr_3D,arr_3D))
        True

        %timeit arr_3D*arr_3D*arr_3D
        1 loops, best of 3: 1.32 s per loop

        %timeit np.einsum('ijk,ijk,ijk->ijk', arr_3D, arr_3D, arr_3D)
        1 loops, best of 3: 694 ms per loop

    Outer product:

        np.all(np.outer(arr_1D,arr_1D)==np.einsum('i,k->ik',arr_1D,arr_1D))
        True

        %timeit np.outer(arr_1D, arr_1D)
        1000 loops, best of 3: 411 us per loop

        %timeit np.einsum('i,k->ik', arr_1D, arr_1D)
        1000 loops, best of 3: 245 us per loop

    All of the above are roughly twice as fast with np.einsum. These should be apples-to-apples comparisons, as everything is specifically of dtype=np.double. I would expect the speed-up in an operation like this, where np.einsum avoids materializing the arr_2D*arr_3D temporary:

        np.allclose(np.sum(arr_2D*arr_3D),np.einsum('ij,oij->',arr_2D,arr_3D))
        True

        %timeit np.sum(arr_2D*arr_3D)
        1 loops, best of 3: 813 ms per loop

        %timeit np.einsum('ij,oij->', arr_2D, arr_3D)
        10 loops, best of 3: 85.1 ms per loop

    Einsum seems to be at least twice as fast for np.inner, np.outer, np.kron, and np.sum regardless of axis selection, the primary exception being np.dot, as it calls DGEMM from a BLAS library. So why is np.einsum faster than other numpy functions that are equivalent? The DGEMM case, for completeness:

        np.allclose(np.dot(arr_2D,arr_2D),np.einsum('ij,jk',arr_2D,arr_2D))
        True

        %timeit np.einsum('ij,jk',arr_2D,arr_2D)
        10 loops, best of 3: 56.1 ms per loop

        %timeit np.dot(arr_2D,arr_2D)
        100 loops, best of 3: 5.17 ms per loop

    The leading theory is from @seberg's comment that np.einsum can make use of SSE2, while numpy's ufuncs will not until numpy 1.8 (see the change log). I believe this is the correct answer, but have not been able to confirm it. Some limited proof can be found by changing the dtype of the input arrays and observing the speed difference, and by the fact that not everyone observes the same trends in timings.
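
    A minimal sketch of the dtype check mentioned at the end (assumptions: a smaller 300**3 array than the question's 500**3 to keep memory modest, and absolute numbers that will differ per machine). If the gap between np.sum and np.einsum changes noticeably between float64 and float32 input, that is the kind of limited evidence the SSE2 theory predicts:

        import numpy as np
        import timeit

        # Same kind of 3-D test array as in the question, built in two dtypes.
        arr_f64 = np.arange(300**3, dtype=np.float64).reshape(300, 300, 300)
        arr_f32 = arr_f64.astype(np.float32)

        for name, arr in [("float64", arr_f64), ("float32", arr_f32)]:
            t_sum = min(timeit.repeat(lambda: np.sum(arr), number=3, repeat=3)) / 3
            t_ein = min(timeit.repeat(lambda: np.einsum('ijk->', arr), number=3, repeat=3)) / 3
            print("%s: np.sum %.1f ms, np.einsum %.1f ms" % (name, t_sum * 1e3, t_ein * 1e3))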

    Read the article

  • Shut down windows service based on load

    - by JP
    Hello, I was wondering if there are any free / open source solutions that will start and stop a Windows service based on load? I have some pub/sub subscriber services that do background work which is not critical. Ideally I would like to be able to automate things so that these services start if memory/CPU/disk I/O is under a certain threshold and stop gracefully once that threshold is exceeded. Do you know of any solutions? Thanks, JP
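
    If nothing off the shelf turns up, a minimal sketch of the polling approach is below (assumptions: the third-party psutil package is installed, the service name MyPubSubWorker is hypothetical, and the service is controlled through the stock Windows sc command; a disk I/O check could be added the same way from psutil.disk_io_counters() deltas):

        import subprocess
        import time

        import psutil  # third-party: pip install psutil

        SERVICE = "MyPubSubWorker"   # hypothetical service name
        CPU_LIMIT = 60.0             # percent
        MEM_LIMIT = 80.0             # percent

        def set_service(should_run):
            # "sc start" / "sc stop" are the stock Windows service controls.
            action = "start" if should_run else "stop"
            subprocess.run(["sc", action, SERVICE], capture_output=True)

        while True:
            cpu = psutil.cpu_percent(interval=5)   # averaged over 5 seconds
            mem = psutil.virtual_memory().percent
            # Run the worker only while the machine has headroom.
            set_service(cpu < CPU_LIMIT and mem < MEM_LIMIT)
            time.sleep(30)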

    Read the article

  • Unrecognized input filetype FFMPEG gas-preprocessor.pl

    - by Eyal
    Hi, I am trying to use FFmpeg on the iPhone, following this link: http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2009-October/076618.html When I run the ./configure --cc=/Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/arm-apple-darwin9-gcc-4.2.1 --as='gas-preprocessor.pl /Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/arm-apple-darwin9-gcc-4.2.1' --sysroot=/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS3.1.sdk --enable-cross-compile --target-os=darwin --arch=arm --cpu=arm1176jzf-s script, I get the error: "Unrecognized input filetype at /bin/sh line 23". Please help me. Thanks, Eyal.

    Read the article

  • How do I profile in DrScheme?

    - by kunjaan
    How do I profile my functions using DrScheme?

        (require profile)

        (define (factorial n)
          (cond ((= n 1) 1)
                (else (* n (factorial (- n 1))))))

        (profile factorial)

    The above code returns:

        Profiling results
        -----------------
          Total cpu time observed: 0ms (out of 0ms)
          Number of samples taken: 0 (once every 0ms)
        ======================================================
          Caller
          Idx   Total      Self       Name+src        Local%
                ms(pct)    ms(pct)      Callee
        ======================================================

    I tried both

        (profile (factorial 100))

    and

        (profile factorial)
        (factorial 100)

    but they give me the same result. What am I doing wrong?

    Read the article

  • atomic operation cost

    - by osgx
    Hello, What is the cost of an atomic operation? How many cycles does it consume? Will it pause other processors on SMP or NUMA systems, or will it block memory accesses? Will it flush the reorder buffer in an out-of-order CPU? What effects will it have on the cache? Thanks.

    Read the article

  • T-Mobile G1 (MSM7200) GPU Memory

    - by Reflog
    Hello. I'm trying to find some information regarding the available GPU (for OpenGL) memory on the T-Mobile G1. This phone has an MSM7200 Qualcomm chip with an ATI Imageon GPU inside. Unfortunately I have not been able to dig up any info regarding the specifics of GPU memory usage. How much memory is available in total for textures? Is that memory shared with the CPU's memory? Thanks in advance, Eli

    Read the article

  • Profiling python C extensions

    - by pygabriel
    I have developed a Python C extension that receives data from Python and performs some CPU-intensive calculations. Is it possible to profile the C extension? The problem here is that writing a sample test in C to be profiled would be challenging, because the code relies on particular inputs and data structures (generated by the Python control code). Do you have any suggestions?
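
    One hedged sketch of the usual workaround: keep the input generation in Python and run a small driver under a native profiler (perf, gprof, Instruments, etc.), since cProfile can only attribute the time to the extension call as a whole. Here mymodule.heavy_compute and build_inputs are hypothetical stand-ins for the real extension and the Python code that prepares its inputs:

        # profile_driver.py -- run e.g. as:  perf record -g python profile_driver.py
        # A native profiler resolves the C symbols inside the extension;
        # the cProfile pass below only shows the total cost of the call itself.
        import cProfile
        import pstats

        import mymodule                              # hypothetical C extension
        from myproject.inputs import build_inputs    # hypothetical Python-side setup

        def main():
            data = build_inputs()            # the real, Python-generated data structures
            for _ in range(20):              # repeat so the profiler collects enough samples
                mymodule.heavy_compute(data)

        if __name__ == "__main__":
            cProfile.run("main()", "driver.prof")
            pstats.Stats("driver.prof").sort_stats("cumulative").print_stats(10)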

    Read the article

  • barriers in SMP linux kernel

    - by osgx
    Hello, Is there something like pthread_barrier in the SMP Linux kernel? When the kernel works on the same structure simultaneously from 2 or more CPUs, a barrier (like pthread_barrier) can be useful: it stops every CPU that enters it until the last CPU reaches the barrier, and from that moment all CPUs proceed again.
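
    For what it's worth, the rendezvous semantics being described can be illustrated in user space; a minimal sketch with Python's threading.Barrier (purely an illustration of the wanted behaviour, not kernel code):

        import threading
        import time

        N_WORKERS = 4
        barrier = threading.Barrier(N_WORKERS)

        def worker(i):
            time.sleep(i * 0.1)                        # arrive at different times
            print("worker %d waiting at barrier" % i)
            barrier.wait()                             # blocks until all N_WORKERS have arrived
            print("worker %d released" % i)            # everyone resumes together

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()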

    Read the article

  • Parallel computing for integrals

    - by Iman
    I want to reduce the calculation time for a time-consuming integral by splitting the integration range. I'm using C++, Windows, and a quad-core Intel i7 CPU. How can I split it into 4 parallel computations?
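
    The decomposition itself is straightforward; a sketch is shown here in Python with multiprocessing for brevity (the integrand f and the midpoint rule are placeholders), with the understanding that in C++ each of the four sub-ranges would go to its own std::thread or std::async task:

        import math
        from multiprocessing import Pool

        def f(x):
            # Hypothetical integrand -- replace with the real, expensive function.
            return math.exp(-x * x)

        def integrate_chunk(args):
            a, b, n = args
            h = (b - a) / n
            # Midpoint rule over [a, b] with n sub-intervals.
            return h * sum(f(a + (i + 0.5) * h) for i in range(n))

        def parallel_integral(a, b, n=1000000, workers=4):
            edges = [a + (b - a) * k / workers for k in range(workers + 1)]
            chunks = [(edges[k], edges[k + 1], n // workers) for k in range(workers)]
            with Pool(workers) as pool:                 # one process per core
                return sum(pool.map(integrate_chunk, chunks))

        if __name__ == "__main__":                      # required for multiprocessing on Windows
            print(parallel_integral(0.0, 10.0))         # ~0.8862 (= sqrt(pi)/2) for this integrand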

    Read the article

  • Best free flowchart software?

    - by Click Upvote
    I need to map out a complex algorithm with lots of conditional branches. I need easy-to-use flowchart software, preferably free since I need it for just a one-time use. I would prefer something lightweight that doesn't eat up all the CPU and memory. Any ideas?

    Read the article

  • jQuery mousemove performance

    - by Colby77
    Hi, When I bind a mousemove event to an element it works smoothly in every browser except Internet Explorer. In IE the CPU usage is way too high and some associated things (e.g. a tooltip) render poorly. Is there any way I can get rid of the performance problem? (Yeah, I know, don't use IE :))

    Read the article

  • An affordable way to use multiple Delayed::Job queues

    - by NudeCanalTroll
    I have a Ruby on Rails app that needs to process many background jobs simultaneously: anywhere from 5-6 at a time up to 50-60 at a time, depending on the time of day. Right now my app is running on Heroku, which charges $0.05/hour per worker, regardless of how much CPU or memory the worker is using. This is costing me a boatload each month... up to $1200/mo. Are there any hosts that will allow me to do what I'm doing for significantly cheaper?

    Read the article

  • Powermock Slows Down Test Startup on Eclipse/Fedora 10 when on NTFS partition

    - by MrWiggles
    I've just started having a proper play with PowerMock and noticed that it slows down test startup immensely. A quick look at top while it was running shows that mount.ntfs-3g was taking up most of the CPU. I moved Eclipse and my source directory to ext3 partitions to see if that was the problem; the tests now start up quicker, but there's still a noticeable delay. Is this normal with PowerMock or am I missing something obvious?

    Read the article

  • Understanding VS2010 C# parallel profiling results

    - by Haggai
    I have a program with many independent computations, so I decided to parallelize it using Parallel.For/Each. The results were okay on a dual-core machine - CPU utilization of about 80%-90% most of the time. However, with a dual Xeon machine (i.e. 8 cores) I get only about 30%-40% CPU utilization, although the program spends quite a lot of time (sometimes more than 10 seconds) in the parallel sections, and I see it employs about 20-30 more threads in those sections compared to the serial sections. Each thread takes more than 1 second to complete, so I see no reason for them not to run in parallel - unless there is a synchronization problem.

    I used the built-in profiler of VS2010, and the results are strange. Even though I use locks in only one place, the profiler reports that about 85% of the program's time is spent on synchronization (also 5-7% sleep, 5-7% execution, under 1% IO). The locked code is only a cache (a dictionary) get/add:

        bool esn_found;
        lock (lock_load_esn)
            esn_found = cache.TryGetValue(st, out esn);
        if (!esn_found)
        {
            esn = pData.esa_inv_idx.esa[term_idx];
            esn.populate(pData.esa_inv_idx.datafile);
            lock (lock_load_esn)
            {
                if (!cache.ContainsKey(st))
                    cache.Add(st, esn);
            }
        }

    lock_load_esn is a static member of the class, of type Object. esn.populate reads from a file using a separate StreamReader for each thread.

    However, when I press the Synchronization button to see what causes the most delay, I see that the profiler reports lines which are function entrance lines, and doesn't report the locked sections themselves. It doesn't even report the function that contains the above code (reminder - the only lock in the program) as part of the blocking profile with noise level 2%. With noise level at 0% it reports all the functions of the program, and I don't understand why they count as blocking synchronizations.

    So my question is - what is going on here? How can it be that 85% of the time is spent on synchronization? How do I find out what the problem with the parallel sections of my program really is? Thanks.

    Read the article

  • How to catch GMail auto-refresh

    - by nameanyone
    I wrote a userscript to highlight the current row in GMail (indicated by the arrow). Unfortunately the highlight only stays until the GMail inbox is auto-refreshed, which happens quite often. Is there a way to catch that event so I can reapply the highlighting? I don't want to do it on a timer; there is another userscript that does that and it loads up the CPU.

    Read the article

  • Is DxScene the "WPF for Delphi"? Anyone used it?

    - by André Mussche
    I am playing with DxScene and VxScene: http://www.ksdev.com/dxscene/index.html It looks very nice and powerful: 3D-accelerated vector graphics, cross-platform, nice effects, many 2D GUI controls (vector based), good scaling, transparency, rotation (x, y, z), 3D models, etc. Even with many effects, the CPU usage stays very low (0%)! http://www.ksdev.com/dxscene/snapshot/screen0.jpeg But can it be seen as a good WPF alternative for Delphi? And does anyone use it instead of the normal Delphi VCL?

    Read the article

  • How can I get page fault statistics from the kernel

    - by osgx
    Hello, How can I get page fault statistics from the kernel for my application while it is running? What about other events, like the inter-CPU migration count on SMP nodes, or the number of context switches? I want to count such events for various small parts of the program. Thanks.
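
    A minimal sketch of one per-process option on Linux (assuming the measured code is reachable from Python; migration counts are not exposed by rusage and need perf-style event counters instead): resource.getrusage reports fault and context-switch counters, so taking snapshots around a code region gives per-region deltas.

        import resource

        def rusage_snapshot():
            r = resource.getrusage(resource.RUSAGE_SELF)
            return {
                "minor_faults": r.ru_minflt,     # page faults served without disk I/O
                "major_faults": r.ru_majflt,     # page faults that required disk I/O
                "voluntary_cs": r.ru_nvcsw,      # context switches the process asked for
                "involuntary_cs": r.ru_nivcsw,   # context switches forced by the scheduler
            }

        def delta(before, after):
            return {k: after[k] - before[k] for k in before}

        if __name__ == "__main__":
            before = rusage_snapshot()
            # ... the small part of the program to measure goes here ...
            data = [bytearray(4096) for _ in range(10000)]   # touches some fresh pages
            after = rusage_snapshot()
            print(delta(before, after))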

    Read the article

  • Creating an Outlook 2010 Add-in for 64-bit

    - by Grant
    Hi, does anyone know if there is a guide to creating an Outlook add-in for Office 2010 that runs in 64-bit mode? I have an add-in that DOES work in Outlook 2010 32-bit, but it doesn't appear in 64-bit - in the add-in section it's set to disabled. I have tried to compile under different target CPUs, but that hasn't helped.

    Read the article
