Search Results

Search found 5954 results on 239 pages for 'cpu cores'.

Page 89/239 | < Previous Page | 85 86 87 88 89 90 91 92 93 94 95 96 | Next Page >

Are all the system's floating points operations the same?

- by Jj

We're making this web app in PHP and when working in the reports we have Excel files to compare our results to make sure our coding is doing the right operations. Now we're running into some differences due floating point arithmetics. We're doing the same divisions and multiplications and running into slightly different numbers, that add up to a notable difference. My question is if Excel is delegating it's floating point arithmetic to the CPU and PHP is also relying in the CPU for it's operations. Or does each application implements its own set of math algorithms?

Read the article
How many layers are between my program and the hardware?

- by sub

I somehow have the feeling that modern systems, including runtime libraries, this exception handler and that built-in debugger build up more and more layers between my (C++) programs and the CPU/rest of the hardware. I'm thinking of something like this: 1 + 2 OS top layer Runtime library/helper/error handler a hell lot of DLL modules OS kernel layer Do you really want to run 1 + 2?-Windows popup (don't take this serious) OS kernel layer Hardware abstraction Hardware Go through at least 100 miles of circuits Eventually arrive at the CPU ADD 1, 2 Go all the way back to my program Nearly all technical things are simply wrong and in some random order, but you get my point right? How much longer/shorter is this chain when I run a C++ program that calculates 1 + 2 at runtime on Windows? How about when I do this in an interpreter? (Python|Ruby|PHP) Is this chain really as dramatic in reality? Does Windows really try "not to stand in the way"? e.g.: Direct connection my binary < hardware?

Read the article
Improving the efficiency of Kinect for Windows DTWGestureRecognition Application

- by Ray

Currently I am using the DTWGestureRecognition open source tool for Kinect SDK v1.5. I have recorded a few gestures and use them to navigate through Windows 7. I also have implemented voice control for simple things such as opening PowerPoint, Chrome, etc. My main issue is that the application uses quite a bit of my CPU power which causes it to become slow. During gestures and voice commands, the CPU usage sometimes spikes to 80-90%, which causes the application to be unresponsive for a few seconds. I am running it on a 64 bit Windows 7 machine with an i5 processor and 8 GB of RAM. I was wondering if anyone with any experience using this tool or Kinect in general has made it more efficient and less performance hogging. Right now I removed sections which display the RGB video and the Depth video but even doing that did not make a big impact. Any help is appreciated, thanks!

Read the article
Continuously checking database from a Windows service

- by JonF

I am making a Windows service which needs to continuously check for database entries that can be added at any time to tell it to execute some code. It is looking to see if it's status is set to pending, and it's execute time entry is than the current time. Is the only way to do this to just run select statements over and over? It might need to execute the code every minute which means I need to run the select statement every minute looking for entries in the database. I'm trying to avoid unneccesary cpu time because I'm probably going to end up paying for cpu cycles on the hosting provider

Read the article
ffmpeg libavcodec.so missing while compile with cygwin

- by nick

I am building ffmpeg for android by following this tutorial now i got android folder inside the ffmpeg2.0.1 folder but there is no libavcodec-55.so file. instead of that i have lib/libavcodec.a how can i get libavcodec.so file? build_android.sh #!/bin/bash NDK=$HOME/Desktop/adt/android-ndk-r9 SYSROOT=$NDK/platforms/android-9/arch-arm/ TOOLCHAIN=$NDK/toolchains/arm-linux-androideabi-4.8/prebuilt/windows-x86_64 function build_one { ./configure \ --prefix=$PREFIX \ --enable-shared \ --disable-static \ --disable-doc \ --disable-ffmpeg \ --disable-ffplay \ --disable-ffprobe \ --disable-ffserver \ --disable-avdevice \ --disable-doc \ --disable-symver \ --cross-prefix=$TOOLCHAIN/bin/arm-linux-androideabi- \ --target-os=linux \ --arch=arm \ --enable-cross-compile \ --sysroot=$SYSROOT \ --extra-cflags="-Os -fpic $ADDI_CFLAGS" \ --extra-ldflags="$ADDI_LDFLAGS" \ $ADDITIONAL_CONFIGURE_FLAG make clean make make install } CPU=arm PREFIX=$(pwd)/android/$CPU ADDI_CFLAGS="-marm" build_one

Read the article
measuring performance - using real clicks vs "ab" command

- by shanyu

I have a web site in closed beta, developed in Django, runs with Mysql on Debian. In the last few days, the main page has been showing a slowdown. For every ten clicks, one or two receives extremely slow response (10 secs or more), others are as fast as they used to be. When I was searching for the problem, I ran into this issue that I couldn't grasp: top command shows that when I request the main page, mysql shoots up to 90% - 100% cpu usage. I get the page just as the cpu use gets back to normal. So, I thought, it is db. Then I called ab with parameters -n 1000 -c 5, I got decent performance, about 100 pages per second, just as it was before the slowdown. I would imagine a worse performance as 10-20% of requests take 10 secs to load. Is this conflict between ab and "real" clicks normal, or am I using ab in a wrong configuration?

Read the article
Cannot run 32bit compiled WPF applications on Windows 7 64bit

- by adriaanp

I created a WPF project in VS2008 and compiled it with Any CPU, x64 and x86. Any CPU and x64 works, but compiling to x86 the application is hanging when running through VS2008 and crashing when running without debugging. Debugging it with WinDbg I can see a StackOverflowException and sometimes a MissingMethodException relating to WPF methods. Common sense is telling something here that the CLR is not loading the correct assemblies or something when running 32bit WPF apps. I tried reinstalling .NET Framework 3.5 SP1, but it does not fix the problem. I don't know how to go about checking if the correct assemblies are loaded or used. Any ideas? UPDATE: Not a real solution but the best I could do quickly was to reinstall Windows 7

Read the article
Neo4j 1.9.4 (REST Server,CYPHER) performance issue

- by user2968943

I have Neo4j 1.9.4 installed on 24 core 24Gb ram (centos) machine and for most queries CPU usage spikes goes to 200% with only few concurrent requests. Domain: some sort of social application where few types of nodes(profiles) with 3-30 text/array properties and 36 relationship types with at least 3 properties. Most of nodes currently has ~300-500 relationships. Current data set footprint(from console): LogicalLogSize=4294907 (32MB) ArrayStoreSize=1675520 (12MB) NodeStoreSize=1342170 (10MB) PropertyStoreSize=1739548 (13MB) RelationshipStoreSize=6395202 (48MB) StringStoreSize=1478400 (11MB) which is IMHO really small. most queries looks like this one(with more or less WITH .. MATCH .. statements and few queries with variable length relations but the often fast): START targetUser=node({id}), currentUser=node({current}) MATCH targetUser-[contact:InContactsRelation]->n, n-[:InLocationRelation]->l, n-[:InCategoryRelation]->c WITH currentUser, targetUser,n, l,c, contact.fav is not null as inFavorites MATCH n<-[followers?:InContactsRelation]-() WITH currentUser, targetUser,n, l,c,inFavorites, COUNT(followers) as numFollowers RETURN id(n) as id, n.name? as name, n.title? as title, n._class as _class, n.avatar? as avatar, n.avatar_type? as avatar_type, l.name as location__name, c.name as category__name, true as isInContacts, inFavorites as isInFavorites, numFollowers it runs in ~1s-3s(for first run) and ~1s-70ms (for consecutive and it depends on query) and there is about 5-10 queries runs for each impression. Another interesting behavior is when i try run query from console(neo4j) on my local machine many consecutive times(just press ctrl+enter for few seconds) it has almost constant execution time but when i do it on server it goes slower exponentially and i guess it somehow related with my problem. Problem: So my problem is that neo4j is very CPU greedy(for 24 core machine its may be not an issue but its obviously overkill for small project). First time i used AWS EC2 m1.large instance but over all performance was bad, during testing, CPU always was over 100%. Some relevant parts of configuration: neostore.nodestore.db.mapped_memory=1280M wrapper.java.maxmemory=8192 note: I already tried configuration where all memory related parameters where HIGH and it didn't worked(no change at all). Question: Where to digg? configuration? scheme? queries? what i'm doing wrong? if need more info(logs, configs) just ask ;)

Read the article
segmented reduction with scattered segments

- by Christian Rau

I got to solve a pretty standard problem on the GPU, but I'm quite new to practical GPGPU, so I'm looking for ideas to approach this problem. I have many points in 3-space which are assigned to a very small number of groups (each point belongs to one group), specifically 15 in this case (doesn't ever change). Now I want to compute the mean and covariance matrix of all the groups. So on the CPU it's roughly the same as: for each point p { mean[p.group] += p.pos; covariance[p.group] += p.pos * p.pos; ++count[p.group]; } for each group g { mean[g] /= count[g]; covariance[g] = covariance[g]/count[g] - mean[g]*mean[g]; } Since the number of groups is extremely small, the last step can be done on the CPU (I need those values on the CPU, anyway). The first step is actually just a segmented reduction, but with the segments scattered around. So the first idea I came up with, was to first sort the points by their groups. I thought about a simple bucket sort using atomic_inc to compute bucket sizes and per-point relocation indices (got a better idea for sorting?, atomics may not be the best idea). After that they're sorted by groups and I could possibly come up with an adaption of the segmented scan algorithms presented here. But in this special case, I got a very large amount of data per point (9-10 floats, maybe even doubles if the need arises), so the standard algorithms using a shared memory element per thread and a thread per point might make problems regarding per-multiprocessor resources as shared memory or registers (Ok, much more on compute capability 1.x than 2.x, but still). Due to the very small and constant number of groups I thought there might be better approaches. Maybe there are already existing ideas suited for these specific properties of such a standard problem. Or maybe my general approach isn't that bad and you got ideas for improving the individual steps, like a good sorting algorithm suited for a very small number of keys or some segmented reduction algorithm minimizing shared memory/register usage. I'm looking for general approaches and don't want to use external libraries. FWIW I'm using OpenCL, but it shouldn't really matter as the general concepts of GPU computing don't really differ over the major frameworks.

Read the article
Which Ipod touch generation should I buy? 2nd or 3rd?

- by kukabunga

I want to create games for Iphone/Ipod touch. Unfortunately I don't have a lot of money so I can buy only one device. Ipod is cheaper than Iphone, so I decided to bought Ipod touch. But I am afraid of buying 3rd generation - because it has more memory, more faster CPU, etc. And I think if I post my app on appstore - people with 2nd generation Ipod might have trouble with my app (because I was testing it on 3rd generation). But on the other hand - I am planning to create 3d/cpu demanding game - and it would be easy for me to implement it on device with more calculation power... What should I do in this situation? Any advice is appreciate.

Read the article
Cassandra performance slow down with counter column

- by tubcvt

I have a cluster (4 node ) and a node have 16 core and 24 gb ram: 192.168.23.114 datacenter1 rack1 Up Normal 44.48 GB 25.00% 192.168.23.115 datacenter1 rack1 Up Normal 44.51 GB 25.00% 192.168.23.116 datacenter1 rack1 Up Normal 44.51 GB 25.00% 192.168.23.117 datacenter1 rack1 Up Normal 44.51 GB 25.00% We use about 10 column family (counter column) to make some system statistic report. Problem on here is that When i set replication_factor of this keyspace from 1 to 2 (contain 10 counter column family ), all cpu of node increase from 10% ( when use replication factor=1) to --- 90%. :( :( who can help me work around that :( . why counter column consume too much cpu time :(. thanks all

Read the article
When compiling programs to run inside a VM, what should march and mtune be set to?

- by Russ

With VMs being slave to whatever the host machine is providing, what compiler flags should be provided to gcc? I would normally think that -march=native would be what you would use when compiling for a dedicated box, but the fine detail that -march=native is going to as indicated in this article makes me extremely wary of using it. So... what to set -march and -mtune to inside a VM? For a specific example... My specific case right now is compiling python (and more) in a linux guest inside a KVM-based "cloud" host that I have no real control over the host hardware (aside from 'simple' stuff like CPU GHz m CPU count, and available RAM). Currently, cpuinfo tells me I've got an "AMD Opteron(tm) Processor 6176" but I honestly don't know (yet) if that is reliable and whether the guest can get moved around to different architectures on me to meet the host's infrastructure shuffling needs (sounds hairy/unlikely). All I can really guarantee is my OS, which is a 64-bit linux kernel where uname -m yields x86_64.

Read the article
Direct video card access

- by icemanind

Guys, I am trying to write a class in C# that can be used as a direct replacement for the C# Bitmap class. What I want to do instead though is perform all graphic functions done on the bitmap using the power of the video card. From what I understand, functions such as DrawLine or DrawArc or DrawText are primitive functions that use simple cpu math algorithms to perform the job. I, instead, want to use the graphics card cpu and memory to do these and other advanced functions, such as skinning a bitmap (applying a texture) and true transparancy. My problem is, in C#, how do I access direct video functions? Is there a library or something I need?

Read the article
CUDA small kernel 2d convolution - how to do it

- by paulAl

I've been experimenting with CUDA kernels for days to perform a fast 2D convolution between a 500x500 image (but I could also vary the dimensions) and a very small 2D kernel (a laplacian 2d kernel, so it's a 3x3 kernel.. too small to take a huge advantage with all the cuda threads). I created a CPU classic implementation (two for loops, as easy as you would think) and then I started creating CUDA kernels. After a few disappointing attempts to perform a faster convolution I ended up with this code: http://www.evl.uic.edu/sjames/cs525/final.html (see the Shared Memory section), it basically lets a 16x16 threads block load all the convolution data he needs in the shared memory and then performs the convolution. Nothing, the CPU is still a lot faster. I didn't try the FFT approach because the CUDA SDK states that it is efficient with large kernel sizes. Whether or not you read everything I wrote, my question is: how can I perform a fast 2D convolution between a relatively large image and a very small kernel (3x3) with CUDA?

Read the article
Can this loop be sped up in pure Python?

- by Noctis Skytower

I was trying out an experiment with Python, trying to find out how many times it could add one to an integer in one minute's time. Assuming two computers are the same except for the speed of the CPUs, this should give an estimate of how fast some CPU operations may take for the computer in question. The code below is an example of a test designed to fulfill the requirements given above. This version is about 20% faster than the first attempt and 150% faster than the third attempt. Can anyone make any suggestions as to how to get the most additions in a minute's time span? Higher numbers are desireable. EDIT: This experiment is being written in Python 3.1 and is 15% faster than the fourth speed-up attempt. def start(seconds): import time, _thread def stop(seconds, signal): time.sleep(seconds) signal.pop() total, signal = 0, [None] _thread.start_new_thread(stop, (seconds, signal)) while signal: total += 1 return total if __name__ == '__main__': print('Testing the CPU speed ...') print('Relative speed:', start(60))

Read the article
Given a function which produces a random integer in the range 1 to 5, write a function which produce

- by praveen

what is simple solution what is effective solution to less minimum memory and(or) cpu speed?

Read the article
PC reboots spontaenously: debugging tips [closed]

- by aaron

I swapped my core 2 duo for a quad core recently, and generally things run fine, but every now and then my computer just restarts. I don't even get a blue screen (Vista 32). Core temp isn't a problem. My thinking is that my power supply is inadequate, but I haven't been able to test that (one idea was to under clock the cpu to see if that helped, but going up in speed was the only simple thing to do in the BIOS) Two cases where I semi-consistanly get problems: - Borderlands windowed after some period of time (and some other games, but Borderlands does it pretty regularly) - watching a video (e.g. quicktime/vlc) and having another video running Another thought is non-cpu heat? Maybe the graphics card? Any thoughts on how to track this down appreciated. Thanks!

Read the article
c++ thread running time

- by chnet

I want to know whether I can calculate the running time for each thread. I implement a multithread program in C++ using pthread. As we know, each thread will compete the CPU. Can I use clock() function to calculate the actual number of CPU clocks each thread consumes? my program looks like: Class Thread () { Start(); Run(); Computing(); }; Start() is to start multiple threads. Then each thread will run Computing function to do something. My question is how I can calculate the running time of each thread for Computing function

Read the article
Filemaker XSL 20sec Query Latency

- by Ian Wetherbee

I have an ASP frontend that loads data from a Filemaker database using XSL to perform simple queries. The problem is that the first page load takes 20 seconds +/- 200ms, then the next few page refreshes within a minute of the first request take <200ms, then the cycle starts over again. Each page load makes only 2 XSL queries, and they execute fast after the first page load, so what is causing the delay on the first page load? I have caching turned up with a 100% hit rate, and number of connections at 100. I've tried with XSL database sessions on and off, and session time anywhere from 1 to 60 minutes without any changes. The XSL loads from ASP use a GET request and add a Basic Authorization header to authenticate each time. During fast page requests, the fmserver.exe and fmswpc.exe processes don't even flinch, but during a 20 second holdup I see fmserver jump to 30% CPU and a 3mb I/O read a few seconds into the request, and occasionally fmswpc jump to 60% CPU.

Read the article
x86 and Memory Addressing

- by IM

I've been reading up on memory models in an assembly book I picked up and I have a question or two. Let's say that the address bus has 32 lines, the data bus has 32 lines and the CPU is 32-bit (for simplicity). Now if the CPU makes a read request and sends the 32bit address, but only needs 8 bits, all 32 bits come back anyway? Also, the addresses in memory are still addressed per byte correct? So fetching one byte would bring back 0000 0001 to address 0000 0004? Thanks in advance

Read the article
Why are there performance differences when a SQL function is called from .Net app vs when the same c

- by Dan Snell

We are having a problem in our test and dev environments with a function that runs quite slowly at times when called from an .Net Application. When we call this function directly from management studio it works fine. Here are the differences when they are profiled: From the Application: CPU: 906 Reads: 61853 Writes: 0 Duration: 926 From SSMS: CPU: 15 Reads: 11243 Writes: 0 Duration: 31 Now we have determined that when we recompile the function the performance returns to what we are expecting and the performance profile when run from the application matches that of what we get when we run it from SSMS. It will start slowing down again at what appear to random intervals. We have not seen this in prod but they may be in part because everything is recompiled there on a weekly basis. So what might cause this sort of behavior?

Read the article
Parallelism on two duo-core processor system

- by Qin

I wrote a Java program that draw the Mandelbrot image. To make it interesting, I divided the for loop that calculates the color of each pixel into 2 halves; each half will be executed as a thread thus parallelizing the task. On a two core one cpu system, the performance of using two thread approach vs just one main thread is nearly two fold. My question is on a two dual-core processor system, will the parallelized task be split among different processor instead of just utilize the two core on one processor? I suppose the former scenario will be slower than the latter one simply because the latency of communicating between 2 CPU over the motherboard wires. Any ideas? Thanks

Read the article
perf events documentation

- by Thanatos

I've searched for an exhaustive explanation of the meaning of each event monitored by the perf stat command; I've found a tutorial which explains quite well how to use different the features of the perf tool. However, it doesn't explain the meaning of several events that can be observed (and there are a lot!!). Someone know where is a quite simple and complete documentation about the events listed by the perf list command? In particular, I'm interested in finding out the percentage of cpu used by some application I wrote. Can i measure it directly through cpu-clock or task-clock? What's the meaning of these two events? Thanks in advance

Read the article
pthreads recursively calling system command and segfault appears

- by jess

I have a code base where i am creating 8 threads and each thread just calls system command to display date in a continuous cycle, as shown below: void * system_thread(void *arg) { int cpu = (int)arg; printf("thread : start %d\n", cpu); for (;;) { // date ã³ãã³ãã®å®è¡ if (mode == 0) { system("date"); } else { f_hfp_nlc_Fsystem("date"); } } sleep(timerval); return NULL; } This application segfaults after running for 2-3 seconds, due to following 2 reasons: 1. read access, where the address is out of VM area 2. write acces, where it does not of write permission and its trying to modify some structure.

Read the article
How to profiling my C++ application on linux

- by richard

HI, I would like to profile my c++ application on linux. I would like to find out how much time my application spent on CPU processing vs time spent on block by IO/being idle. I know there is a profile tool call valgrind on linux. But it breaks down time spent on each method, and it does not give me an overall picture of how much time spent on CPU processing vs idle? Or is there a way to do that with valgrind. Thank you.

Read the article

< Previous Page | 85 86 87 88 89 90 91 92 93 94 95 96 | Next Page >