cpu - Page 82 - Developer IT

Does set affinity ensure that only one core resources are used?

- by Kailash

Hello, I just wanted to find out if setting cpu affinity ensure that the application runs only on that core ?

segmented reduction with scattered segments

I got to solve a pretty standard problem on the GPU, but I'm quite new to practical GPGPU, so I'm looking for ideas to approach this problem. I have many points in 3-space which are assigned to a very small number of groups (each point belongs to one group), specifically 15 in this case (doesn't ever change). Now I want to compute the mean and covariance matrix of all the groups. So on the CPU it's roughly the same as: for each point p { mean[p.group] += p.pos; covariance[p.group] += p.pos * p.pos; ++count[p.group]; } for each group g { mean[g] /= count[g]; covariance[g] = covariance[g]/count[g] - mean[g]*mean[g]; } Since the number of groups is extremely small, the last step can be done on the CPU (I need those values on the CPU, anyway). The first step is actually just a segmented reduction, but with the segments scattered around. So the first idea I came up with, was to first sort the points by their groups. I thought about a simple bucket sort using atomic_inc to compute bucket sizes and per-point relocation indices (got a better idea for sorting?, atomics may not be the best idea). After that they're sorted by groups and I could possibly come up with an adaption of the segmented scan algorithms presented here. But in this special case, I got a very large amount of data per point (9-10 floats, maybe even doubles if the need arises), so the standard algorithms using a shared memory element per thread and a thread per point might make problems regarding per-multiprocessor resources as shared memory or registers (Ok, much more on compute capability 1.x than 2.x, but still). Due to the very small and constant number of groups I thought there might be better approaches. Maybe there are already existing ideas suited for these specific properties of such a standard problem. Or maybe my general approach isn't that bad and you got ideas for improving the individual steps, like a good sorting algorithm suited for a very small number of keys or some segmented reduction algorithm minimizing shared memory/register usage. I'm looking for general approaches and don't want to use external libraries. FWIW I'm using OpenCL, but it shouldn't really matter as the general concepts of GPU computing don't really differ over the major frameworks.

Read the article

Neo4j 1.9.4 (REST Server,CYPHER) performance issue

- by user2968943

I have Neo4j 1.9.4 installed on 24 core 24Gb ram (centos) machine and for most queries CPU usage spikes goes to 200% with only few concurrent requests. Domain: some sort of social application where few types of nodes(profiles) with 3-30 text/array properties and 36 relationship types with at least 3 properties. Most of nodes currently has ~300-500 relationships. Current data set footprint(from console): LogicalLogSize=4294907 (32MB) ArrayStoreSize=1675520 (12MB) NodeStoreSize=1342170 (10MB) PropertyStoreSize=1739548 (13MB) RelationshipStoreSize=6395202 (48MB) StringStoreSize=1478400 (11MB) which is IMHO really small. most queries looks like this one(with more or less WITH .. MATCH .. statements and few queries with variable length relations but the often fast): START targetUser=node({id}), currentUser=node({current}) MATCH targetUser-[contact:InContactsRelation]->n, n-[:InLocationRelation]->l, n-[:InCategoryRelation]->c WITH currentUser, targetUser,n, l,c, contact.fav is not null as inFavorites MATCH n<-[followers?:InContactsRelation]-() WITH currentUser, targetUser,n, l,c,inFavorites, COUNT(followers) as numFollowers RETURN id(n) as id, n.name? as name, n.title? as title, n._class as _class, n.avatar? as avatar, n.avatar_type? as avatar_type, l.name as location__name, c.name as category__name, true as isInContacts, inFavorites as isInFavorites, numFollowers it runs in ~1s-3s(for first run) and ~1s-70ms (for consecutive and it depends on query) and there is about 5-10 queries runs for each impression. Another interesting behavior is when i try run query from console(neo4j) on my local machine many consecutive times(just press ctrl+enter for few seconds) it has almost constant execution time but when i do it on server it goes slower exponentially and i guess it somehow related with my problem. Problem: So my problem is that neo4j is very CPU greedy(for 24 core machine its may be not an issue but its obviously overkill for small project). First time i used AWS EC2 m1.large instance but over all performance was bad, during testing, CPU always was over 100%. Some relevant parts of configuration: neostore.nodestore.db.mapped_memory=1280M wrapper.java.maxmemory=8192 note: I already tried configuration where all memory related parameters where HIGH and it didn't worked(no change at all). Question: Where to digg? configuration? scheme? queries? what i'm doing wrong? if need more info(logs, configs) just ask ;)

Read the article

CUDA small kernel 2d convolution - how to do it

- by paulAl

I've been experimenting with CUDA kernels for days to perform a fast 2D convolution between a 500x500 image (but I could also vary the dimensions) and a very small 2D kernel (a laplacian 2d kernel, so it's a 3x3 kernel.. too small to take a huge advantage with all the cuda threads). I created a CPU classic implementation (two for loops, as easy as you would think) and then I started creating CUDA kernels. After a few disappointing attempts to perform a faster convolution I ended up with this code: http://www.evl.uic.edu/sjames/cs525/final.html (see the Shared Memory section), it basically lets a 16x16 threads block load all the convolution data he needs in the shared memory and then performs the convolution. Nothing, the CPU is still a lot faster. I didn't try the FFT approach because the CUDA SDK states that it is efficient with large kernel sizes. Whether or not you read everything I wrote, my question is: how can I perform a fast 2D convolution between a relatively large image and a very small kernel (3x3) with CUDA?

Read the article

When compiling programs to run inside a VM, what should march and mtune be set to?

- by Russ

With VMs being slave to whatever the host machine is providing, what compiler flags should be provided to gcc? I would normally think that -march=native would be what you would use when compiling for a dedicated box, but the fine detail that -march=native is going to as indicated in this article makes me extremely wary of using it. So... what to set -march and -mtune to inside a VM? For a specific example... My specific case right now is compiling python (and more) in a linux guest inside a KVM-based "cloud" host that I have no real control over the host hardware (aside from 'simple' stuff like CPU GHz m CPU count, and available RAM). Currently, cpuinfo tells me I've got an "AMD Opteron(tm) Processor 6176" but I honestly don't know (yet) if that is reliable and whether the guest can get moved around to different architectures on me to meet the host's infrastructure shuffling needs (sounds hairy/unlikely). All I can really guarantee is my OS, which is a 64-bit linux kernel where uname -m yields x86_64.

Read the article

Cassandra performance slow down with counter column

- by tubcvt

I have a cluster (4 node ) and a node have 16 core and 24 gb ram: 192.168.23.114 datacenter1 rack1 Up Normal 44.48 GB 25.00% 192.168.23.115 datacenter1 rack1 Up Normal 44.51 GB 25.00% 192.168.23.116 datacenter1 rack1 Up Normal 44.51 GB 25.00% 192.168.23.117 datacenter1 rack1 Up Normal 44.51 GB 25.00% We use about 10 column family (counter column) to make some system statistic report. Problem on here is that When i set replication_factor of this keyspace from 1 to 2 (contain 10 counter column family ), all cpu of node increase from 10% ( when use replication factor=1) to --- 90%. :( :( who can help me work around that :( . why counter column consume too much cpu time :(. thanks all

Read the article

Cannot run 32bit compiled WPF applications on Windows 7 64bit

- by adriaanp

I created a WPF project in VS2008 and compiled it with Any CPU, x64 and x86. Any CPU and x64 works, but compiling to x86 the application is hanging when running through VS2008 and crashing when running without debugging. Debugging it with WinDbg I can see a StackOverflowException and sometimes a MissingMethodException relating to WPF methods. Common sense is telling something here that the CLR is not loading the correct assemblies or something when running 32bit WPF apps. I tried reinstalling .NET Framework 3.5 SP1, but it does not fix the problem. I don't know how to go about checking if the correct assemblies are loaded or used. Any ideas? UPDATE: Not a real solution but the best I could do quickly was to reinstall Windows 7

Read the article

Can this loop be sped up in pure Python?

- by Noctis Skytower

I was trying out an experiment with Python, trying to find out how many times it could add one to an integer in one minute's time. Assuming two computers are the same except for the speed of the CPUs, this should give an estimate of how fast some CPU operations may take for the computer in question. The code below is an example of a test designed to fulfill the requirements given above. This version is about 20% faster than the first attempt and 150% faster than the third attempt. Can anyone make any suggestions as to how to get the most additions in a minute's time span? Higher numbers are desireable. EDIT: This experiment is being written in Python 3.1 and is 15% faster than the fourth speed-up attempt. def start(seconds): import time, _thread def stop(seconds, signal): time.sleep(seconds) signal.pop() total, signal = 0, [None] _thread.start_new_thread(stop, (seconds, signal)) while signal: total += 1 return total if __name__ == '__main__': print('Testing the CPU speed ...') print('Relative speed:', start(60))

Read the article

Which Ipod touch generation should I buy? 2nd or 3rd?

- by kukabunga

I want to create games for Iphone/Ipod touch. Unfortunately I don't have a lot of money so I can buy only one device. Ipod is cheaper than Iphone, so I decided to bought Ipod touch. But I am afraid of buying 3rd generation - because it has more memory, more faster CPU, etc. And I think if I post my app on appstore - people with 2nd generation Ipod might have trouble with my app (because I was testing it on 3rd generation). But on the other hand - I am planning to create 3d/cpu demanding game - and it would be easy for me to implement it on device with more calculation power... What should I do in this situation? Any advice is appreciate.

Read the article

Direct video card access

- by icemanind

Guys, I am trying to write a class in C# that can be used as a direct replacement for the C# Bitmap class. What I want to do instead though is perform all graphic functions done on the bitmap using the power of the video card. From what I understand, functions such as DrawLine or DrawArc or DrawText are primitive functions that use simple cpu math algorithms to perform the job. I, instead, want to use the graphics card cpu and memory to do these and other advanced functions, such as skinning a bitmap (applying a texture) and true transparancy. My problem is, in C#, how do I access direct video functions? Is there a library or something I need?

Read the article

Filemaker XSL 20sec Query Latency

- by Ian Wetherbee

I have an ASP frontend that loads data from a Filemaker database using XSL to perform simple queries. The problem is that the first page load takes 20 seconds +/- 200ms, then the next few page refreshes within a minute of the first request take <200ms, then the cycle starts over again. Each page load makes only 2 XSL queries, and they execute fast after the first page load, so what is causing the delay on the first page load? I have caching turned up with a 100% hit rate, and number of connections at 100. I've tried with XSL database sessions on and off, and session time anywhere from 1 to 60 minutes without any changes. The XSL loads from ASP use a GET request and add a Basic Authorization header to authenticate each time. During fast page requests, the fmserver.exe and fmswpc.exe processes don't even flinch, but during a 20 second holdup I see fmserver jump to 30% CPU and a 3mb I/O read a few seconds into the request, and occasionally fmswpc jump to 60% CPU.

Read the article

pthreads recursively calling system command and segfault appears

- by jess

I have a code base where i am creating 8 threads and each thread just calls system command to display date in a continuous cycle, as shown below: void * system_thread(void *arg) { int cpu = (int)arg; printf("thread : start %d\n", cpu); for (;;) { // date ã³ãã³ãã®å®è¡ if (mode == 0) { system("date"); } else { f_hfp_nlc_Fsystem("date"); } } sleep(timerval); return NULL; } This application segfaults after running for 2-3 seconds, due to following 2 reasons: 1. read access, where the address is out of VM area 2. write acces, where it does not of write permission and its trying to modify some structure.

Read the article

Given a function which produces a random integer in the range 1 to 5, write a function which produce

- by praveen

what is simple solution what is effective solution to less minimum memory and(or) cpu speed?

Read the article

Why are there performance differences when a SQL function is called from .Net app vs when the same c

- by Dan Snell

We are having a problem in our test and dev environments with a function that runs quite slowly at times when called from an .Net Application. When we call this function directly from management studio it works fine. Here are the differences when they are profiled: From the Application: CPU: 906 Reads: 61853 Writes: 0 Duration: 926 From SSMS: CPU: 15 Reads: 11243 Writes: 0 Duration: 31 Now we have determined that when we recompile the function the performance returns to what we are expecting and the performance profile when run from the application matches that of what we get when we run it from SSMS. It will start slowing down again at what appear to random intervals. We have not seen this in prod but they may be in part because everything is recompiled there on a weekly basis. So what might cause this sort of behavior?

Read the article

PC reboots spontaenously: debugging tips [closed]

- by aaron

I swapped my core 2 duo for a quad core recently, and generally things run fine, but every now and then my computer just restarts. I don't even get a blue screen (Vista 32). Core temp isn't a problem. My thinking is that my power supply is inadequate, but I haven't been able to test that (one idea was to under clock the cpu to see if that helped, but going up in speed was the only simple thing to do in the BIOS) Two cases where I semi-consistanly get problems: - Borderlands windowed after some period of time (and some other games, but Borderlands does it pretty regularly) - watching a video (e.g. quicktime/vlc) and having another video running Another thought is non-cpu heat? Maybe the graphics card? Any thoughts on how to track this down appreciated. Thanks!

Read the article

How to use all the cores in Windows 7?

- by Anon

I am not sure if this belongs to Stackoverflow or Superuser but I thought I would ask here. I have a console based application written in C which currently takes about an hour to terminate in Windows 7 64-bit OS. The task manager reports that the application is using only 25% of the available CPU. I would like to reduce the run time by increasing cpu usage. Is there any way to let the application use all four cores (the laptop has Core i5) instead of just one? I am assuming that task manager reports 25% because only one core is allocated to the program.

Read the article

c++ thread running time

- by chnet

I want to know whether I can calculate the running time for each thread. I implement a multithread program in C++ using pthread. As we know, each thread will compete the CPU. Can I use clock() function to calculate the actual number of CPU clocks each thread consumes? my program looks like: Class Thread () { Start(); Run(); Computing(); }; Start() is to start multiple threads. Then each thread will run Computing function to do something. My question is how I can calculate the running time of each thread for Computing function

Read the article

Parallelism on two duo-core processor system

- by Qin

I wrote a Java program that draw the Mandelbrot image. To make it interesting, I divided the for loop that calculates the color of each pixel into 2 halves; each half will be executed as a thread thus parallelizing the task. On a two core one cpu system, the performance of using two thread approach vs just one main thread is nearly two fold. My question is on a two dual-core processor system, will the parallelized task be split among different processor instead of just utilize the two core on one processor? I suppose the former scenario will be slower than the latter one simply because the latency of communicating between 2 CPU over the motherboard wires. Any ideas? Thanks

Read the article

scikit learn extratreeclassifier hanging

- by denson

I'm running the scikit learn on some rather large training datasets ~1,600,000,000 rows with ~500 features. The platform is Ubuntu server 14.04, the hardware has 100gb of ram and 20 CPU cores. The test datasets are about half as many rows. I set n_jobs = 10, and am forest_size = 3*number_of_features so about 1700 trees. If I reduce the number of features to about 350 it works fine but never completes the training phase with the full feature set of 500+. The process is still executing and using up about 20gb of ram but is using 0% of CPU. I have also successfully completed on datasets with ~400,000 rows but twice as many features which completes after only about 1 hour. I am being careful to delete any arrays/objects that are not in use. Does anyone have any ideas I might try?

Read the article

x86 and Memory Addressing

- by IM

I've been reading up on memory models in an assembly book I picked up and I have a question or two. Let's say that the address bus has 32 lines, the data bus has 32 lines and the CPU is 32-bit (for simplicity). Now if the CPU makes a read request and sends the 32bit address, but only needs 8 bits, all 32 bits come back anyway? Also, the addresses in memory are still addressed per byte correct? So fetching one byte would bring back 0000 0001 to address 0000 0004? Thanks in advance

Read the article

Fast, cross-platform timer?

- by dsimcha

I'm looking to improve the D garbage collector by adding some heuristics to avoid garbage collection runs that are unlikely to result in significant freeing. One heuristic I'd like to add is that GC should not be run more than once per X amount of time (maybe once per second or so). To do this I need a timer with the following properties: It must be able to grab the correct time with minimal overhead. Calling core.stdc.time takes an amount of time roughly equivalent to a small memory allocation, so it's not a good option. Ideally, should be cross-platform (both OS and CPU), for maintenance simplicity. Super high resolution isn't terribly important. If the times are accurate to maybe 1/4 of a second, that's good enough. Must work in a multithreaded/multi-CPU context. The x86 rdtsc instruction won't work.

Read the article

perf events documentation

- by Thanatos

I've searched for an exhaustive explanation of the meaning of each event monitored by the perf stat command; I've found a tutorial which explains quite well how to use different the features of the perf tool. However, it doesn't explain the meaning of several events that can be observed (and there are a lot!!). Someone know where is a quite simple and complete documentation about the events listed by the perf list command? In particular, I'm interested in finding out the percentage of cpu used by some application I wrote. Can i measure it directly through cpu-clock or task-clock? What's the meaning of these two events? Thanks in advance

Read the article

How to profiling my C++ application on linux

- by richard

HI, I would like to profile my c++ application on linux. I would like to find out how much time my application spent on CPU processing vs time spent on block by IO/being idle. I know there is a profile tool call valgrind on linux. But it breaks down time spent on each method, and it does not give me an overall picture of how much time spent on CPU processing vs idle? Or is there a way to do that with valgrind. Thank you.

Read the article

How to profile my C++ application on linux

- by richard

HI, I would like to profile my c++ application on linux. I would like to find out how much time my application spent on CPU processing vs time spent on block by IO/being idle. I know there is a profile tool call valgrind on linux. But it breaks down time spent on each method, and it does not give me an overall picture of how much time spent on CPU processing vs idle? Or is there a way to do that with valgrind. Thank you.

Read the article

Monit won't run

- by Yaniro

I have two identical EC2 instances (the second is a replica of the first), running Gentoo. The first instance has monit running which monitors a single process and some system resources and functions great. In the second instance, monit runs but quits right away. The configuration is similar on both instances so are the versions of monit. monit.log shows: [GMT Oct 3 08:36:41] info : monit daemon with PID 5 awakened Final lines on strace monit show: write(2, "monit daemon with PID 5 awakened"..., 33monit daemon with PID 5 awakened ) = 33 time(NULL) = 1349252827 open("/etc/localtime", O_RDONLY) = 4 fstat64(4, {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 fstat64(4, {st_mode=S_IFREG|0644, st_size=118, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb773a000 read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 118 _llseek(4, -6, [112], SEEK_CUR) = 0 read(4, "\nGMT0\n", 4096) = 6 close(4) = 0 munmap(0xb773a000, 4096) = 0 write(3, "[GMT Oct 3 08:27:07] info :"..., 33) = 33 write(3, "monit daemon with PID 5 awakened"..., 33) = 33 waitpid(-1, NULL, WNOHANG) = -1 ECHILD (No child processes) close(3) = 0 exit_group(0) = ? No core dumps (ulimit -c shows unlimited) monit -v shows: monit: Debug: Adding host allow 'localhost' monit: Debug: Skipping redundant host 'localhost' monit: Debug: Skipping redundant host 'localhost' monit: Debug: Adding credentials for user 'xxxx'. Runtime constants: Control file = /etc/monitrc Log file = /var/log/monit/monit.log Pid file = /var/run/monit.pid Id file = /var/run/monit.pid Debug = True Log = True Use syslog = False Is Daemon = True Use process engine = True Poll time = 30 seconds with start delay 0 seconds Expect buffer = 256 bytes Event queue = base directory /var/monit with 100 slots Mail server(s) = xx.xxx.xx.xxx with timeout 30 seconds Mail from = (not defined) Mail subject = (not defined) Mail message = (not defined) Start monit httpd = True httpd bind address = Any/All httpd portnumber = 2812 httpd signature = True Use ssl encryption = False httpd auth. style = Basic Authentication and Host/Net allow list Alert mail to = [email protected] Alert on = All events The service list contains the following entries: System Name = xxxx Monitoring mode = active CPU wait limit = if greater than 20.0% 1 times within 1 cycle(s) then alert else if succeeded 1 times within 1 cycle(s) then alert CPU system limit = if greater than 30.0% 1 times within 1 cycle(s) then alert else if succeeded 1 times within 1 cycle(s) then alert CPU user limit = if greater than 70.0% 1 times within 1 cycle(s) then alert else if succeeded 1 times within 1 cycle(s) then alert Swap usage limit = if greater than 25.0% 1 times within 1 cycle(s) then alert else if succeeded 1 times within 1 cycle(s) then alert Memory usage limit = if greater than 75.0% 1 times within 1 cycle(s) then alert else if succeeded 1 times within 1 cycle(s) then alert Load avg. (5min) = if greater than 2.0 1 times within 1 cycle(s) then alert else if succeeded 1 times within 1 cycle(s) then alert Load avg. (1min) = if greater than 4.0 1 times within 1 cycle(s) then alert else if succeeded 1 times within 1 cycle(s) then alert Process Name = xxxx Group = server Pid file = /var/run/xxxx.pid Monitoring mode = active Start program = '/etc/init.d/xxxx restart' timeout 20 second(s) Stop program = '/etc/init.d/xxxx stop' timeout 30 second(s) Existence = if does not exist 1 times within 1 cycle(s) then restart else if succeeded 1 times within 1 cycle(s) then alert Pid = if changed 1 times within 1 cycle(s) then alert Ppid = if changed 1 times within 1 cycle(s) then alert Timeout = If restarted 3 times within 5 cycle(s) then unmonitor Alert mail to = [email protected] Alert on = All events Alert mail to = [email protected] Alert on = All events ------------------------------------------------------------------------------- monit daemon with PID 5 awakened Ran emerge --sync before emerge -va monit which installed monit v5.3.2. When that didn't work i've downloaded v5.5 from their website and compiled from source which did not work either.

Search Results

Search found 5572 results on 223 pages for 'cpu'.

Page 82/223 | < Previous Page | 78 79 80 81 82 83 84 85 86 87 88 89 | Next Page >

- by Kailash

- by Christian Rau

- by user2968943

- by paulAl

- by Russ

- by tubcvt

- by adriaanp

- by Noctis Skytower

- by kukabunga

- by icemanind

- by Ian Wetherbee

- by jess

- by praveen

- by Dan Snell

- by aaron

- by Anon

- by chnet

- by Qin

- by denson

- by IM

- by dsimcha

- by Thanatos

- by richard

- by richard

- by Yaniro

< Previous Page | 78 79 80 81 82 83 84 85 86 87 88 89 | Next Page >