Search Results

Search found 5945 results on 238 pages for 'green threads'.


  • Wait on multiple condition variables on Linux without unnecessary sleeps?

    - by Joseph Garvin
    I'm writing a latency sensitive app that in effect wants to wait on multiple condition variables at once. I've read before of several ways to get this functionality on Linux (apparently this is builtin on Windows), but none of them seem suitable for my app. The methods I know of are: Have one thread wait on each of the condition variables you want to wait on, which when woken will signal a single condition variable which you wait on instead. Cycling through multiple condition variables with a timed wait. Writing dummy bytes to files or pipes instead, and polling on those. #1 & #2 are unsuitable because they cause unnecessary sleeping. With #1, you have to wait for the dummy thread to wake up, then signal the real thread, then for the real thread to wake up, instead of the real thread just waking up to begin with -- the extra scheduler quantum spent on this actually matters for my app, and I'd prefer not to have to use a full fledged RTOS. #2 is even worse, you potentially spend N * timeout time asleep, or your timeout will be 0 in which case you never sleep (endlessly burning CPU and starving other threads is also bad). For #3, pipes are problematic because if the thread being 'signaled' is busy or even crashes (I'm in fact dealing with separate process rather than threads -- the mutexes and conditions would be stored in shared memory), then the writing thread will be stuck because the pipe's buffer will be full, as will any other clients. Files are problematic because you'd be growing it endlessly the longer the app ran. Is there a better way to do this? Curious for answers appropriate for Solaris as well.


  • Python C API from C++ app - know when to lock

    - by Alex
    Hi Everyone, I am trying to write a C++ class that calls Python methods of a class that does some I/O operations (file, stdout) at once. The problem I have ran into is that my class is called from different threads: sometimes main thread, sometimes different others. Obviously I tried to apply the approach for Python calls in multi-threaded native applications. Basically everything starts from PyEval_AcquireLock and PyEval_ReleaseLock or just global locks. According to the documentation here when a thread is already locked a deadlock ensues. When my class is called from the main thread or other one that blocks Python execution I have a deadlock. Python Cfunc1() - C++ func that creates threads internally which lead to calls in "my class", It stuck on PyEval_AcquireLock, obviously the Python is already locked, i.e. waiting for C++ Cfunc1 call to complete... It completes fine if I omit those locks. Also it completes fine when Python interpreter is ready for the next user command, i.e. when thread is calling funcs in the background - not inside of a native call I am looking for a workaround. I need to distinguish whether or not the global lock is allowed, i.e. Python is not locked and ready to receive the next command... I tried PyGIL_Ensure, unfortunately I see hang. Any known API or solution for this ? (Python 2.4)


  • Parallel version of loop not faster than serial version

    - by Il-Bhima
    I'm writing a program in C++ to perform a simulation of a particular system. For each timestep, the biggest part of the execution is taken up by a single loop. Fortunately this is embarrassingly parallel, so I decided to use Boost Threads to parallelize it (I'm running on a 2 core machine). I would expect a speedup close to 2 times the serial version, since there is no locking. However I am finding that there is no speedup at all. I implemented the parallel version of the loop as follows: Wake up the two threads (they are blocked on a barrier). Each thread then performs the following: Atomically fetch and increment a global counter. Retrieve the particle with that index. Perform the computation on that particle, storing the result in a separate array. Wait on a job finished barrier. The main thread waits on the job finished barrier. I used this approach since it should provide good load balancing (since each computation may take differing amounts of time). I am really curious as to what could possibly cause this slowdown. I always read that atomic variables are fast, but now I'm starting to wonder whether they have their performance costs. If anybody has some ideas what to look for or any hints I would really appreciate it. I've been bashing my head on it for a week, and profiling has not revealed much.
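
    The scheme described in this question (workers pulling the next index from a shared counter) can be sketched roughly as follows. This is only an illustration of the scheduling pattern, not the poster's Boost code: `ChunkedCounter` and `run_parallel` are hypothetical names, a lock stands in for the atomic fetch-and-increment, and handing out a block of indices per counter access is an assumed variation that touches the shared counter less often. CPython threads will not speed up CPU-bound work, so treat it as a structural sketch only.

```python
import threading


class ChunkedCounter:
    """Shared counter that hands out half-open index ranges [start, stop)."""

    def __init__(self, total, chunk):
        self._next = 0
        self._total = total
        self._chunk = chunk
        self._lock = threading.Lock()  # stands in for an atomic fetch-and-add

    def next_range(self):
        with self._lock:
            start = self._next
            if start >= self._total:
                return None  # no work left
            stop = min(start + self._chunk, self._total)
            self._next = stop
            return start, stop


def run_parallel(items, compute, workers=2, chunk=64):
    """Apply compute() to every item; each worker claims `chunk` indices at a time."""
    results = [None] * len(items)  # separate output slot per item, so no locking on writes
    counter = ChunkedCounter(len(items), chunk)

    def worker():
        while True:
            bounds = counter.next_range()
            if bounds is None:
                return
            for i in range(*bounds):
                results[i] = compute(items[i])

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # plays the role of the "job finished" barrier
    return results
```

    With `chunk=1` this reduces to the one-index-per-fetch scheme from the question; larger chunks trade some load balancing for fewer accesses to the shared counter.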


  • java GC periodically enters into several full GC cycles

    - by Peter
    Environment: sun JDK 1.6.0_16 vm settings: -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -Xms1024 -Xmx1024M -XX:MaxNewSize=448m -XX:NewSize=448m -XX:SurvivorRatio=4(6 also checked) -XX:MaxPermSize=128M OS: windows server 2003 processor: 4 cores of INTEL XEON 5130, 2000 Hz my application description: high intensity of concurrent(java 5 concurrency used) operations completed each time by commit to oracle. it's about 20-30 threads run non stop, doing tasks. application runs in JBOSS web container. My GC starts work normally, I see a lot of small GCs and all that time CPU shows good load, like all 4 cores loaded to 40-50%, CPU graph is stable. Then , after 1 min of good work, CPU starts drop to 0% on 2 cores from 4, it's graph becomes unstable, goes up and down("teeth"). I see, that my threads work slower(I have monitoring), I see that GC starts produce a lot of FULL GC during that time and next 4-5 minutes this situation remains as is, then for short period of time, like 1 minute, it gets back to normal situation, but shortly after that all bad thing repeats. Question: Why I have so frequent full GC??? How to prevent that? I played with SurvivorRatio - does not help. I noticed, that application behaves normally until first FULL GC occurs, while I have enough memory. Then it runs badly. 
    my GC LOG: starts good then long period of FULL GCs (many of them)
    1027.861: [GC 942200K->623526K(991232K), 0.0887588 secs]
    1029.333: [GC 803279K(991232K), 0.0927470 secs]
    1030.551: [GC 967485K->625549K(991232K), 0.0823024 secs]
    1030.634: [GC 625957K(991232K), 0.0763656 secs]
    1033.126: [GC 969613K->632963K(991232K), 0.0850611 secs]
    1033.281: [GC 649899K(991232K), 0.0378358 secs]
    1035.910: [GC 813948K(991232K), 0.3540375 secs]
    1037.994: [GC 967729K->637198K(991232K), 0.0826042 secs]
    1038.435: [GC 710309K(991232K), 0.1370703 secs]
    1039.665: [GC 980494K->972462K(991232K), 0.6398589 secs]
    1040.306: [Full GC 972462K->619643K(991232K), 3.7780597 secs]
    1044.093: [GC 620103K(991232K), 0.0695221 secs]
    1047.870: [Full GC 991231K->626514K(991232K), 3.8732457 secs]
    1053.739: [GC 942140K(991232K), 0.5410483 secs]
    1056.343: [Full GC 991232K->634157K(991232K), 3.9071443 secs]
    1061.257: [GC 786274K(991232K), 0.3106603 secs]
    1065.229: [Full GC 991232K->641617K(991232K), 3.9565638 secs]
    1071.192: [GC 945999K(991232K), 0.5401515 secs]
    1073.793: [Full GC 991231K->648045K(991232K), 3.9627814 secs]
    1079.754: [GC 936641K(991232K), 0.5321197 secs]


  • Accessing global variable in multithreaded Tomcat server

    - by jwegan
    I have a singleton object that I construct like thus: private static volatile KeyMapper mapper = null; public static KeyMapper getMapper() { if(mapper == null) { synchronized(Utils.class) { if(mapper == null) { mapper = new LocalMemoryMapper(); } } } return mapper; } The class KeyMapper is basically a synchronized wrapper to HashMap with only two functions, one to add a mapping and one to remove a mapping. When running in Tomcat 6.24 on my 32bit Windows machine everything works fine. However when running on a 64 bit Linux machine (CentOS 5.4 with OpenJDK 1.6.0-b09) I add one mapping and print out the size of the HashMap used by KeyMapper to verify the mapping got added (i.e. verify size = 1). Then I try to retrieve the mapping with another request and I keep getting null and when I checked the size of the HashMap it was 0. I'm confident the mapping isn't accidentally being removed since I've commented out all calls to remove (and I don't use clear or any other mutators, just get and put). The requests are going through Tomcat 6.24 (configured to use 200 threads with a minimum of 4 threads) and I passed -Xnoclassgc to the jvm to ensure the class isn't inadvertently getting garbage collected (jvm is also running in -server mode). I also added a finalize method to KeyMapper to print to stderr if it ever gets garbage collected to verify that it wasn't being garbage collected. I'm at my wits end and I can't figure out why one minute the entry in HashMap is there and the next it isn't :(
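
    For readers less familiar with the double-checked locking pattern quoted in this question, here is a minimal sketch of the same structure in Python. It is not the poster's Java code: `KeyMapper` here is a hypothetical stand-in (a lock-protected dict wrapper with an assumed `size` helper), and `get_mapper` mirrors the two null checks around the lock.

```python
import threading


class KeyMapper:
    """Hypothetical stand-in: a lock-protected wrapper around a dict."""

    def __init__(self):
        self._lock = threading.Lock()
        self._map = {}

    def put(self, key, value):
        with self._lock:
            self._map[key] = value

    def get(self, key):
        with self._lock:
            return self._map.get(key)

    def size(self):
        with self._lock:
            return len(self._map)


_mapper = None
_mapper_lock = threading.Lock()


def get_mapper():
    """Return the process-wide KeyMapper, creating it on first use."""
    global _mapper
    if _mapper is None:              # first check, without the lock
        with _mapper_lock:
            if _mapper is None:      # second check, under the lock
                _mapper = KeyMapper()
    return _mapper
```

    Note that a pattern like this only guarantees a single instance within one process (in Java, within one class loader), so logging the identity of the object returned by each request is one cheap way to check whether two requests really see the same mapper.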


  • GWT RPC and GoDaddy Shared Hosting

    - by Mike Apolis
    Hi, I've deployed the sample Stock Watcher app to my GoDaddy Hosting site, and I get the error below. I've tried compiling the Project in Eclipse with JRE 1.5 because my Host is using jre 1.5. I think the issue is the "gwt-servlet.jar" is not compatible with jre 1.5. Can anyone confirm this. The project runs fine on my local machine using JRE 1.6. Unfortunately GoDaddy will not upgrade my shared hosting account jre to 1.6. GoDaddy Server Setup: Tomcat Version 5.0.27 JRE 1.5_22 Error: HTTP Status 500 - type Exception report message description The server encountered an internal error () that prevented it from fulfilling this request. exception javax.servlet.ServletException: Error allocating a servlet instance org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java: 117) org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java: 535) org.apache.catalina.authenticator.SingleSignOn.invoke(SingleSignOn.java: 417) org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java: 160) org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:300) org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:374) org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:743) org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java: 675) org.apache.jk.common.SocketConnection.runIt(ChannelSocket.java:866) org.apache.tomcat.util.threads.ThreadPool $ControlRunnable.run(ThreadPool.java:683) java.lang.Thread.run(Thread.java:595) root cause java.lang.UnsupportedClassVersionError: Bad version number in .class file java.lang.ClassLoader.defineClass1(Native Method) java.lang.ClassLoader.defineClass(ClassLoader.java:621) java.security.SecureClassLoader.defineClass(SecureClassLoader.java: 124) org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java: 1634) org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java: 860) 
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java: 1307) org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java: 1189) java.security.AccessController.doPrivileged(Native Method) org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java: 117) org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java: 535) org.apache.catalina.authenticator.SingleSignOn.invoke(SingleSignOn.java: 417) org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java: 160) org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:300) org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:374) org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:743) org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java: 675) org.apache.jk.common.SocketConnection.runIt(ChannelSocket.java:866) org.apache.tomcat.util.threads.ThreadPool $ControlRunnable.run(ThreadPool.java:683) java.lang.Thread.run(Thread.java:595) note The full stack trace of the root cause is available in the Apache Tomcat/5.0.27 logs. Apache Tomcat/5.0.27


  • Java Concurrency : Volatile vs final in "cascaded" variables?

    - by Tom
    Hello Experts, is final Map<Integer,Map<String,Integer>> status = new ConcurrentHashMap<Integer, Map<String,Integer>>(); Map<Integer,Map<String,Integer>> statusInner = new ConcurrentHashMap<Integer, Map<String,Integer>>(); status.put(key,statusInner); the same as volatile Map<Integer,Map<String,Integer>> status = new ConcurrentHashMap<Integer, Map<String,Integer>>(); Map<Integer,Map<String,Integer>> statusInner = new ConcurrentHashMap<Integer, Map<String,Integer>>(); status.put(key,statusInner); in case the inner Map is accessed by different Threads? Or is even something like this required: volatile Map<Integer,Map<String,Integer>> status = new ConcurrentHashMap<Integer, Map<String,Integer>>(); volatile Map<Integer,Map<String,Integer>> statusInner = new ConcurrentHashMap<Integer, Map<String,Integer>>(); status.put(key,statusInner); In case it is NOT a "cascaded" map, final and volatile have in the end the same effect of making sure that all threads always see the correct contents of the Map... But what happens if the Map itself contains a map, as in the example... How do I make sure that the inner Map is correctly "memory barriered"? Thanks! Tom


  • high cpu in IIS

    - by Miki Watts
    Hi all. I'm developing a POS application that has a local database on each POS computer, and communicates with the server using WCF hosted in IIS. The application has been deployed in several customers for over a year now. About a week ago, we've started getting reports from one of our customers that the server that the IIS is hosted on is very slow. When I've checked the issue, I saw the application pool with my process rocket to almost 100% cpu on an 8 cpu server. I've checked the SQL Activity Monitor and network volume, and they showed no significant overload beyond what we usually see. When checking the threads in Process Explorer, I saw lots of threads repeatedly calling CreateApplicationContext. I've tried installing .Net 2.0 SP1, according to some posts I found on the net, but it didn't solve the problem and replaced the function calls with CLRCreateManagedInstance. I'm about to capture a dump using adplus and windbg of the IIS processes and try to figure out what's wrong. Has anyone encountered something like this or has an idea which directory I should check ? p.s. The same version of the application is deployed in another customer, and there it works just fine. I also tried rolling back versions (even very old versions) and it still behaves exactly the same. Edit: well, problem solved, turns out I've had an SQL query in there that didn't limit the result set, and when the customer went over a certain number of rows, it started bogging down the server. Took me two days to find it, because of all the surrounding noise in the logs, but I waited for the night and took a dump then, which immediately showed me the query.


  • Create a Task list, with tasks without executing

    - by Ernesto Araya Eguren
    I have an async method private async Task DoSomething(CancellationToken token) a list of Tasks private List<Task> workers = new List<Task>(); and I have to create N threads that runs that method public void CreateThreads(int n) { tokenSource = new CancellationTokenSource(); token = tokenSource.Token; for (int i = 0; i < n; i++) { workers.Add(DoSomething(token)); } } but the problem is that those have to run at a given time public async Task StartAllWorkers() { if (0 < workers.Count) { try { while (0 < workers.Count) { Task finishedWorker = await Task.WhenAny(workers.ToArray()); workers.Remove(finishedWorker); finishedWorker.Dispose(); } if (workers.Count == 0) { tokenSource = null; } } catch (OperationCanceledException) { throw; } } } but actually they run when i call the CreateThreads Method (before the StartAllWorkers). I searched for keywords and problems like mine but couldn't find anything about stopping the task from running. I've tried a lot of different aproaches but anything that could solve my problem entirely. For example, moving the code from DoSomething into a workers.Add(new Task(async () => { }, token)); would run the StartAllWorkers(), but the threads will never actually start. There is another method for calling the tokenSource.Cancel().
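
    One general way to get "create now, run later" behaviour is to store callables that produce the work instead of the already-started work itself. The sketch below shows that idea with Python's asyncio rather than the .NET Task API from the question; `WorkerPool`, `sample_work` and the `started` list are hypothetical names used only for illustration.

```python
import asyncio

started = []  # records which workers have actually begun running


async def sample_work(i):
    """Hypothetical worker: notes that it started, yields once, returns a value."""
    started.append(i)
    await asyncio.sleep(0)
    return i * 2


class WorkerPool:
    def __init__(self):
        self._factories = []  # zero-argument callables; calling one creates the coroutine

    def create_workers(self, n, work):
        # Store how to create each worker; nothing is scheduled or executed yet.
        for i in range(n):
            self._factories.append(lambda i=i: work(i))

    async def start_all_workers(self):
        # Only here are the coroutines created and scheduled as tasks.
        pending = {asyncio.ensure_future(make()) for make in self._factories}
        self._factories.clear()
        results = []
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            results.extend(task.result() for task in done)
        return results
```

    A possible usage: call `pool.create_workers(3, sample_work)` early, then `asyncio.run(pool.start_all_workers())` at the moment the workers should begin; until that second call, `started` stays empty.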


  • How do I rewrite a for loop with a shared dependency using actors

    - by Thomas Rynne
    We have some code which needs to run faster. Its already profiled so we would like to make use of multiple threads. Usually I would setup an in memory queue, and have a number of threads taking jobs of the queue and calculating the results. For the shared data I would use a ConcurrentHashMap or similar. I don't really want to go down that route again. From what I have read using actors will result in cleaner code and if I use akka migrating to more than 1 jvm should be easier. Is that true? However, I don't know how to think in actors so I am not sure where to start. To give a better idea of the problem here is some sample code: case class Trade(price:Double, volume:Int, stock:String) { def value(priceCalculator:PriceCalculator) = (priceCalculator.priceFor(stock)-> price)*volume } class PriceCalculator { def priceFor(stock:String) = { Thread.sleep(20)//a slow operation which can be cached 50.0 } } object ValueTrades { def valueAll(trades:List[Trade], priceCalculator:PriceCalculator):List[(Trade,Double)] = { trades.map { trade => (trade,trade.value(priceCalculator)) } } def main(args:Array[String]) { val trades = List( Trade(30.5, 10, "Foo"), Trade(30.5, 20, "Foo") //usually much longer ) val priceCalculator = new PriceCalculator val values = valueAll(trades, priceCalculator) } } I'd appreciate it if someone with experience using actors could suggest how this would map on to actors.


  • Achieving Thread-Safety

    - by Smasher
    Question: How can I make sure my application is thread-safe? Are there any common practices, testing methods, things to avoid, things to look for? Background: I'm currently developing a server application that performs a number of background tasks in different threads and communicates with clients using Indy (using another bunch of automatically generated threads for the communication). Since the application should be highly available, a program crash is a very bad thing and I want to make sure that the application is thread-safe. No matter what, from time to time I discover a piece of code that throws an exception that never occurred before, and in most cases I realize that it is some kind of synchronization bug, where I forgot to synchronize my objects properly. Hence my question concerning best practices, testing of thread-safety and things like that. mghie: Thanks for the answer! I should perhaps be a little bit more precise. Just to be clear, I know about the principles of multithreading, I use synchronization (monitors) throughout my program and I know how to differentiate threading problems from other implementation problems. But nevertheless, I keep forgetting to add proper synchronization from time to time. Just to give an example, I used the RTL sort function in my code. It looked something like FKeyList.Sort (CompareKeysFunc); It turns out that I had to synchronize FKeyList while sorting. It just didn't come to my mind when initially writing that simple line of code. It's these things I want to talk about. What are the places where one easily forgets to add synchronization code? How do YOU make sure that you added sync code in all important places?


  • How to detect that the internet connection has got disconnected through a java desktop application?

    - by Yatendra Goel
    I am developing a Java Desktop Application that access internet. It is a multi-threaded application, each thread do the same work (means each thread is an instance of same Thread class). Now, as all the threads need internet connection to be active, there should be some mechanism that detects whether an internet connection is active or not. Q1. How to detect whether the internet connection is active or not? Q2. Where to implement this internet-status-check-mechanism code? Should I start a separate thread for checking internet status regularly and notifies all the threads when the status changes from one state to another? Or should I let each thread check for the internet-status itself? Q3. This issue should be a very common issue as every application accessing an internet should deal with this problem. So how other developers usually deal with this problem? Q4. If you could give me a reference to a good demo application that addresses this issue then it would greatly help me.
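
    As a rough illustration of Q1, one common approach is simply to attempt a TCP connection to a known host with a short timeout and treat failure as "not reachable". The sketch below is in Python rather than Java; `is_reachable` is a hypothetical helper, and which host and port to probe is left to the application.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False
```

    A check like this only says that one particular endpoint answered at one moment, so worker threads still need to handle I/O errors themselves; a single background thread calling it periodically and publishing the result is one option for Q2.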


  • Boost Asio UDP retrieve last packet in socket buffer

    - by Alberto Toglia
    I have been messing around with Boost Asio for some days now but I got stuck with this weird behavior. Please let me explain. Computer A is sending continuous UDP packets every 500 ms to computer B. Computer B wants to read A's packets at its own pace but only wants A's last packet, obviously the most up-to-date one. It has come to my attention that when I do a: mSocket.receive_from(boost::asio::buffer(mBuffer), mEndPoint); I can get OLD packets that were not processed (almost every time). Does this make any sense? A friend of mine told me that sockets maintain a buffer of packets and therefore if I read with a lower frequency than the sender this could happen. So, the first question is how is it possible to receive the last packet and discard the ones I missed? Later I tried using the async example of the Boost documentation but found it did not do what I wanted. http://www.boost.org/doc/libs/1_36_0/doc/html/boost_asio/tutorial/tutdaytime6.html From what I could tell the async_receive_from should call the method "handle_receive" when a packet arrives, and that works for the first packet after the service was "run". If I wanted to keep listening on the port I should call async_receive_from again in the handler code, right? BUT what I found is that I start an infinite loop: it doesn't wait till the next packet, it just enters "handle_receive" again and again. I'm not doing a server application, a lot of things are going on (it's a game), so my second question is, do I have to use threads to use the async receive method properly, and is there some example with threads and async receive? Thanks for your attention.
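
    Because the operating system queues incoming datagrams, a slow reader sees the oldest unread packet first. One way to get only the newest is to drain the queue without blocking and keep the last datagram read. The sketch below shows that idea with Python's plain socket module rather than Boost Asio; `receive_latest` and `demo` are hypothetical names, and the loopback demo assumes local datagrams arrive in order.

```python
import socket
import time


def receive_latest(sock, bufsize=65535):
    """Drain every queued datagram and return only the newest one (None if none queued)."""
    sock.setblocking(False)
    latest = None
    while True:
        try:
            data, _addr = sock.recvfrom(bufsize)
        except BlockingIOError:  # queue is empty
            return latest
        latest = data            # older datagrams are simply discarded


def demo():
    """Send three datagrams over loopback, then read only the most recent one."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))    # let the OS pick a free port
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for i in range(1, 4):
            tx.sendto(b"packet-%d" % i, rx.getsockname())
        time.sleep(0.2)          # give the datagrams time to be queued
        return receive_latest(rx)
    finally:
        tx.close()
        rx.close()
```

    In a game loop, `receive_latest` could be called once per frame; a `None` result means no new packet arrived since the previous call.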


  • Fastest inline-assembly spinlock

    - by sigvardsen
    I'm writing a multithreaded application in c++, where performance is critical. I need to use a lot of locking while copying small structures between threads, for this I have chosen to use spinlocks. I have done some research and speed testing on this and I found that most implementations are roughly equally fast: Microsofts CRITICAL_SECTION, with SpinCount set to 1000, scores about 140 time units Implementing this algorithm with Microsofts InterlockedCompareExchange scores about 95 time units Ive also tried to use some inline assembly with __asm {} using something like this code and it scores about 70 time units, but I am not sure that a proper memory barrier has been created. Edit: The times given here are the time it takes for 2 threads to lock and unlock the spinlock 1,000,000 times. I know this isn't a lot of difference but as a spinlock is a heavily used object, one would think that programmers would have agreed on the fastest possible way to make a spinlock. Googling it leads to many different approaches however. I would think this aforementioned method would be the fastest if implemented using inline assembly and using the instruction CMPXCHG8B instead of comparing 32bit registers. Furthermore memory barriers must be taken into account, this could be done by LOCK CMPXHG8B (I think?), which guarantees "exclusive rights" to the shared memory between cores. At last [some suggests] that for busy waits should be accompanied by NOP:REP that would enable Hyper-threading processors to switch to another thread, but I am not sure whether this is true or not? From my performance-test of different spinlocks, it is seen that there is not much difference, but for purely academic purpose I would like to know which one is fastest. 
However as I have extremely limited experience in the assembly-language and with memory barriers, I would be happy if someone could write the assembly code for the last example I provided with LOCK CMPXCHG8B and proper memory barriers in the following template: __asm { spin_lock: ;locking code. spin_unlock: ;unlocking code. }


  • How do you prevent Git from printing 'remote:' on each line of the output of a post-recieve hook?

    - by Matt Hodan
    I recently configured an EC2 instance with a Git deployment workflow that resembles Heroku, but I can't seem to figure out how Heroku prevents the Git post-receive hook from outputting 'remote:' on each line. Consider the following two examples (one from my EC2 project and one from a Heroku project): My EC2 project: git push prod master Counting objects: 9, done. Delta compression using up to 2 threads. Compressing objects: 100% (5/5), done. Writing objects: 100% (5/5), 456 bytes, done. Total 5 (delta 3), reused 0 (delta 0) remote: remote: Receiving push remote: Deploying updated files (by resetting HEAD) remote: HEAD is now at bf17da8 test commit remote: Running bundler to install gem dependencies remote: Fetching source index for http://rubygems.org/ remote: Installing rake (0.8.7) remote: Installing abstract (1.0.0) ... remote: Installing railties (3.0.0) remote: Installing rails (3.0.0) remote: Your bundle is complete! It was installed into ./.bundle/gems remote: Launching (by restarting Passenger)... done remote: To ssh://[email protected]/~/apps/app_name e8bd06f..bf17da8 master -> master Heroku: $> git push heroku master Counting objects: 179, done. Delta compression using up to 2 threads. Compressing objects: 100% (89/89), done. Writing objects: 100% (105/105), 42.70 KiB, done. Total 105 (delta 53), reused 0 (delta 0) -----> Heroku receiving push -----> Rails app detected -----> Gemfile detected, running Bundler version 1.0.3 Unresolved dependencies detected; Installing... Using --without development:test Fetching source index for http://rubygems.org/ Installing rake (0.8.7) Installing abstract (1.0.0) ... Installing railties (3.0.0) Installing rails (3.0.0) Your bundle is complete! It was installed into ./.bundle/gems Compiled slug size is 4.8MB -----> Launching... done http://your_app_name.heroku.com deployed to Heroku To [email protected]:your_app_name.git 3bf6e8d..642f01a master -> master


  • Parallel Tasking Concurrency with Dependencies on Python like GNU Make

    - by Brian Bruggeman
    I'm looking for a method or possibly a philosophical approach for how to do something like GNU Make within python. Currently, we utilize makefiles to execute processing because the makefiles are extremely good at parallel runs with changing single option: -j x. In addition, gnu make already has the dependency stacks built into it, so adding a secondary processor or the ability to process more threads just means updating that single option. I want that same power and flexibility in python, but I don't see it. As an example: all: dependency_a dependency_b dependency_c dependency_a: dependency_d stuff dependency_b: dependency_d stuff dependency_c: dependency_e stuff dependency_d: dependency_f stuff dependency_e: stuff dependency_f: stuff If we do a standard single thread operation (-j 1), the order of operation might be: dependency_f -> dependency_d -> dependency_a -> dependency_b -> dependency_e \ -> dependency_c For two threads (-j 2), we might see: 1: dependency_f -> dependency_d -> dependency_a -> dependency_b 2: dependency_e -> dependency_c Does anyone have any suggestions on either a package already built or an approach? I'm totally open, provided it's a pythonic solution/approach. Please and Thanks in advance!
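
    One possible approach, sketched under assumptions: Python's standard library (3.9+) has `graphlib.TopologicalSorter`, which can hand out nodes as soon as their dependencies are done, and a thread pool can play the role of `-j N`. `run_graph` and `MAKEFILE_DEPS` are hypothetical names; the dictionary transcribes the example makefile from the question, and `action` stands in for the "stuff" each target runs.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# target -> set of prerequisites, mirroring the makefile in the question
MAKEFILE_DEPS = {
    "all": {"dependency_a", "dependency_b", "dependency_c"},
    "dependency_a": {"dependency_d"},
    "dependency_b": {"dependency_d"},
    "dependency_c": {"dependency_e"},
    "dependency_d": {"dependency_f"},
}


def run_graph(deps, action, jobs=2):
    """Run action(name) for every node, prerequisites first, at most `jobs` at once.

    Returns the node names in the order they finished.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()                      # raises CycleError on circular dependencies
    finished = []
    running = {}                          # future -> node name
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        while sorter.is_active():
            for name in sorter.get_ready():
                running[pool.submit(action, name)] = name
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()           # re-raise any exception from the action
                name = running.pop(future)
                finished.append(name)
                sorter.done(name)         # unblocks nodes that depended on it
    return finished
```

    Changing `jobs` is the analogue of changing `-j x`; the dependency description itself stays the same. Unlike Make, this sketch has no notion of up-to-date targets, so every node runs every time.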


  • Testing approach for multi-threaded software

    - by Shane MacLaughlin
    I have a piece of mature geospatial software that has recently had areas rewritten to take better advantage of the multiple processors available in modern PCs. Specifically, display, GUI, spatial searching, and main processing have all been hived off to separate threads. The software has a pretty sizeable GUI automation suite for functional regression, and another smaller one for performance regression. While all automated tests are passing, I'm not convinced that they provide nearly enough coverage in terms of finding bugs relating to race conditions, deadlocks, and other nasties associated with multi-threading. What techniques would you use to see if such bugs exist? What techniques would you advocate for rooting them out, assuming there are some in there to root out? What I'm doing so far is running the GUI functional automation on the app running under a debugger, such that I can break out of deadlocks and catch crashes, and I plan to make a bounds checker build and repeat the tests against that version. I've also carried out a static analysis of the source via PC-Lint with the hope of locating potential deadlocks, but have not had any worthwhile results. The application is C++, MFC, multiple document/view, with a number of threads per doc. The locking mechanism I'm using is based on an object that includes a pointer to a CMutex, which is locked in the ctor and freed in the dtor. I use local variables of this object to lock various bits of code as required, and my mutex has a timeout that fires a warning if the timeout is reached. I avoid locking where possible, using resource copies instead. What other tests would you carry out?
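
    The locking mechanism described in this question (a scoped lock object whose acquisition warns after a timeout) can be sketched as a context manager. This is a Python illustration, not the poster's C++/MFC class: `timed_lock` is a hypothetical name, and continuing to wait after the warning is an assumption, since the question does not say what happens once the timeout fires.

```python
import logging
import threading
from contextlib import contextmanager

log = logging.getLogger("locks")


@contextmanager
def timed_lock(lock, name, warn_after=1.0):
    """Hold `lock` for the duration of the with-block.

    Logs a warning if acquisition takes longer than `warn_after` seconds,
    then keeps waiting (assumed behaviour).
    """
    if not lock.acquire(timeout=warn_after):
        log.warning("lock %r not acquired within %.1fs; possible deadlock", name, warn_after)
        lock.acquire()  # fall back to a plain blocking acquire
    try:
        yield
    finally:
        lock.release()  # released on normal exit and on exceptions, like a destructor
```

    Under a stress test, the warnings from a wrapper like this give a cheap record of which locks are contended or stuck, which can help narrow down where to look for deadlocks.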


  • Time with and without OpenMP

    - by was
    I have a question.. I tried to improve a well known program algorithm in C, FOX algorithm for matrix multiplication.. relative link without openMP: (http://web.mst.edu/~ercal/387/MPI/ppmpi_c/chap07/fox.c). The initial program had only MPI and I tried to insert openMP in the matrix multiplication method, in order to improve the time of computation: (This program runs in a cluster and computers have 2 cores, thus I created 2 threads.) The problem is that there is no difference of time, with and without openMP. I observed that using openMP sometimes, time is equivalent or greater than the time without openMP. I tried to multiply two 600x600 matrices. void Local_matrix_multiply( LOCAL_MATRIX_T* local_A /* in */, LOCAL_MATRIX_T* local_B /* in */, LOCAL_MATRIX_T* local_C /* out */) { int i, j, k; chunk = CHUNKSIZE; // 100 #pragma omp parallel shared(local_A, local_B, local_C, chunk, nthreads) private(i,j,k,tid) num_threads(2) { /* tid = omp_get_thread_num(); if(tid == 0){ nthreads = omp_get_num_threads(); printf("O Pollaplasiamos pinakwn ksekina me %d threads\n", nthreads); } printf("Thread %d use the matrix: \n", tid); */ #pragma omp for schedule(static, chunk) for (i = 0; i < Order(local_A); i++) for (j = 0; j < Order(local_A); j++) for (k = 0; k < Order(local_B); k++) Entry(local_C,i,j) = Entry(local_C,i,j) + Entry(local_A,i,k)*Entry(local_B,k,j); } //end pragma omp parallel } /* Local_matrix_multiply */


  • Parallel.For Batching

    - by chibacity
    Is there built-in support in the TPL for batching operations? I was recently playing with a routine to carry out character replacement on a character array which required a lookup table i.e. transliteration: for (int i = 0; i < chars.Length; i++) { char replaceChar; if (lookup.TryGetValue(chars[i], out replaceChar)) { chars[i] = replaceChar; } } I could see that this could be trivially parallelized, so jumped in with a first stab which I knew would perform worse as the tasks were too fine-grained: Parallel.For(0, chars.Length, i => { char replaceChar; if (lookup.TryGetValue(chars[i], out replaceChar)) { chars[i] = replaceChar; } }); I then reworked the algorithm to use batching so that the work could be chunked onto different threads in less fine-grained batches. This made use of threads as expected and I got some near linear speed up. I'm sure that there must be built-in support for batching in the TPL. What is the syntax, and how do I use it? const int CharBatch = 100; int charLen = chars.Length; Parallel.For(0, ((charLen / CharBatch) + 1), i => { int batchUpper = ((i + 1) * CharBatch); for (int j = i * CharBatch; j < batchUpper && j < charLen; j++) { char replaceChar; if (lookup.TryGetValue(chars[j], out replaceChar)) { chars[j] = replaceChar; } } });
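
    The manual batching at the end of this question amounts to range partitioning: split the index space into half-open ranges and give each range to a worker. To my knowledge this is also the idea behind .NET's `Partitioner.Create(fromInclusive, toExclusive, rangeSize)`. The sketch below shows the same partitioning in Python rather than C#; `batch_ranges` and `transliterate` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def batch_ranges(length, batch):
    """Split range(length) into half-open (start, stop) ranges of at most `batch` items."""
    return [(start, min(start + batch, length)) for start in range(0, length, batch)]


def transliterate(chars, lookup, batch=100, workers=4):
    """Replace each character found in `lookup`, in place, one batch per task."""

    def replace_range(bounds):
        start, stop = bounds
        for j in range(start, stop):
            # Ranges never overlap, so no two tasks write to the same index.
            chars[j] = lookup.get(chars[j], chars[j])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(replace_range, batch_ranges(len(chars), batch)))
    return chars
```

    Computing the ranges with a step avoids the extra empty batch that `(charLen / CharBatch) + 1` produces when the length is an exact multiple of the batch size.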

    Read the article

  • What are some of the core principles needed to master Multi threading using Delphi?

    - by Gary Becks
    I am fairly new to programming in general (about 8 months, on and off, in Delphi and a little Python here and there) and I am in the process of buying some books. I am interested in learning about concurrent programming and building multithreaded apps using Delphi. Whenever I search for "multithreading delphi" or "delphi multithreading tutorial" I seem to get conflicting results: some of the material is about using particular libraries (the OmniThread library) and other material seems geared towards more experienced programmers. I have studied quite a few books on Delphi, and for the most part they skim the surface of the subject without going into depth. I have a friend who is a programmer (he uses C++) who recommends that I first learn what is actually going on in the underlying system when using threads, rather than jumping straight into implementing them in my programs. On amazon.com there are quite a few books on concurrent programming, but none of them seem to be written with Delphi in mind. Basically I need to know three things: what are the main topics I should focus on learning before I start using threads; whether I can or should attempt to learn them from books that are not specifically aimed at Delphi developers (I don't want to confuse myself right now by reading books with a bunch of code examples in other languages); and whether there are any reliable resources or books on the subject that anyone here could recommend. Thanks in advance.

    Read the article

  • Google AppEngine + Local JUnit Tests + Jersey framework + Embedded Jetty

    - by xamde
    I use Google App Engine for Java (GAE/J). On top of it, I use the Jersey REST framework. Now I want to run local JUnit tests. The test sets up the local GAE development environment ( http://code.google.com/appengine/docs/java/tools/localunittesting.html ), launches an embedded Jetty server, then fires requests at the server via HTTP and checks the responses. Unfortunately, the Jersey/Jetty combo spawns new threads, while GAE expects only one thread to run. As a result, I end up with either no datastore inside the Jersey resources or several different datastores. As a workaround I initialise the GAE local environment only once, put it in a static variable, and add many checks inside the GAE resource (does this thread have no dev environment? Then re-use the static one). These checks should of course only run inside JUnit tests (which I asked about before: "How can I find out if code is running inside a JUnit test or not?" - I'm not allowed to post the link directly here :-|)

    Read the article

  • CUDA - multiple kernels to compute a single value

    - by Roger
    Hey, I'm trying to write a kernel that essentially does the following in C: float sum = 0.0; for(int i = 0; i < N; i++){ sum += valueArray[i]*valueArray[i]; } sum += sum / N; At the moment I have this inside my kernel, but it is not giving correct values. int i0 = blockIdx.x * blockDim.x + threadIdx.x; for(int i=i0; i<N; i += blockDim.x*gridDim.x){ *d_sum += d_valueArray[i]*d_valueArray[i]; } *d_sum= __fdividef(*d_sum, N); The code used to call the kernel is kernelName<<<64,128>>>(N, d_valueArray, d_sum); cudaMemcpy(&sum, d_sum, sizeof(float) , cudaMemcpyDeviceToHost); I think that each thread is calculating a partial sum, but the final divide statement does not take into account the accumulated value from all of the threads. Is every thread producing its own final value for d_sum? Does anyone know how I could do this efficiently? Maybe using shared memory between threads? I'm very new to GPU programming. Cheers
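One common way to structure this kind of computation is a two-stage reduction: each worker accumulates a private partial sum, the partials are combined, and the division by N happens exactly once at the end. The sketch below simulates that pattern sequentially in Python; it is not CUDA code, and the names `partial_sums_of_squares` and `mean_of_squares` and the worker count are assumptions made for illustration. On a GPU the combine step is typically done with a shared-memory reduction or atomic adds.

```python
def partial_sums_of_squares(values, num_workers):
    """Each simulated worker strides through the array and keeps its own sum."""
    partials = [0.0] * num_workers
    for worker in range(num_workers):
        # Same access pattern as a grid-stride loop: worker, worker + stride, ...
        for i in range(worker, len(values), num_workers):
            partials[worker] += values[i] * values[i]
    return partials


def mean_of_squares(values, num_workers=8):
    """Combine the private partial sums, then divide exactly once."""
    partials = partial_sums_of_squares(values, num_workers)
    return sum(partials) / len(values)
```

The key difference from the posted kernel is that no worker writes to the shared result while others are still accumulating, and the divide is not repeated per worker.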

    Read the article

  • Is memory allocation in linux non-blocking?

    - by Mark
    I am curious to know whether allocating memory with the default new operator is a non-blocking operation, e.g. struct Node { int a,b; }; ... Node* foo = new Node(); If multiple threads tried to create a new Node and one of them was suspended by the OS in the middle of the allocation, would it block the other threads from making progress? The reason I ask is that I had a concurrent data structure that created new nodes. I then modified the algorithm to recycle the nodes. The throughput of the two algorithms was virtually identical on a 24-core machine. However, I then created an interference program that ran on all the system cores in order to create as much OS pre-emption as possible. The throughput of the algorithm that created new nodes decreased by a factor of 5 relative to the algorithm that recycled nodes. I'm curious to know why this would occur. Thanks. *Edit: pointing me to the code for the C++ memory allocator for Linux would be helpful as well. I tried looking before posting this question, but had trouble finding it.

    Read the article

  • How can I create a DOTNET COM interop assembly for Classic ASP that does not sequentially block othe

    - by Alex Waddell
    Setup: create a simple COM add-in in .NET/C# that does nothing but sleep on the current thread for 5 seconds. namespace ComTest { [ComVisible(true)] [ProgId("ComTester.Tester")] [Guid("D4D0BF9C-C169-4e5f-B28B-AFA194B29340")] [ClassInterface(ClassInterfaceType.AutoDual)] public class Tester { [STAThread()] public string Test() { System.Threading.Thread.Sleep(5000); return DateTime.Now.ToString(); } } } From an ASP page, call the test component: <%@ Language=VBScript %> <%option explicit%> <%response.Buffer=false%> <% dim test set test = CreateObject("ComTester.Tester") %> <HTML> <HEAD></HEAD> <BODY> <% Response.Write(test.Test()) set test = nothing %> </BODY> </HTML> When run on a Windows 2003 server, the test.asp page blocks ALL OTHER threads in the site while the COM component sleeps. How can I create a COM component for ASP that does not block all ASP worker threads?

    Read the article

  • Which isolation level should I use for the following insert-if-not-present transaction?

    - by Steve Guidi
    I've written a LINQ to SQL program that essentially performs an ETL task, and I've noticed many places where parallelization will improve its performance. However, I'm concerned about preventing uniqueness constraint violations when two threads perform the following task (pseudocode). Record CreateRecord(string recordText) { using (MyDataContext database = GetDatabase()) { Record existingRecord = database.MyTable.FirstOrDefault(record.KeyPredicate()); if(existingRecord == null) { existingRecord = CreateRecord(recordText); database.MyTable.InsertOnSubmit(existingRecord); } database.SubmitChanges(); return existingRecord; } } In general, this code executes a SELECT statement to test for the record's existence, followed by an INSERT statement if the record doesn't exist. It is wrapped in an implicit transaction. When two threads run this code for the same recordText, I want to prevent them from simultaneously determining that the record doesn't exist and then both attempting to create the same record. An explicit transaction with a suitable isolation level should work, but I'm not certain which isolation level I should use. Serializable should work, but seems too strict. Is there a better choice?
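The check-then-insert race described here can also be sidestepped by letting the uniqueness constraint arbitrate, rather than by choosing an isolation level. The sketch below shows that alternative technique in Python with SQLite, not LINQ to SQL or SQL Server; the table `records`, its columns, and the helper names `make_db` and `get_or_create` are all hypothetical, and `INSERT OR IGNORE` is SQLite-specific syntax.

```python
import sqlite3


def get_or_create(conn, key, text):
    """Return the row id for key, inserting the row only if it is absent."""
    # INSERT OR IGNORE is SQLite syntax; the UNIQUE constraint on record_key
    # makes a duplicate insert a no-op instead of a second row.
    conn.execute(
        "INSERT OR IGNORE INTO records (record_key, record_text) VALUES (?, ?)",
        (key, text),
    )
    conn.commit()
    row = conn.execute(
        "SELECT id FROM records WHERE record_key = ?", (key,)
    ).fetchone()
    return row[0]


def make_db():
    """Create an in-memory database with a hypothetical records table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "id INTEGER PRIMARY KEY, "
        "record_key TEXT NOT NULL UNIQUE, "
        "record_text TEXT NOT NULL)"
    )
    return conn
```

Calling `get_or_create` twice with the same key returns the same id and leaves a single row, because the second insert is ignored. Whether an equivalent approach fits the original program depends on the database engine and on how the data context surfaces constraint violations.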

    Read the article
