Search Results

Search found 3641 results on 146 pages for 'threads'.

  • how to implement a "soft barrier" in multithreaded c++

    - by Jason
    I have some multithreaded C++ code with the following structure:

        do_thread_specific_work();
        update_shared_variables();
        //checkpoint A
        do_thread_specific_work_not_modifying_shared_variables();
        //checkpoint B
        do_thread_specific_work_requiring_all_threads_have_updated_shared_variables();

    What follows checkpoint B is work that could have started as soon as all threads had reached checkpoint A, hence my notion of a "soft barrier". Typically, multithreading libraries only provide "hard barriers", in which all threads must reach some point before any can continue. Obviously a hard barrier could be used at checkpoint B. Using a soft barrier can lead to better execution time, especially since the work between checkpoints A and B may not be load-balanced between the threads (i.e. one slow thread that has reached checkpoint A but not B could make all the others wait at the barrier just before checkpoint B).

    I've tried using atomics to synchronize things, and I know with 100% certainty that it is NOT guaranteed to work. For example, using OpenMP syntax, before the parallel section start with:

        shared_thread_counter = num_threads; //known at compile time
        #pragma omp flush

    Then at checkpoint A:

        #pragma omp atomic
        shared_thread_counter--;

    Then at checkpoint B (using polling):

        #pragma omp flush
        while (shared_thread_counter > 0) {
            usleep(1); //can be removed, but better to limit memory bandwidth
            #pragma omp flush
        }

    I've designed some experiments in which I use an atomic to indicate that some operation before it has finished. The experiment would work with 2 threads most of the time but consistently fail when I have lots of threads (like 20 or 30). I suspect this is because of the caching structure of modern CPUs. Even if one thread updates some other value before doing the atomic decrement, another thread is not guaranteed to observe the two writes in that order. Consider the case when the other value is a cache miss and the atomic decrement is a cache hit.

    So back to my question: how do I CORRECTLY implement this "soft barrier"? Is there any built-in feature that guarantees such functionality? I'd prefer OpenMP, but I'm familiar with most of the other common multithreading libraries. As a workaround right now, I'm using a hard barrier at checkpoint B, and I've restructured my code to make the work between checkpoints A and B automatically load-balance between the threads (which has been rather difficult at times). Thanks for any advice/insight :)
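
    The arrive-then-wait shape of a soft barrier can be sketched in a few lines. This is only an illustration in Python, not OpenMP or C++, and the names SoftBarrier, arrive and wait_all are made up for the example. It assumes the ordering guarantees come from a lock and an event object rather than from bare atomics, which is the part the polling version above is missing.

```python
import threading


class SoftBarrier:
    """Threads call arrive() at checkpoint A and wait_all() at checkpoint B."""

    def __init__(self, parties):
        self._remaining = parties
        self._lock = threading.Lock()
        self._all_arrived = threading.Event()

    def arrive(self):
        # Checkpoint A: record the arrival without blocking.
        with self._lock:
            self._remaining -= 1
            if self._remaining == 0:
                self._all_arrived.set()

    def wait_all(self, timeout=None):
        # Checkpoint B: block only if some thread has not reached A yet.
        return self._all_arrived.wait(timeout)


def run_demo(num_threads=8):
    barrier = SoftBarrier(num_threads)
    shared = [0] * num_threads      # one slot per thread, written before A
    seen = [None] * num_threads     # what each thread observed after B

    def worker(i):
        shared[i] = i + 1           # update_shared_variables()
        barrier.arrive()            # checkpoint A
        # ... thread-specific work that does not touch `shared` ...
        barrier.wait_all()          # checkpoint B
        seen[i] = sum(shared)       # needs every thread's update

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

    A thread that reaches checkpoint B after everyone has passed A returns from wait_all() immediately, so a slow thread between A and B no longer holds the others up.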

  • mod_wsgi daemon mode vs threaded fastcgi

    - by t0ster
    Can someone explain the difference between Apache mod_wsgi in daemon mode and Django FastCGI in threaded mode? They both use threads for concurrency, I think. Suppose that I'm using nginx as a front end to Apache mod_wsgi. UPDATE: I'm comparing Django's built-in FastCGI (./manage.py method=threaded maxchildren=15) and mod_wsgi in 'daemon' mode (WSGIDaemonProcess example threads=15). They both use threads and acquire the GIL, am I right?

  • what happens to running/blocked runnables when executorservice is shutdown()

    - by prmatta
    I posted a question about a thread pattern today, and almost everyone suggested that I look into the ExecutorService. While looking into the ExecutorService, I think I am missing something. What happens if the service has running or blocked threads and someone calls ExecutorService.shutdown()? What happens to threads that are running or blocked? Does the ExecutorService wait for those threads to complete before it terminates? The reason I ask is that a long time ago, when I used to dabble in Java, they deprecated Thread.stop(), and I remember the right way of stopping a thread was to use semaphores and extend Thread when necessary:

        public void run () {
            while (!this.exit) {
                try {
                    block(); //do something
                } catch (InterruptedException ie) {
                }
            }
        }

        public void stop () {
            this.exit = true;
            if (this.thread != null) {
                this.thread.interrupt();
                this.thread = null;
            }
        }

    How does ExecutorService handle running threads?
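
    As a rough analogy only, here is how an orderly pool shutdown behaves in Python's concurrent.futures. This is not the Java API: in Java, shutdown() lets previously submitted tasks run but does not itself block, and awaitTermination() is the call that waits. The helper names slow_task and shutdown_demo are invented for the sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(seconds):
    time.sleep(seconds)
    return "finished"


def shutdown_demo():
    pool = ThreadPoolExecutor(max_workers=2)
    running = pool.submit(slow_task, 0.2)

    # shutdown(wait=True) stops accepting new work, lets already
    # submitted tasks run to completion, and blocks until they are done.
    pool.shutdown(wait=True)

    try:
        pool.submit(slow_task, 0.1)
        rejected = False
    except RuntimeError:
        # Submitting after shutdown is refused.
        rejected = True
    return running.result(), rejected
```

    The task that was already running is allowed to finish; only new submissions are turned away.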

  • An Actor "queue" ?

    - by synic
    In Java, to write a library that makes requests to a server, I usually implement some sort of dispatcher (not unlike the one found here in the Twitter4J library: http://github.com/yusuke/twitter4j/blob/master/twitter4j-core/src/main/java/twitter4j/internal/async/DispatcherImpl.java) to limit the number of connections, to perform asynchronous tasks, etc. The idea is that N number of threads are created. A "Task" is queued and all threads are notified, and one of the threads, when it's ready, will pop an item from the queue, do the work, and then return to a waiting state. If all the threads are busy working on a Task, then the Task is just queued, and the next available thread will take it. This keeps the max number of connections to N, and allows at most N Tasks to be operating at the same time. I'm wondering what kind of system I can create with Actors that will accomplish the same thing? Is there a way to have N number of Actors, and when a new message is ready, pass it off to an Actor to handle it - and if all Actors are busy, just queue the message?
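
    The dispatcher described above (N workers, one shared queue, tasks wait when every worker is busy) might look roughly like this. It is a plain thread-and-queue sketch in Python, not an Actor implementation, and run_dispatcher and make_task are hypothetical names used only here.

```python
import queue
import threading
import time


def run_dispatcher(tasks, num_workers=3):
    """Run every callable in `tasks` on at most `num_workers` threads."""
    pending = queue.Queue()
    results = []
    lock = threading.Lock()
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        while True:
            task = pending.get()          # sleeps until a task is queued
            if task is None:              # sentinel: no more work
                break
            with lock:
                active += 1
                peak = max(peak, active)
            value = task()
            with lock:
                active -= 1
                results.append(value)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for task in tasks:
        pending.put(task)                 # queued if all workers are busy
    for _ in workers:
        pending.put(None)
    for w in workers:
        w.join()
    return sorted(results), peak


def make_task(n):
    def task():
        time.sleep(0.01)
        return n * n
    return task
```

    The returned peak value records the largest number of tasks that were in flight at once, which should never exceed num_workers.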

  • How to parallelize this groovy code?

    - by lucas
    I'm trying to write a reusable component in Groovy to easily shoot off emails from some of our Java applications. I would like to pass it a List<Email>, where Email is just a POJO (POGO?) with some email info. I'd like it to be multithreaded, at least running all the email logic in a second thread, or make one thread per email. I am really foggy on multithreading in Java, so that probably doesn't help! I've attempted a few different ways, but here is what I have right now:

        void sendEmails(List<Email> emails) {
            def threads = []
            def sendEm = emails.each{ email ->
                def th = new Thread({
                    Random rand = new Random()
                    def wait = (long)(rand.nextDouble() * 1000)
                    println "in closure"
                    this.sleep wait
                    sendEmail(email)
                })
                println "putting thread in list"
                threads << th
            }
            threads.each { it.run() }
            threads.each { it.join() }
        }

    I was hoping the sleep would randomly slow some threads down so the console output wouldn't be sequential. Instead, I see this:

        putting thread in list
        putting thread in list
        putting thread in list
        putting thread in list
        putting thread in list
        putting thread in list
        putting thread in list
        putting thread in list
        putting thread in list
        putting thread in list
        in closure
        sending email1
        in closure
        sending email2
        in closure
        sending email3
        in closure
        sending email4
        in closure
        sending email5
        in closure
        sending email6
        in closure
        sending email7
        in closure
        sending email8
        in closure
        sending email9
        in closure
        sending email10

    sendEmail basically does what you'd expect, including the println statement, and the client that calls this follows:

        void doSomething() {
            Mailman emailer = MailmanFactory.getExchangeEmailer()
            def to = ["one","two"]
            def from = "noreply"
            def li = []
            def email
            (1..10).each {
                email = new Email(to,null,from,"email"+it,"hello")
                li << email
            }
            emailer.sendEmails li
        }
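
    One likely reason for the strictly sequential output is that the code calls run() on each thread object rather than start(); on a Java Thread, run() is an ordinary method call that executes on the caller's thread. Python's threading.Thread has the same split, so the difference can be shown with a small sketch (where_does_it_run is a hypothetical helper, and this is Python rather than Groovy):

```python
import threading


def where_does_it_run(use_start):
    """Return True if the target ran on a thread other than the caller's."""
    seen = []
    t = threading.Thread(target=lambda: seen.append(threading.get_ident()))
    if use_start:
        t.start()   # spawns a new thread, then runs the target there
        t.join()
    else:
        t.run()     # plain method call: the target runs on the caller's thread
    return seen[0] != threading.get_ident()
```

    With start() the target reports a different thread id than the caller; with run() it reports the caller's own id, which is why nothing overlaps.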

  • Caveats to be aware of when using threading in Python?

    - by knorv
    I'm quite new to threading in Python and have a couple of beginner questions. When starting more than say fifty threads using the Python threading module I start getting MemoryError. The threads themselves are very slim and not very memory hungry, so it seems like it is the overhead of the threading that causes the memory issues. Is there something I can do to increase the memory capacity or otherwise make Python allow for a larger number of threads? What is the maximum number of threads you've been able to run in your Python code using the threading module? Did you do any tricks to achieve that number? Are there any other caveats to be aware of when using the threading module?
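
    One possible cause of memory errors with many small threads is the stack reserved for each thread, which can be large by default on some platforms. As a hedged sketch, threading.stack_size() can request a smaller stack for threads started afterwards; the 256 KiB figure and the run_many helper are assumptions for illustration, and some platforms may reject the value.

```python
import threading


def run_many(count=200, stack_bytes=256 * 1024):
    """Start `count` small threads with a reduced per-thread stack size."""
    previous = threading.stack_size(stack_bytes)  # applies to threads started later
    done = []
    lock = threading.Lock()

    def worker(i):
        with lock:
            done.append(i)

    try:
        threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        threading.stack_size(previous)            # restore the earlier setting
    return len(done)
```

    If the threads mostly wait on I/O, a fixed-size pool fed from a queue is often a simpler way to keep the thread count bounded.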

  • Why is my code stopping and not returning an exception?

    - by BeckyLou
    I have some code that starts a couple of threads and lets them execute, then uses a while loop to check whether the current time has passed a set timeout period or the correct number of results has been processed (by checking an int on the class object), with a Thread.Sleep() to wait between loops. Once the while loop exits, it calls Abort() on the threads and should return data to the function that calls the method. When debugging and stepping through the code, I find there can be exceptions in the code running on the separate threads; in some cases I handle these appropriately, and at other times I don't want to do anything specific. What I have been seeing is that my code goes into the while loop and the thread sleeps, then nothing is returned from my function, neither data nor an exception. Code execution just stops completely. Any ideas what could be happening? Code sample:

        System.Threading.Thread sendThread = new System.Threading.Thread(new System.Threading.ThreadStart(Send));
        sendThread.Start();
        System.Threading.Thread receiveThread = new System.Threading.Thread(new System.Threading.ThreadStart(Receive));
        receiveThread.Start();

        // timeout
        Int32 maxSecondsToProcess = this.searchTotalCount * timeout;
        DateTime timeoutTime = DateTime.Now.AddSeconds(maxSecondsToProcess);
        Log("Submit() Timeout time: " + timeoutTime.ToString("yyyyMMdd HHmmss"));

        // while we're still waiting to receive results & haven't hit the timeout,
        // keep the threads going
        while (resultInfos.Count < this.searchTotalCount && DateTime.Now < timeoutTime)
        {
            Log("Submit() Waiting...");
            System.Threading.Thread.Sleep(10 * 1000); // 10 seconds
        }

        Log("Submit() Aborting threads"); // <== this log doesn't show up
        sendThread.Abort();
        receiveThread.Abort();
        return new List<ResultInfo>(this.resultInfos.Values);

  • Recommendations for IPC between parent and child processes in .NET?

    - by Jeremy
    My .NET program needs to run an algorithm that makes heavy use of 3rd party libraries (32-bit), most of which are unmanaged code. I want to drive the CPU as hard as I can, so the code runs several threads in parallel to divide up the work. I find that running all these threads simultaneously results in temporary memory spikes, causing the process' virtual memory size to approach the 2 GB limit. This memory is released back pretty quickly, but occasionally if enough threads enter the wrong sections of code at once, the process crosses the "red line" and either the unmanaged code or the .NET code encounters an out of memory error. I can throttle back the number of threads but then my CPU usage is not as high as I would like. I am thinking of creating worker processes rather than worker threads to help avoid the out of memory errors, since doing so would give each thread of execution its own 2 GB of virtual address space (my box has lots of RAM). I am wondering what are the best/easiest methods to communicate the input and output between the processes in .NET? The file system is an obvious choice. I am used to shared memory, named pipes, and such from my UNIX background. Is there a Windows or .NET specific mechanism I should use?

  • How to buffer stdout in memory and write it from a dedicated thread

    - by NickB
    I have a C application with many worker threads. It is essential that these do not block so where the worker threads need to write to a file on disk, I have them write to a circular buffer in memory, and then have a dedicated thread for writing that buffer to disk. The worker threads do not block any more. The dedicated thread can safely block while writing to disk without affecting the worker threads (it does not hold a lock while writing to disk). My memory buffer is tuned to be sufficiently large that the writer thread can keep up. This all works great. My question is, how do I implement something similar for stdout? I could macro printf() to write into a memory buffer, but I don't have control over all the code that might write to stdout (some of it is in third-party libraries). Thoughts? NickB

  • Killing a script launched in a Process via os.system()

    - by L.J.
    I have a Python script which launches several processes. Each process basically just calls a shell script:

        from multiprocessing import Process
        import os
        import logging

        def thread_method(n = 4):
            global logger
            command = "~/Scripts/run.sh " + str(n) + " >> /var/log/mylog.log"
            if (debug):
                logger.debug(command)
            os.system(command)

    I launch several of these threads, which are meant to run in the background. I want to have a timeout on these threads, such that if one exceeds the timeout, it is killed:

        t = []
        for x in range(10):
            try:
                t.append(Process(target=thread_method, args=(x,) ) )
                t[-1].start()
            except Exception as e:
                logger.error("Error: unable to start thread")
                logger.error("Error message: " + str(e))
        logger.info("Waiting up to 60 seconds to allow threads to finish")
        t[0].join(60)
        for n in range(len(t)):
            if t[n].is_alive():
                logger.info(str(n) + " is still alive after 60 seconds, forcibly terminating")
                t[n].terminate()

    The problem is that calling terminate() on the process threads isn't killing the launched run.sh script: it continues running in the background until I either force kill it from the command line or it finishes internally. Is there a way to have terminate() also kill the subshell created by os.system()?
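
    One approach that may help, sketched here under the assumption of a POSIX system, is to launch the script with subprocess.Popen in its own session and signal the whole process group on timeout, instead of going through os.system() inside a Process. The helper name run_with_timeout is hypothetical.

```python
import os
import signal
import subprocess


def run_with_timeout(command, timeout):
    """Run a shell command; kill its whole process group on timeout."""
    # start_new_session=True puts the shell and everything it spawns
    # into a fresh process group that can be signalled as a unit (POSIX only).
    proc = subprocess.Popen(command, shell=True, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGTERM)
        return proc.wait()
```

    Because the child is the leader of its own session, its pid doubles as the process group id, so os.killpg() reaches the shell and the commands it started. A negative return value is the number of the signal that ended the process.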

  • When should we use NSThreads in Cocoa Touch?

    - by srikanth rongali
    I am writing a small game using cocos2d. It is a shooting game, with the player on one side and the enemy on the other. To run both the player's and the enemy's shooting actions, should we use threads, or can we do without them? At present I am not using threads, but I can manage to run both the player's and the enemy's actions at the same time. Are threads required for good performance? Or am I doing something wrong by not using threads? Please help me out of this confusion. Thank you.

  • OpenThread() through different thread numbers?

    - by user354641
    Hi there. I'm confused about opening different threads with the OpenThread function and examining them with the NtQueryInformationThread native function. I have no problem with NtQueryInformationThread and can examine threads fine. The problem is that I don't know how to loop through a range of thread numbers using OpenThread (with the SetDebugPrivilege consideration). Suppose we have threads numbered 5100 to 5200 and we want to examine them sequentially: for example 5100, 5101, 5102, 5103, 5104, 5105 ... 5200. I don't know how to use the OpenThread function in Delphi the right way. I'm using this syntax and found it to be wrong: OpenThread(THREAD_ALL_ACCESS,false,(DWORD)5100). If anyone could show me how to use OpenThread across a range of thread numbers, it would be great. Thanks a lot.

  • Noise with multi-threaded raytracer

    - by herber88
    This is my first multi-threaded implementation, so it's probably a beginner's mistake. The threads handle the rendering of every second row of pixels (so all rendering is handled within each thread). The problem persists if the threads render the upper and lower parts of the screen respectively. Both threads read from the same variables; can this cause any problems? From what I've understood, only writing can cause concurrency problems... Can calling the same functions cause any concurrency problems? Again, from what I've understood, this shouldn't be a problem... The only time both threads write to the same variable is when saving the calculated pixel color. This is stored in an array, but they never write to the same indices in that array. Can this cause a problem? Multi-threaded rendered image (Spam prevention stops me from posting images directly..) P.S. I use exactly the same implementation in both cases; the ONLY difference is a single thread vs. two threads created for the rendering.

  • Semaphore - What is the use of initial count?

    - by Sandbox
    http://msdn.microsoft.com/en-us/library/system.threading.semaphoreslim.aspx To create a semaphore, I need to provide an initial count and a maximum count. MSDN states that the initial count is "The initial number of requests for the semaphore that can be granted concurrently", while the maximum count is "The maximum number of requests for the semaphore that can be granted concurrently". I can understand that the maximum count is the maximum number of threads that can access a resource concurrently. But what is the use of the initial count? If I create a semaphore with an initial count of 0 and a maximum count of 2, none of my thread pool threads are able to access the resource. If I set the initial count to 1 and the maximum count to 2, then only one thread pool thread can access the resource. It is only when I set both the initial count and the maximum count to 2 that 2 threads are able to access the resource concurrently. So I am really confused about the significance of the initial count.

        SemaphoreSlim semaphoreSlim = new SemaphoreSlim(0, 2); //all threadpool threads wait
        SemaphoreSlim semaphoreSlim = new SemaphoreSlim(1, 2); //only one thread has access to the resource at a time
        SemaphoreSlim semaphoreSlim = new SemaphoreSlim(2, 2); //two threadpool threads can access the resource concurrently
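
    The role of the initial count is easier to see when something releases the semaphore later. The sketch below uses Python's threading.Semaphore, which has an initial value but no separate maximum, so it only illustrates the "start closed, open later" half of the picture; in .NET the maximum count additionally caps how far Release() can raise the count. semaphore_demo is a made-up name.

```python
import threading


def semaphore_demo():
    # Start "closed": zero slots are available right now.
    gate = threading.Semaphore(0)
    before = gate.acquire(blocking=False)   # nothing to take yet

    # Releasing raises the current count; waiting threads could now enter.
    gate.release()
    gate.release()
    first = gate.acquire(blocking=False)
    second = gate.acquire(blocking=False)
    third = gate.acquire(blocking=False)    # both slots are taken again
    return before, first, second, third
```

    An initial count of 0 is therefore useful when the resource is not ready at construction time: the owner opens the gate by releasing once setup has finished.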

  • Threading and cores

    - by Matt
    Suppose I have X cores on my machine and I start X threads. Let's assume for the sake of argument that each thread is completely separate in terms of the memory, HDD, etc. it uses. Is the OS going to know to send each thread to its own core, or will it do more time slicing on one core for multiple threads? What the question boils down to is: if I have X cores and my program must do independent calculations, should I start X threads, and will they each get assigned to a core? Or is the presumption that because I have X cores I can start X threads completely wrong? I'm thinking it is. This is with C#.

  • Using NHibernate's HQL to make a query with multiple inner joins

    - by Abu Dhabi
    The problem here consists of translating a statement written in LINQ to SQL syntax into the equivalent for NHibernate. The LINQ to SQL code looks like so:

        var whatevervar =
            from threads in context.THREADs
            join threadposts in context.THREADPOSTs on threads.thread_id equals threadposts.thread_id
            join posts1 in context.POSTs on threadposts.post_id equals posts1.post_id
            join users in context.USERs on posts1.user_id equals users.user_id
            orderby posts1.post_time
            where threads.thread_id == int.Parse(id)
            select new
            {
                threads.thread_topic,
                posts1.post_time,
                users.user_display_name,
                users.user_signature,
                users.user_avatar,
                posts1.post_body,
                posts1.post_topic
            };

    It's essentially trying to grab a list of posts within a given forum thread. The best I've been able to come up with (with the help of the helpful users of this site) for NHibernate is:

        var whatevervar = session.CreateQuery("select t.Thread_topic, p.Post_time, " +
                "u.User_display_name, u.User_signature, " +
                "u.User_avatar, p.Post_body, p.Post_topic " +
                "from THREADPOST tp " +
                "inner join tp.Thread_ as t " +
                "inner join tp.Post_ as p " +
                "inner join p.User_ as u " +
                "where tp.Thread_ = :what")
            .SetParameter<THREAD>("what", threadid)
            .SetResultTransformer(Transformers.AliasToBean(typeof(MyDTO)))
            .List<MyDTO>();

    But that doesn't parse well, complaining that the aliases for the joined tables are null references. MyDTO is a custom type for the output:

        public class MyDTO
        {
            public string thread_topic { get; set; }
            public DateTime post_time { get; set; }
            public string user_display_name { get; set; }
            public string user_signature { get; set; }
            public string user_avatar { get; set; }
            public string post_topic { get; set; }
            public string post_body { get; set; }
        }

    I'm out of ideas, and while doing this by direct SQL query is possible, I'd like to do it properly, without defeating the purpose of using an ORM. Thanks in advance! EDIT: The database looks like this: http://i41.tinypic.com/5agciu.jpg (Can't post images yet.)

  • Locking individual elements in a static collection?

    - by user638474
    I have a static collection of objects that will be frequently updated from multiple threads. Is it possible to lock individual objects in a collection instead of locking the entire collection so that only threads trying to access the same object in the collection would get blocked instead of every thread? If there is a better way to update objects in a collection from multiple threads, I'm all ears.
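
    Per-element locking can be sketched by giving every element its own lock, so that only threads touching the same element contend. The example is in Python for brevity, and Account, ACCOUNTS, deposit and hammer are hypothetical names. It assumes the collection itself is not resized while threads run; adding or removing elements would still need collection-level synchronization.

```python
import threading


class Account:
    """Hypothetical element type: each instance carries its own lock."""

    def __init__(self, balance=0):
        self.balance = balance
        self.lock = threading.Lock()


ACCOUNTS = {name: Account() for name in ("a", "b", "c")}  # shared collection


def deposit(name, amount):
    account = ACCOUNTS[name]      # lookup only; the dict is never resized
    with account.lock:            # blocks only users of this one element
        account.balance += amount


def hammer(name, times):
    for _ in range(times):
        deposit(name, 1)
```

    Two threads depositing into "a" serialize on that element's lock, while a thread working on "b" proceeds without waiting.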

  • How to synchronize access to many objects

    - by vividos
    I have a thread pool with some threads (e.g. as many as the number of cores) that work on many objects, say thousands of objects. Normally I would give each object a mutex to protect access to its internals, lock it when I'm doing work, then release it. When two threads try to access the same object, one of the threads has to wait. Now I want to save some resources and be scalable, as there may be thousands of objects and still only a handful of threads. I'm thinking about a class design where the thread has some sort of mutex or lock object, and assigns the lock to the object when the object should be accessed. This would save resources, as I would only have as many lock objects as I have threads. Now comes the programming part, where I want to transfer this design into code but don't quite know where to start. I'm programming in C++ and want to use Boost classes where possible, but self-written classes that handle these special requirements are OK. How would I implement this? My first idea was to have a boost::mutex object per thread, and each object has a boost::shared_ptr that is initially unset (or NULL). Now when I want to access the object, I lock it by creating a scoped_lock object and assigning it to the shared_ptr. When the shared_ptr is already set, I wait on the present lock. This idea sounds like a heap full of race conditions, so I sort of abandoned it. Is there another way to accomplish this design? A completely different way?
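
    A different technique that addresses the same resource concern is lock striping: a small fixed pool of locks, with each object mapped to one of them by hash. This is not the per-thread lock hand-off described above, and it is shown in Python rather than C++ with Boost; StripedLocks and count_in_parallel are names invented for the sketch.

```python
import threading


class StripedLocks:
    """A fixed pool of locks shared by an arbitrary number of objects."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]

    def lock_for(self, key):
        # The same key always maps to the same lock; unrelated keys may
        # occasionally share one, which costs contention but not correctness.
        return self._locks[hash(key) % len(self._locks)]


def count_in_parallel(num_objects=1000, num_threads=4, rounds=20):
    stripes = StripedLocks(stripes=8)
    counters = [0] * num_objects

    def worker():
        for _ in range(rounds):
            for index in range(num_objects):
                with stripes.lock_for(index):
                    counters[index] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counters
```

    The number of locks is chosen independently of the number of objects, so memory stays bounded while most accesses to different objects still run without blocking each other.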

  • C# Basic Multi-Threading Question: Call Method on Thread A from Thread B (Thread B started from Thre

    - by Nick
    What is the best way to accomplish this: The main thread (Thread A) creates two other threads (Thread B and Thread C). Threads B and C do heavy disk I/O and eventually need to pass the resources they created to Thread A, which then calls a method in an external DLL file. That method must be called from the thread that created it, so only Thread A can call it. The only other time I ever used threads was in a Windows Forms application, and the Invoke methods were just what I needed. This program does not use Windows Forms, and as such there are no Control.Invoke methods to use. I have noticed in my testing that if a variable is created in Thread A, I have no trouble accessing and modifying it from Thread B/C, which seems very wrong to me. With WinForms, I was sure it threw errors for trying to access things created on other threads. I know it is unsafe to change things from multiple threads, but I really hoped .NET would forbid it altogether to ensure safe coding. Does .NET do this and I am just missing the boat, or does it only do it with WinForms apps? Since it does seemingly allow this, do I do something like an OS would do: create a flag and monitor it from Thread A to see if it changes, and if it does, call the method? Doesn't an event handler essentially do this, so could an event somehow be raised on the main thread?
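
    One common shape for this is a message queue owned by the main thread: the workers enqueue the resources they produce, and the main thread blocks on the queue and makes every thread-affine call itself. The sketch below is in Python rather than C#, and run_on_main and thread_affine_call are hypothetical stand-ins for the real program and the DLL method.

```python
import queue
import threading


def run_on_main(items_per_worker=3):
    """Workers hand results to the main thread, which makes every call."""
    inbox = queue.Queue()
    main_id = threading.get_ident()
    called_on = []

    def thread_affine_call(resource):
        # Stand-in for the external method that must run on the main thread.
        called_on.append((resource, threading.get_ident()))

    def worker(name):
        for i in range(items_per_worker):
            inbox.put(f"{name}-{i}")      # hand the resource over
        inbox.put(None)                   # this worker is finished

    workers = [threading.Thread(target=worker, args=(n,)) for n in ("B", "C")]
    for w in workers:
        w.start()

    finished = 0
    while finished < len(workers):
        resource = inbox.get()            # main thread blocks here, no polling
        if resource is None:
            finished += 1
        else:
            thread_affine_call(resource)
    for w in workers:
        w.join()
    return called_on, main_id
```

    Blocking on the queue replaces the flag-polling idea: the main thread sleeps until a worker has something for it, and every call is recorded with the main thread's id.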

  • Parallelism in .NET – Part 9, Configuration in PLINQ and TPL

    - by Reed
    Parallel LINQ and the Task Parallel Library contain many options for configuration.  Although the default configuration options are often ideal, there are times when customizing the behavior is desirable.  Both frameworks provide full configuration support. When working with Data Parallelism, there is one primary configuration option we often need to control – the number of threads we want the system to use when parallelizing our routine.  By default, PLINQ and the TPL both use the ThreadPool to schedule tasks.  Given the major improvements in the ThreadPool in CLR 4, this default behavior is often ideal.  However, there are times that the default behavior is not appropriate.  For example, if you are working on multiple threads simultaneously, and want to schedule parallel operations from within both threads, you might want to consider restricting each parallel operation to using a subset of the processing cores of the system.  Not doing this might over-parallelize your routine, which leads to inefficiencies from having too many context switches. In the Task Parallel Library, configuration is handled via the ParallelOptions class.  All of the methods of the Parallel class have an overload which accepts a ParallelOptions argument. We configure the Parallel class by setting the ParallelOptions.MaxDegreeOfParallelism property.  
    For example, let’s revisit one of the simple data parallel examples from Part 2:

        Parallel.For(0, pixelData.GetLength(0), row =>
        {
            for (int col = 0; col < pixelData.GetLength(1); ++col)
            {
                pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel);
            }
        });

    Here, we’re looping through an image, and calling a method on each pixel in the image. If this was being done on a separate thread, and we knew another thread within our system was going to be doing a similar operation, we likely would want to restrict this to using half of the cores on the system. This could be accomplished easily by doing:

        var options = new ParallelOptions();
        options.MaxDegreeOfParallelism = Math.Max(Environment.ProcessorCount / 2, 1);

        Parallel.For(0, pixelData.GetLength(0), options, row =>
        {
            for (int col = 0; col < pixelData.GetLength(1); ++col)
            {
                pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel);
            }
        });

    Now, we’re restricting this routine to using no more than half the cores in our system. Note that I included a check to prevent a single core system from supplying zero; without this check, we’d potentially cause an exception. I also did not hard code a specific value for the MaxDegreeOfParallelism property. One of our goals when parallelizing a routine is allowing it to scale on better hardware.
    Specifying a hard-coded value would contradict that goal.

    Parallel LINQ also supports configuration, and in fact has quite a few more options for configuring the system. The main configuration option we most often need is the same as our TPL option: we need to supply the maximum number of processing threads. In PLINQ, this is done via a new extension method on ParallelQuery<T>: ParallelEnumerable.WithDegreeOfParallelism.

    Let’s revisit our declarative data parallelism sample from Part 6:

        double min = collection.AsParallel().Min(item => item.PerformComputation());

    Here, we’re performing a computation on each element in the collection, and saving the minimum value of this operation. If we wanted to restrict this to a limited number of threads, we would add our new extension method:

        int maxThreads = Math.Max(Environment.ProcessorCount / 2, 1);
        double min = collection
            .AsParallel()
            .WithDegreeOfParallelism(maxThreads)
            .Min(item => item.PerformComputation());

    This restricts the PLINQ query to using no more than half as many threads as the system has cores.

    PLINQ provides some additional configuration options. By default, PLINQ will occasionally revert to processing a query sequentially. This occurs because many queries, if parallelized, typically cause an overall slowdown compared to a serial processing equivalent. By analyzing the “shape” of the query, PLINQ often decides to run a query serially instead of in parallel. This can occur for (taken from MSDN):

        - Queries that contain a Select, indexed Where, indexed SelectMany, or ElementAt clause after an ordering or filtering operator that has removed or rearranged original indices.
        - Queries that contain a Take, TakeWhile, Skip, SkipWhile operator and where indices in the source sequence are not in the original order.
        - Queries that contain Zip or SequenceEquals, unless one of the data sources has an originally ordered index and the other data source is indexable (i.e. an array or IList(T)).
        - Queries that contain Concat, unless it is applied to indexable data sources.
        - Queries that contain Reverse, unless applied to an indexable data source.

    If a query matches one of these rules, PLINQ will run the query on a single thread. However, none of these rules look at the specific work being done in the delegates, only at the “shape” of the query. There are cases where running in parallel may still be beneficial, even if the shape is one where it typically parallelizes poorly. In these cases, you can override the default behavior by using the WithExecutionMode extension method. This would be done like so:

        var reversed = collection
            .AsParallel()
            .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            .Select(i => i.PerformComputation())
            .Reverse();

    Here, the default behavior would be to not parallelize the query unless collection implemented IList<T>. We can force this to run in parallel by adding the WithExecutionMode extension method in the method chain.

    Finally, PLINQ has the ability to configure how results are returned. When a query is filtering or selecting an input collection, the results will need to be streamed back into a single IEnumerable<T> result. For example, the method above returns a new, reversed collection. In this case, the processing of the collection will be done in parallel, but the results need to be streamed back to the caller serially, so they can be enumerated on a single thread.

    This streaming introduces overhead. IEnumerable<T> isn’t designed with thread safety in mind, so the system needs to handle merging the parallel processes back into a single stream, which introduces synchronization issues. There are two extremes of how this could be accomplished, but both extremes have disadvantages.

    The system could watch each thread, and whenever a thread produces a result, take that result and send it back to the caller.
    This would mean that the calling thread would have access to the data as soon as data is available, which is the benefit of this approach. However, it also means that every item introduces synchronization overhead, since each item needs to be merged individually.

    On the other extreme, the system could wait until all of the results from all of the threads were ready, then push all of the results back to the calling thread in one shot. The advantage here is that the least amount of synchronization is added to the system, which means the query will, on the whole, run the fastest. However, the calling thread will have to wait for all elements to be processed, so this could introduce a long delay between when a parallel query begins and when results are returned.

    The default behavior in PLINQ is actually between these two extremes. By default, PLINQ maintains an internal buffer, and chooses an optimal buffer size to maintain. Query results are accumulated into the buffer, then returned in the IEnumerable<T> result in chunks. This provides reasonably fast access to the results, as well as good overall throughput, in most scenarios.

    However, if we know the nature of our algorithm, we may decide we would prefer one of the other extremes. This can be done by using the WithMergeOptions extension method. For example, if we know that our PerformComputation() routine is very slow, but also variable in runtime, we may want to retrieve results as they are available, with no buffering.
This can be done by changing our above routine to: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.NotBuffered) .Select(i => i.PerformComputation()) .Reverse(); On the other hand, if we are already on a background thread, and we want to allow the system to maximize its speed, we might want to allow the system to fully buffer the results: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.FullyBuffered) .Select(i => i.PerformComputation()) .Reverse(); Notice, also, that you can specify multiple configuration options in a parallel query. By chaining these extension methods together, we generate a query that will always run in parallel, and will always complete before making the results available in our IEnumerable<T>.
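The merge options discussed above can be tried out with a small self-contained sketch. Everything here is illustrative rather than taken from the article: `MergeOptionsSketch`, `Run`, and the squaring `PerformComputation` are hypothetical stand-ins, and the input is simply the integers 1 to 1000.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeOptionsSketch
{
    // Hypothetical stand-in for the article's PerformComputation().
    private static int PerformComputation(int i) => i * i;

    // Runs the same parallel query with the requested merge option and
    // collects whatever the query streams back to the calling thread.
    public static List<int> Run(ParallelMergeOptions mergeOptions)
    {
        IEnumerable<int> source = Enumerable.Range(1, 1000);

        ParallelQuery<int> query = source
            .AsParallel()
            .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            .WithMergeOptions(mergeOptions)
            .Select(PerformComputation);

        var results = new List<int>();
        foreach (int value in query)
        {
            // With NotBuffered, items may arrive here as soon as they are
            // produced; with FullyBuffered, only after the whole query finishes.
            results.Add(value);
        }
        return results;
    }
}
```

Calling `MergeOptionsSketch.Run(ParallelMergeOptions.NotBuffered)` and `MergeOptionsSketch.Run(ParallelMergeOptions.FullyBuffered)` should yield the same set of values. The query is unordered, so the order may differ, and only the timing of when items reach the `foreach` loop is expected to change.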

    Read the article

  • Parallelism in .NET – Part 4, Imperative Data Parallelism: Aggregation

    - by Reed
    In the article on simple data parallelism, I described how to perform an operation on an entire collection of elements in parallel. Often, this is not adequate, as the parallel operation is going to be performing some form of aggregation. Simple examples of this might include taking the sum of the results of processing a function on each element in the collection, or finding the minimum of the collection given some criteria. This can be done using the techniques described in simple data parallelism; however, special care needs to be taken to synchronize the shared data appropriately. The Task Parallel Library has tools to assist in this synchronization. The main issue with aggregation when parallelizing a routine is that you need to handle synchronization of data, since multiple threads will need to write to a shared portion of data. Suppose, for example, that we wanted to parallelize a simple loop that looked for the minimum value within a dataset: double min = double.MaxValue; foreach(var item in collection) { double value = item.PerformComputation(); min = System.Math.Min(min, value); } This seems like a good candidate for parallelization, but there is a problem here. If we just wrap this into a call to Parallel.ForEach, we’ll introduce a critical race condition, and get the wrong answer. 
Let’s look at what happens here: // Buggy code! Do not use! double min = double.MaxValue; Parallel.ForEach(collection, item => { double value = item.PerformComputation(); min = System.Math.Min(min, value); }); This code has a fatal flaw: min will be checked, then set, by multiple threads simultaneously.  Two threads may perform the check at the same time, and set the wrong value for min.  Say we get a value of 1 in thread 1, and a value of 2 in thread 2, and these two elements are the first two to run.  If both hit the min check line at the same time, both will determine that min should change, to 1 and 2 respectively.  If element 1 happens to set the variable first, then element 2 sets the min variable, we’ll detect a min value of 2 instead of 1.  This can lead to wrong answers. Unfortunately, fixing this, with the Parallel.ForEach call we’re using, would require adding locking.  We would need to rewrite this like: // Safe, but slow double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach(collection, item => { double value = item.PerformComputation(); lock(syncObject) min = System.Math.Min(min, value); }); This will potentially add a huge amount of overhead to our calculation.  Since we can potentially block while waiting on the lock for every single iteration, we will most likely slow this down to where it is actually quite a bit slower than our serial implementation.  The problem is the lock statement – any time you use lock(object), you’re almost assuring reduced performance in a parallel situation.  This leads to two observations I’ll make: When parallelizing a routine, try to avoid locks. That being said: Always add any and all required synchronization to avoid race conditions. These two observations tend to be opposing forces – we often need to synchronize our algorithms, but we also want to avoid the synchronization when possible.  
Looking at our routine, there is no way to directly avoid this lock, since each element is potentially being run on a separate thread, and this lock is necessary in order for our routine to function correctly every time. However, this isn’t the only way to design this routine to implement this algorithm. Realize that, although our collection may have thousands or even millions of elements, we have a limited number of Processing Elements (PE). Processing Element is the standard term for a hardware element which can process and execute instructions. This typically is a core in your processor, but many modern systems have multiple hardware execution threads per core. The Task Parallel Library will not execute the work for each item in the collection as a separate work item. Instead, when Parallel.ForEach executes, it will partition the collection into larger “chunks” which get processed on different threads via the ThreadPool. This helps reduce the threading overhead, and helps the overall speed. In general, the Parallel class will only use one thread per PE in the system. Given the fact that there are typically fewer threads than work items, we can rethink our algorithm design. We can parallelize our algorithm more effectively by approaching it differently. Because the basic aggregation we are doing here (Min) is commutative, we do not need to perform this in a given order. We knew this to be true already – otherwise, we wouldn’t have been able to parallelize this routine in the first place. With this in mind, we can treat each thread’s work independently, allowing each thread to serially process many elements with no locking, then, after all the threads are complete, “merge” together the results. This can be accomplished via a different set of overloads in the Parallel class: Parallel.ForEach<TSource,TLocal>. The idea behind these overloads is to allow each thread to begin by initializing some local state (TLocal). 
The thread will then process an entire set of items in the source collection, providing that state to the delegate which processes an individual item. Finally, at the end, a separate delegate is run which allows you to handle merging that local state into your final results. To rewrite our routine using Parallel.ForEach<TSource,TLocal>, we need to provide three delegates instead of one. The most basic version of this function is declared as: public static ParallelLoopResult ForEach<TSource, TLocal>( IEnumerable<TSource> source, Func<TLocal> localInit, Func<TSource, ParallelLoopState, TLocal, TLocal> body, Action<TLocal> localFinally ) The first delegate (the localInit argument) is defined as Func<TLocal>. This delegate initializes our local state. It should return some object we can use to track the results of a single thread’s operations. The second delegate (the body argument) is where our main processing occurs, although now, instead of being an Action<T>, we actually provide a Func<TSource, ParallelLoopState, TLocal, TLocal> delegate. This delegate will receive three arguments: our original element from the collection (TSource), a ParallelLoopState which we can use for early termination, and the instance of our local state we created (TLocal). It should do whatever processing you wish to occur per element, then return the value of the local state after processing is completed. The third delegate (the localFinally argument) is defined as Action<TLocal>. This delegate is passed our local state after it’s been processed by all of the elements this thread will handle. This is where you can merge your final results together. This may require synchronization, but now, instead of synchronizing once per element (potentially millions of times), you’ll only have to synchronize once per thread, which is an ideal situation. Now that I’ve explained how this works, let’s look at the code: // Safe, and fast! 
double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach( collection, // First, we provide a local state initialization delegate. () => double.MaxValue, // Next, we supply the body, which takes the original item, loop state, // and local state, and returns a new local state (item, loopState, localState) => { double value = item.PerformComputation(); return System.Math.Min(localState, value); }, // Finally, we provide an Action<TLocal>, to "merge" results together localState => { // This requires locking, but it's only once per used thread lock(syncObject) min = System.Math.Min(min, localState); } ); Although this is a bit more complicated than the previous version, it is now both thread-safe, and has minimal locking. This same approach can be used by Parallel.For, although now, it’s Parallel.For<TLocal>. When working with Parallel.For<TLocal>, you use the same triplet of delegates, with the same purpose and results. Also, many times, you can completely avoid locking by using a method of the Interlocked class to perform the final aggregation in an atomic operation. The MSDN example demonstrating this same technique using Parallel.For uses the Interlocked class instead of a lock, since they are doing a sum operation on a long variable, which is possible via Interlocked.Add. By taking advantage of local state, we can use the Parallel class methods to parallelize algorithms such as aggregation, which, at first, may seem like poor candidates for parallelization. Doing so requires careful consideration, and often requires a slight redesign of the algorithm, but the performance gains can be significant if handled in a way to avoid excessive synchronization.
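The lock-free variant mentioned above, a sum merged with Interlocked.Add, might look roughly like the following. This is a minimal sketch in the spirit of that approach, not a copy of the MSDN sample; `InterlockedSumSketch` and `SumOfSquares` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InterlockedSumSketch
{
    // Sums the squares of the input using per-thread local state, merging
    // each thread's subtotal into the shared total with an atomic add.
    public static long SumOfSquares(int[] values)
    {
        long total = 0;

        Parallel.For<long>(
            0, values.Length,
            // localInit: each thread starts with its own subtotal of zero.
            () => 0L,
            // body: no synchronization needed, only local state is touched.
            (i, loopState, localSum) => localSum + (long)values[i] * values[i],
            // localFinally: runs once per thread; Interlocked.Add replaces the lock.
            localSum => { Interlocked.Add(ref total, localSum); });

        return total;
    }
}
```

The same pattern does not carry over directly to the Min example on a double, since Interlocked has no atomic "min" operation. That is presumably why the article's version keeps the lock in its final merge.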

    Read the article

  • dns queries not using nscd for caching

    - by xenoterracide
    I'm trying to use nscd (Nameservices Cache Daemon) to cache dns locally so I can stop using bind to do it. I've gotten it started and ntpd seems to attempt to use it. But everything else for hosts seems to ignore it. e.g. if I do dig apache.org 3 times none of them will hit the cache. I'm viewing the cache stats using nscd -g to determine whether it's been used. I've also turned the debug log level up to see if I can see it hitting and the queries don't even hit nscd. nsswitch.conf # Begin /etc/nsswitch.conf passwd: files group: files shadow: files publickey: files hosts: cache files dns networks: files protocols: files services: files ethers: files rpc: files netgroup: files # End /etc/nsswitch.conf nscd.conf # # /etc/nscd.conf # # An example Name Service Cache config file. This file is needed by nscd. # # Legal entries are: # # logfile <file> # debug-level <level> # threads <initial #threads to use> # max-threads <maximum #threads to use> # server-user <user to run server as instead of root> # server-user is ignored if nscd is started with -S parameters # stat-user <user who is allowed to request statistics> # reload-count unlimited|<number> # paranoia <yes|no> # restart-interval <time in seconds> # # enable-cache <service> <yes|no> # positive-time-to-live <service> <time in seconds> # negative-time-to-live <service> <time in seconds> # suggested-size <service> <prime number> # check-files <service> <yes|no> # persistent <service> <yes|no> # shared <service> <yes|no> # max-db-size <service> <number bytes> # auto-propagate <service> <yes|no> # # Currently supported cache names (services): passwd, group, hosts, services # logfile /var/log/nscd.log threads 4 max-threads 32 server-user nobody # stat-user somebody debug-level 9 # reload-count 5 paranoia no # restart-interval 3600 enable-cache passwd yes positive-time-to-live passwd 600 negative-time-to-live passwd 20 suggested-size passwd 211 check-files passwd yes persistent passwd yes shared passwd 
yes max-db-size passwd 33554432 auto-propagate passwd yes enable-cache group yes positive-time-to-live group 3600 negative-time-to-live group 60 suggested-size group 211 check-files group yes persistent group yes shared group yes max-db-size group 33554432 auto-propagate group yes enable-cache hosts yes positive-time-to-live hosts 3600 negative-time-to-live hosts 20 suggested-size hosts 211 check-files hosts yes persistent hosts yes shared hosts yes max-db-size hosts 33554432 enable-cache services yes positive-time-to-live services 28800 negative-time-to-live services 20 suggested-size services 211 check-files services yes persistent services yes shared services yes max-db-size services 33554432 resolv.conf # Generated by dhcpcd from eth0 nameserver 127.0.0.1 domain westell.com nameserver 192.168.1.1 nameserver 208.67.222.222 nameserver 208.67.220.220 as kind of a side note I'm using archlinux.

    Read the article

  • ASP.NET Asynchronous Pages and when to use them

    - by rajbk
    There have been several articles posted about using asynchronous pages in ASP.NET but none of them go into detail as to when you should use them. I finally found a great post by Thomas Marquardt that explains the process in depth. He addresses a key misconception also: So, in your ASP.NET application, when should you perform work asynchronously instead of synchronously? Well, only 1 thread per CPU can execute at a time. Did you catch that? A lot of people seem to miss this point...only one thread executes at a time on a CPU. When you have more than this, you pay an expensive penalty--a context switch. However, if a thread is blocked waiting on work...then it makes sense to switch to another thread, one that can execute now. It also makes sense to switch threads if you want work to be done in parallel as opposed to in series, but up until a certain point it actually makes much more sense to execute work in series, again, because of the expensive context switch. Pop quiz: If you have a thread that is doing a lot of computational work and using the CPU heavily, and this takes a while, should you switch to another thread? No! The current thread is efficiently using the CPU, so switching will only incur the cost of a context switch. Ok, well, what if you have a thread that makes an HTTP or SOAP request to another server and takes a long time, should you switch threads? Yes! You can perform the HTTP or SOAP request asynchronously, so that once the "send" has occurred, you can unwind the current thread and not use any threads until there is an I/O completion for the "receive". Between the "send" and the "receive", the remote server is busy, so locally you don't need to be blocking on a thread, but instead make use of the asynchronous APIs provided in .NET Framework so that you can unwind and be notified upon completion. Again, it only makes sense to switch threads if the benefit from doing so outweighs the cost of the switch. 
Read more about it in these posts: Performing Asynchronous Work, or Tasks, in ASP.NET Applications http://blogs.msdn.com/tmarq/archive/2010/04/14/performing-asynchronous-work-or-tasks-in-asp-net-applications.aspx ASP.NET Thread Usage on IIS 7.0 and 6.0 http://blogs.msdn.com/tmarq/archive/2007/07/21/asp-net-thread-usage-on-iis-7-0-and-6-0.aspx   PS: I generally do not write posts that simply link to other posts but think it is warranted in this case.
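The "unwind the thread between send and receive" idea can be illustrated with a small sketch. It uses the later async/await syntax rather than the asynchronous page APIs the linked posts describe, and it replaces the real HTTP or SOAP call with `Task.Delay` so it runs without a network. `AsyncIoSketch`, `SimulatedRemoteCallAsync`, and `RunManyAsync` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncIoSketch
{
    // Hypothetical stand-in for a slow HTTP or SOAP request. Task.Delay models
    // the time the remote server is busy; no local thread is blocked meanwhile.
    public static async Task<int> SimulatedRemoteCallAsync(int requestId)
    {
        await Task.Delay(200);   // the "send" has occurred; the thread unwinds here
        return requestId * 2;    // resumes on completion of the "receive"
    }

    // Starts many simulated requests at once and waits for all of them
    // without dedicating a blocked thread to each one.
    public static async Task<int[]> RunManyAsync(int count)
    {
        Task<int>[] pending = Enumerable.Range(0, count)
            .Select(id => SimulatedRemoteCallAsync(id))
            .ToArray();
        return await Task.WhenAll(pending);
    }
}
```

Because the waits overlap and hold no threads, fifty simulated requests should finish in roughly the time of one. A CPU-bound loop would gain nothing from this treatment, which is the distinction the quoted passage draws.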

    Read the article

  • Inverted schedctl usage in the JVM

    - by Dave
    The schedctl facility in Solaris allows a thread to request that the kernel defer involuntary preemption for a brief period. The mechanism is strictly advisory - the kernel can opt to ignore the request. Schedctl is typically used to bracket lock critical sections. That, in turn, can avoid convoying -- threads piling up on a critical section behind a preempted lock-holder -- and other lock-related performance pathologies. If you're interested see the man pages for schedctl_start() and schedctl_stop() and the schedctl.h include file. The implementation is very efficient. schedctl_start(), which asks that preemption be deferred, simply stores into a thread-specific structure -- the schedctl block -- that the kernel maps into user-space. Similarly, schedctl_stop() clears the flag set by schedctl_start() and then checks a "preemption pending" flag in the block. Normally, this will be false, but if set schedctl_stop() will yield to politely grant the CPU to other threads. Note that you can't abuse this facility for long-term preemption avoidance as the deferral is brief. If your thread exceeds the grace period the kernel will preempt it and transiently degrade its effective scheduling priority. Further reading : US05937187 and various papers by Andy Tucker. We'll now switch topics to the implementation of the "synchronized" locking construct in the HotSpot JVM. If a lock is contended then on multiprocessor systems we'll spin briefly to try to avoid context switching. Context switching is wasted work and inflicts various cache and TLB penalties on the threads involved. If context switching were "free" then we'd never spin to avoid switching, but that's not the case. We use an adaptive spin-then-park strategy. One potentially undesirable outcome is that we can be preempted while spinning. When our spinning thread is finally rescheduled the lock may or may not be available. If not, we'll spin and then potentially park (block) again, thus suffering a 2nd context switch. 
Recall that the reason we spin is to avoid context switching. To avoid this scenario I've found it useful to enable schedctl to request deferral while spinning. But while spinning I've arranged for the code to periodically check or poll the "preemption pending" flag. If that's found set we simply abandon our spinning attempt and park immediately. This avoids the double context-switch scenario above. One annoyance is that the schedctl blocks for the threads in a given process are tightly packed on special pages mapped from kernel space into user-land. As such, writes to the schedctl blocks can cause false sharing on other adjacent blocks. Hopefully the kernel folks will make changes to avoid this by padding and aligning the blocks to ensure that one cache line underlies at most one schedctl block at any one time.

    Read the article

< Previous Page | 34 35 36 37 38 39 40 41 42 43 44 45  | Next Page >