Search Results

Search found 3641 results on 146 pages for 'threads'.

Page 43/146 | < Previous Page | 39 40 41 42 43 44 45 46 47 48 49 50  | Next Page >

  • What's up with LDoms: Part 1 - Introduction & Basic Concepts

    - by Stefan Hinker
    LDoms - the correct name is Oracle VM Server for SPARC - have been around for quite a while now.  But to my surprise, I get more and more requests to explain how they work or to give advise on how to make good use of them.  This made me think that writing up a few articles discussing the different features would be a good idea.  Now - I don't intend to rewrite the LDoms Admin Guide or to copy and reformat the (hopefully) well known "Beginners Guide to LDoms" by Tony Shoumack from 2007.  Those documents are very recommendable - especially the Beginners Guide, although based on LDoms 1.0, is still a good place to begin with.  However, LDoms have come a long way since then, and I hope to contribute to their adoption by discussing how they work and what features there are today.  In this and the following posts, I will use the term "LDoms" as a common abbreviation for Oracle VM Server for SPARC, just because it's a lot shorter and easier to type (and presumably, read). So, just to get everyone on the same baseline, lets briefly discuss the basic concepts of virtualization with LDoms.  LDoms make use of a hypervisor as a layer of abstraction between real, physical hardware and virtual hardware.  This virtual hardware is then used to create a number of guest systems which each behave very similar to a system running on bare metal:  Each has its own OBP, each will install its own copy of the Solaris OS and each will see a certain amount of CPU, memory, disk and network resources available to it.  Unlike some other type 1 hypervisors running on x86 hardware, the SPARC hypervisor is embedded in the system firmware and makes use both of supporting functions in the sun4v SPARC instruction set as well as the overall CPU architecture to fulfill its function. The CMT architecture of the supporting CPUs (T1 through T4) provide a large number of cores and threads to the OS.  For example, the current T4 CPU has eight cores, each running 8 threads, for a total of 64 threads per socket.  To the OS, this looks like 64 CPUs.  The SPARC hypervisor, when creating guest systems, simply assigns a certain number of these threads exclusively to one guest, thus avoiding the overhead of having to schedule OS threads to CPUs, as do typical x86 hypervisors.  The hypervisor only assigns CPUs and then steps aside.  It is not involved in the actual work being dispatched from the OS to the CPU, all it does is maintain isolation between different guests. Likewise, memory is assigned exclusively to individual guests.  Here,  the hypervisor provides generic mappings between the physical hardware addresses and the guest's views on memory.  Again, the hypervisor is not involved in the actual memory access, it only maintains isolation between guests. During the inital setup of a system with LDoms, you start with one special domain, called the Control Domain.  Initially, this domain owns all the hardware available in the system, including all CPUs, all RAM and all IO resources.  If you'd be running the system un-virtualized, this would be what you'd be working with.  To allow for guests, you first resize this initial domain (also called a primary domain in LDoms speak), assigning it a small amount of CPU and memory.  This frees up most of the available CPU and memory resources for guest domains.  IO is a little more complex, but very straightforward.  When LDoms 1.0 first came out, the only way to provide IO to guest systems was to create virtual disk and network services and attach guests to these services.  In the meantime, several different ways to connect guest domains to IO have been developed, the most recent one being SR-IOV support for network devices released in version 2.2 of Oracle VM Server for SPARC. I will cover these more advanced features in detail later.  For now, lets have a short look at the initial way IO was virtualized in LDoms: For virtualized IO, you create two services, one "Virtual Disk Service" or vds, and one "Virtual Switch" or vswitch.  You can, of course, also create more of these, but that's more advanced than I want to cover in this introduction.  These IO services now connect real, physical IO resources like a disk LUN or a networt port to the virtual devices that are assigned to guest domains.  For disk IO, the normal case would be to connect a physical LUN (or some other storage option that I'll discuss later) to one specific guest.  That guest would be assigned a virtual disk, which would appear to be just like a real LUN to the guest, while the IO is actually routed through the virtual disk service down to the physical device.  For network, the vswitch acts very much like a real, physical ethernet switch - you connect one physical port to it for outside connectivity and define one or more connections per guest, just like you would plug cables between a real switch and a real system. For completeness, there is another service that provides console access to guest domains which mimics the behavior of serial terminal servers. The connections between the virtual devices on the guest's side and the virtual IO services in the primary domain are created by the hypervisor.  It uses so called "Logical Domain Channels" or LDCs to create point-to-point connections between all of these devices and services.  These LDCs work very similar to high speed serial connections and are configured automatically whenever the Control Domain adds or removes virtual IO. To see all this in action, now lets look at a first example.  I will start with a newly installed machine and configure the control domain so that it's ready to create guest systems. In a first step, after we've installed the software, let's start the virtual console service and downsize the primary domain.  root@sun # ldm list NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-c-- UART 512 261632M 0.3% 2d 13h 58m root@sun # ldm add-vconscon port-range=5000-5100 \ primary-console primary root@sun # svcadm enable vntsd root@sun # svcs vntsd STATE STIME FMRI online 9:53:21 svc:/ldoms/vntsd:default root@sun # ldm set-vcpu 16 primary root@sun # ldm set-mau 1 primary root@sun # ldm start-reconf primary root@sun # ldm set-memory 7680m primary root@sun # ldm add-config initial root@sun # shutdown -y -g0 -i6 So what have I done: I've defined a range of ports (5000-5100) for the virtual network terminal service and then started that service.  The vnts will later provide console connections to guest systems, very much like serial NTS's do in the physical world. Next, I assigned 16 vCPUs (on this platform, a T3-4, that's two cores) to the primary domain, freeing the rest up for future guest systems.  I also assigned one MAU to this domain.  A MAU is a crypto unit in the T3 CPU.  These need to be explicitly assigned to domains, just like CPU or memory.  (This is no longer the case with T4 systems, where crypto is always available everywhere.) Before I reassigned the memory, I started what's called a "delayed reconfiguration" session.  That avoids actually doing the change right away, which would take a considerable amount of time in this case.  Instead, I'll need to reboot once I'm all done.  I've assigned 7680MB of RAM to the primary.  That's 8GB less the 512MB which the hypervisor uses for it's own private purposes.  You can, depending on your needs, work with less.  I'll spend a dedicated article on sizing, discussing the pros and cons in detail. Finally, just before the reboot, I saved my work on the ILOM, to make this configuration available after a powercycle of the box.  (It'll always be available after a simple reboot, but the ILOM needs to know the configuration of the hypervisor after a power-cycle, before the primary domain is booted.) Now, lets create a first disk service and a first virtual switch which is connected to the physical network device igb2. We will later use these to connect virtual disks and virtual network ports of our guest systems to real world storage and network. root@sun # ldm add-vds primary-vds root@sun # ldm add-vswitch net-dev=igb2 switch-primary primary You are free to choose whatever names you like for the virtual disk service and the virtual switch.  I strongly recommend that you choose names that make sense to you and describe the function of each service in the context of your implementation.  For the vswitch, for example, you could choose names like "admin-vswitch" or "production-network" etc. This already concludes the configuration of the control domain.  We've freed up considerable amounts of CPU and RAM for guest systems and created the necessary infrastructure - console, vts and vswitch - so that guests systems can actually interact with the outside world.  The system is now ready to create guests, which I'll describe in the next section. For further reading, here are some recommendable links: The LDoms 2.2 Admin Guide The "Beginners Guide to LDoms" The LDoms Information Center on MOS LDoms on OTN

    Read the article

  • Haskell vs Erlang for web services

    - by Zachary K
    I am looking to start an experimental project using a functional language and am trying to decide beween Erlang and Haskell, and both have some points that I really like. I like Haskell's strong type system and purity. I have a feeling it will make it easier to write really reliable code. And I think that the power of haskell will make some of what I want to do much easier. On the minus side I get the feeling that some of the Frameworks for doing web stuff on Haskell such as Yesod are not as advanced as their Erlang counter parts. I rather like the Erlang approach to threads and to fault tollerence. I have a feeling that the scalability of Erlang could be a major plus. Which leeds to to my question, what has people's exerience been in implementing web application backends in both Haskell and Erlang. Are there packages for Haskell to provide some of the lightweight threads and actors that one has in Erlang?

    Read the article

  • IBM "per core" comparisons for SPECjEnterprise2010

    - by jhenning
    I recently stumbled upon a blog entry from Roman Kharkovski (an IBM employee) comparing some SPECjEnterprise2010 results for IBM vs. Oracle. Mr. Kharkovski's blog claims that SPARC delivers half the transactions per core vs. POWER7. Prior to any argument, I should say that my predisposition is to like Mr. Kharkovski, because he says that his blog is intended to be factual; that the intent is to try to avoid marketing hype and FUD tactic; and mostly because he features a picture of himself wearing a bike helmet (me too). Therefore, in a spirit of technical argument, rather than FUD fight, there are a few areas in his comparison that should be discussed. Scaling is not free For any benchmark, if a small system scores 13k using quantity R1 of some resource, and a big system scores 57k using quantity R2 of that resource, then, sure, it's tempting to divide: is  13k/R1 > 57k/R2 ? It is tempting, but not necessarily educational. The problem is that scaling is not free. Building big systems is harder than building small systems. Scoring  13k/R1  on a little system provides no guarantee whatsoever that one can sustain that ratio when attempting to handle more than 4 times as many users. Choosing the denominator radically changes the picture When ratios are used, one can vastly manipulate appearances by the choice of denominator. In this case, lots of choices are available for the resource to be compared (R1 and R2 above). IBM chooses to put cores in the denominator. Mr. Kharkovski provides some reasons for that choice in his blog entry. And yet, it should be noted that the very concept of a core is: arbitrary: not necessarily comparable across vendors; fluid: modern chips shift chip resources in response to load; and invisible: unless you have a microscope, you can't see it. By contrast, one can actually see processor chips with the naked eye, and they are a bit easier to count. If we put chips in the denominator instead of cores, we get: 13161.07 EjOPS / 4 chips = 3290 EjOPS per chip for IBM vs 57422.17 EjOPS / 16 chips = 3588 EjOPS per chip for Oracle The choice of denominator makes all the difference in the appearance. Speaking for myself, dividing by chips just seems to make more sense, because: I can see chips and count them; and I can accurately compare the number of chips in my system to the count in some other vendor's system; and Tthe probability of being able to continue to accurately count them over the next 10 years of microprocessor development seems higher than the probability of being able to accurately and comparably count "cores". SPEC Fair use requirements Speaking as an individual, not speaking for SPEC and not speaking for my employer, I wonder whether Mr. Kharkovski's blog article, taken as a whole, meets the requirements of the SPEC Fair Use rule www.spec.org/fairuse.html section I.D.2. For example, Mr. Kharkovski's footnote (1) begins Results from http://www.spec.org as of 04/04/2013 Oracle SUN SPARC T5-8 449 EjOPS/core SPECjEnterprise2010 (Oracle's WLS best SPECjEnterprise2010 EjOPS/core result on SPARC). IBM Power730 823 EjOPS/core (World Record SPECjEnterprise2010 EJOPS/core result) The questionable tactic, from a Fair Use point of view, is that there is no such metric at the designated location. At www.spec.org, You can find the SPEC metric 57422.17 SPECjEnterprise2010 EjOPS for Oracle and You can also find the SPEC metric 13161.07 SPECjEnterprise2010 EjOPS for IBM. Despite the implication of the footnote, you will not find any mention of 449 nor anything that says 823. SPEC says that you can, under its fair use rule, derive your own values; but it emphasizes: "The context must not give the appearance that SPEC has created or endorsed the derived value." Substantiation and transparency Although SPEC disclaims responsibility for non-SPEC information (section I.E), it says that non-SPEC data and methods should be accurate, should be explained, should be substantiated. Unfortunately, it is difficult or impossible for the reader to independently verify the pricing: Were like units compared to like (e.g. list price to list price)? Were all components (hw, sw, support) included? Were all fees included? Note that when tpc.org shows IBM pricing, there are often items such as "PROCESSOR ACTIVATION" and "MEMORY ACTIVATION". Without the transparency of a detailed breakdown, the pricing claims are questionable. T5 claim for "Fastest Processor" Mr. Kharkovski several times questions Oracle's claim for fastest processor, writing You see, when you publish industry benchmarks, people may actually compare your results to other vendor's results. Well, as we performance people always say, "it depends". If you believe in performance-per-core as the primary way of looking at the world, then yes, the POWER7+ is impressive, spending its chip resources to support up to 32 threads (8 cores x 4 threads). Or, it just might be useful to consider performance-per-chip. Each SPARC T5 chip allows 128 hardware threads to be simultaneously executing (16 cores x 8 threads). The Industry Standard Benchmark that focuses specifically on processor chip performance is SPEC CPU2006. For this very well known and popular benchmark, SPARC T5: provides better performance than both POWER7 and POWER7+, for 1 chip vs. 1 chip, for 8 chip vs. 8 chip, for integer (SPECint_rate2006) and floating point (SPECfp_rate2006), for Peak tuning and for Base tuning. For example, at the 8-chip level, integer throughput (SPECint_rate2006) is: 3750 for SPARC 2170 for POWER7+. You can find the details at the March 2013 BestPerf CPU2006 page SPEC is a trademark of the Standard Performance Evaluation Corporation, www.spec.org. The two specific results quoted for SPECjEnterprise2010 are posted at the URLs linked from the discussion. Results for SPEC CPU2006 were verified at spec.org 1 July 2013, and can be rechecked here.

    Read the article

  • Why lock statements don't scale

    - by Alex.Davies
    We are going to have to stop using lock statements one day. Just like we had to stop using goto statements. The problem is similar, they're pretty easy to follow in small programs, but code with locks isn't composable. That means that small pieces of program that work in isolation can't necessarily be put together and work together. Of course actors scale fine :) Why lock statements don't scale as software gets bigger Deadlocks. You have a program with lots of threads picking up lots of locks. You already know that if two of your threads both try to pick up a lock that the other already has, they will deadlock. Your program will come to a grinding halt, and there will be fire and brimstone. "Easy!" you say, "Just make sure all the threads pick up the locks in the same order." Yes, that works. But you've broken composability. Now, to add a new lock to your code, you have to consider all the other locks already in your code and check that they are taken in the right order. Algorithm buffs will have noticed this approach means it takes quadratic time to write a program. That's bad. Why lock statements don't scale as hardware gets bigger Memory bus contention There's another headache, one that most programmers don't usually need to think about, but is going to bite us in a big way in a few years. Locking needs exclusive use of the entire system's memory bus while taking out the lock. That's not too bad for a single or dual-core system, but already for quad-core systems it's a pretty large overhead. Have a look at this blog about the .NET 4 ThreadPool for some numbers and a weird analogy (see the author's comment). Not too bad yet, but I'm scared my 1000 core machine of the future is going to go slower than my machine today! I don't know the answer to this problem yet. Maybe some kind of per-core work queue system with hierarchical work stealing. Definitely hardware support. But what I do know is that using locks specifically prevents any solution to this. We should be abstracting our code away from the details of locks as soon as possible, so we can swap in whatever solution arrives when it does. NAct uses locks at the moment. But my advice is that you code using actors (which do scale well as software gets bigger). And when there's a better way of implementing actors that'll scale well as hardware gets bigger, only NAct needs to work out how to use it, and your program will go fast on it's own.

    Read the article

  • A Method for Reducing Contention and Overhead in Worker Queues for Multithreaded Java Applications

    - by Janice J. Heiss
    A java.net article, rich in practical resources, by IBM India Labs’ Sathiskumar Palaniappan, Kavitha Varadarajan, and Jayashree Viswanathan, explores the challenge of writing code in a way that that effectively makes use of the resources of modern multicore processors and multiprocessor servers.As the article states: “Many server applications, such as Web servers, application servers, database servers, file servers, and mail servers, maintain worker queues and thread pools to handle large numbers of short tasks that arrive from remote sources. In general, a ‘worker queue’ holds all the short tasks that need to be executed, and the threads in the thread pool retrieve the tasks from the worker queue and complete the tasks. Since multiple threads act on the worker queue, adding tasks to and deleting tasks from the worker queue needs to be synchronized, which introduces contention in the worker queue.” The article goes on to explain ways that developers can reduce contention by maintaining one queue per thread. It also demonstrates a work-stealing technique that helps in effectively utilizing the CPU in multicore systems. Read the rest of the article here.

    Read the article

  • matplotlib and python multithread file processing

    - by Napseis
    I have a large number of files to process. I have written a script that get, sort and plot the datas I want. So far, so good. I have tested it and it gives the desired result. Then I wanted to do this using multithreading. I have looked into the doc and examples on the internet, and using one thread in my program works fine. But when I use more, at some point I get random matplotlib error, and I suspect some conflict there, even though I use a function with names for the plots, and iI can't see where the problem could be. Here is the whole script should you need more comment, i'll add them. Thank you. #!/usr/bin/python import matplotlib matplotlib.use('GTKAgg') import numpy as np from scipy.interpolate import griddata import matplotlib.pyplot as plt import matplotlib.colors as mcl from matplotlib import rc #for latex import time as tm import sys import threading import Queue #queue in 3.2 and Queue in 2.7 ! import pdb #the debugger rc('text', usetex=True)#for latex map=0 #initialize the map index. It will be use to index the array like this: array[map,[x,y]] time=np.zeros(1) #an array to store the time middle_h=np.zeros((0,3)) #x phi c #for the middle of the box current_file=open("single_void_cyl_periodic_phi_c_middle_h_out",'r') for line in current_file: if line.startswith('# === time'): map+=1 np.append(time,[float(line.strip('# === time '))]) elif line.startswith('#'): pass else: v=np.fromstring(line,dtype=float,sep=' ') middle_h=np.vstack( (middle_h,v[[1,3,4]]) ) current_file.close() middle_h=middle_h.reshape((map,-1,3)) #3d array: map, x, phi,c ##### def load_and_plot(): #will load a map file, and plot it along with the corresponding profile loaded before while not exit_flag: print("fecthing work ...") #try: if not tasks_queue.empty(): map_index=tasks_queue.get() print("----> working on map: %s" %map_index) x,y,zp=np.loadtxt("single_void_cyl_growth_periodic_post_map_"+str(map_index),unpack=True, usecols=[1, 2,3]) for i,el in enumerate(zp): if el<0.: zp[i]=0. xv=np.unique(x) yv=np.unique(y) X,Y= np.meshgrid(xv,yv) Z = griddata((x, y), zp, (X, Y),method='nearest') figure=plt.figure(num=map_index,figsize=(14, 8)) ax1=plt.subplot2grid((2,2),(0,0)) ax1.plot(middle_h[map_index,:,0],middle_h[map_index,:,1],'*b') ax1.grid(True) ax1.axis([-15, 15, 0, 1]) ax1.set_title('Profiles') ax1.set_ylabel(r'$\phi$') ax1.set_xlabel('x') ax2=plt.subplot2grid((2,2),(1,0)) ax2.plot(middle_h[map_index,:,0],middle_h[map_index,:,2],'*r') ax2.grid(True) ax2.axis([-15, 15, 0, 1]) ax2.set_ylabel('c') ax2.set_xlabel('x') ax3=plt.subplot2grid((2,2),(0,1),rowspan=2,aspect='equal') sub_contour=ax3.contourf(X,Y,Z,np.linspace(0,1,11),vmin=0.) figure.colorbar(sub_contour,ax=ax3) figure.savefig('single_void_cyl_'+str(map_index)+'.png') plt.close(map_index) tasks_queue.task_done() else: print("nothing left to do, other threads finishing,sleeping 2 seconds...") tm.sleep(2) # except: # print("failed this time: %s" %map_index+". Sleeping 2 seconds") # tm.sleep(2) ##### exit_flag=0 nb_threads=2 tasks_queue=Queue.Queue() threads_list=[] jobs=list(range(map)) #each job is composed of a map print("inserting jobs in the queue...") for job in jobs: tasks_queue.put(job) print("done") #launch the threads for i in range(nb_threads): working_bee=threading.Thread(target=load_and_plot) working_bee.daemon=True print("starting thread "+str(i)+' ...') threads_list.append(working_bee) working_bee.start() #wait for all tasks to be treated tasks_queue.join() #flip the flag, so the threads know it's time to stop exit_flag=1 for t in threads_list: print("waiting for threads %s to stop..."%t) t.join() print("all threads stopped")

    Read the article

  • Parallelism implies concurrency but not the other way round right?

    - by Cedric Martin
    I often read that parallelism and concurrency are different things. Very often the answerers/commenters go as far as writing that they're two entirely different things. Yet in my view they're related but I'd like some clarification on that. For example if I'm on a multi-core CPU and manage to divide the computation into x smaller computation (say using fork/join) each running in its own thread, I'll have a program that is both doing parallel computation (because supposedly at any point in time several threads are going to run on several cores) and being concurrent right? While if I'm simply using, say, Java and dealing with UI events and repaints on the Event Dispatch Thread plus running the only thread I created myself, I'll have a program that is concurrent (EDT + GC thread + my main thread etc.) but not parallel. I'd like to know if I'm getting this right and if parallelism (on a "single but multi-cores" system) always implies concurrency or not? Also, are multi-threaded programs running on multi-cores CPU but where the different threads are doing totally different computation considered to be using "parallelism"?

    Read the article

  • Number crunching algo for learning multithreading?

    - by Austin Henley
    I have never really implemented anything dealing with threads; my only experience with them is reading about them in my undergrad. So I want to change that by writing a program that does some number crunching, but splits it up into several threads. My first ideas for this hopefully simple multithreaded program were: Beal's Conjecture brute force based on my SO question. Bailey-Borwein-Plouffe formula for calculating Pi. Prime number brute force search As you can see I have an interest in math and thought it would be fun to incorporate it into this, rather than coding something such as a server which wouldn't be nearly as fun! But the 3 ideas don't seem very appealing and I have already done some work on them in the past so I was curious if anyone had any ideas in the same spirit as these 3 that I could implement?

    Read the article

  • Why isn't reflection on the SCJP / OCJP?

    - by Nick Rosencrantz
    I read through Kathy Sierra's SCJP study guide and I will read it again more throughly to improve myself as a Java programmer and be able to take the certification either Java 6 or wait for the Java 7 exam (I'm already employed as Java developer so I'm in no hurry to take the exam.) Now I wonder why reflection is not on the exam? The book it seems covers everything that should be on the exam and AFAIK reflection is at least as important as threads if not more used inpractice since many frameworks use reflection. Do you know why reflection is not part of the SCJP? Do you agree that it's at least important to know reflection as threads? Thanks for any answer

    Read the article

  • Why should most logic be in the monitor objects and not in the thread objects when writing concurrent software in Java?

    - by refuser
    When I took the Realtime and Concurrent programming course our lecturer told us that when writing concurrent programs in Java and using monitors, most of the logic should be in the monitor and as little as possible in the threads that access it. I never really understood why and I really would like to. Let me clarify. In this particular case we had several classes. Lift extends Thread Person extends Thread LiftView Monitor, all methods synchronized. This is nothing we came up with, our task was to implement a lift simulation with persons waiting on different floors, and theses were the class skeletons that were given. Then our lecturer said to implement most of the logic in the monitor (he was talking about class Monitor as THE monitor) and as little as possible in the threads. Why would he make a statement like that?

    Read the article

  • Achieving more fluent movement

    - by Robin92
    I'm working on my first OpenGL 2D game and I've just locked the framerate of my game. However, the way objects move is far from satisfying: they tend to lag, which is shown in this video. I've thought how more fluent animation can be achieved and started getting segmentation faults due to accessing the same object by two different threads. I've tried the following threads' setting: Drawing, creating new objects Moving player, moving objects, deleting objects Currently my application uses this setting: Drawing, creating new objects, moving objects, deleting object Moving player Any ideas would be appreciated. EDIT: I've tried increasing the FPS limit but lags are noticeable even at 200 fps.

    Read the article

  • Unnecessary Java context switches

    - by Paul Morrison
    I have a network of Java Threads (Flow-Based Programming) communicating via fixed-capacity channels - running under WindowsXP. What we expected, based on our experience with "green" threads (non-preemptive), would be that threads would switch context less often (thus reducing CPU time) if the channels were made bigger. However, we found that increasing channel size does not make any difference to the run time. What seems to be happening is that Java decides to switch threads even though channels aren't full or empty (i.e. even though a thread doesn't have to suspend), which costs CPU time for no apparent advantage. Also changing Thread priorities doesn't make any observable difference. My question is whether there is some way of persuading Java not to make unnecessary context switches, but hold off switching until it is really necessary to switch threads - is there some way of changing Java's dispatching logic? Or is it reacting to something I didn't pay attention to?! Or are there other asynchronism mechanisms, e.g. Thread factories, Runnable(s), maybe even daemons (!). The answer appears to be non-obvious, as so far none of my correspondents has come up with an answer (including most recently two CS profs). Or maybe I'm missing something that's so obvious that people can't imagine my not knowing it... I've added the send and receive code here - not very elegant, but it seems to work...;-) In case you are wondering, I thought the goLock logic in 'send' might be causing the problem, but removing it temporarily didn't make any difference. I have added the code for send and receive... public synchronized Packet receive() { if (isDrained()) { return null; } while (isEmpty()) { try { wait(); } catch (InterruptedException e) { close(); return null; } if (isDrained()) { return null; } } if (isDrained()) { return null; } if (isFull()) { notifyAll(); // notify other components waiting to send } Packet packet = array[receivePtr]; array[receivePtr] = null; receivePtr = (receivePtr + 1) % array.length; //notifyAll(); // only needed if it was full usedSlots--; packet.setOwner(receiver); if (null == packet.getContent()) { traceFuncs("Received null packet"); } else { traceFuncs("Received: " + packet.toString()); } return packet; } synchronized boolean send(final Packet packet, final OutputPort op) { sender = op.sender; if (isClosed()) { return false; } while (isFull()) { try { wait(); } catch (InterruptedException e) { indicateOneSenderClosed(); return false; } sender = op.sender; } if (isClosed()) { return false; } try { receiver.goLock.lockInterruptibly(); } catch (InterruptedException ex) { return false; } try { packet.clearOwner(); array[sendPtr] = packet; sendPtr = (sendPtr + 1) % array.length; usedSlots++; // move this to here if (receiver.getStatus() == StatusValues.DORMANT || receiver.getStatus() == StatusValues.NOT_STARTED) { receiver.activate(); // start or wake up if necessary } else { notifyAll(); // notify receiver // other components waiting to send to this connection may also get // notified, // but this is handled by while statement } sender = null; Component.network.active = true; } finally { receiver.goLock.unlock(); } return true; }

    Read the article

  • Ubuntu and Surface Pro, wifi issues not fixed?

    - by Confused or too stupid
    Just a quick question, from the numerous other threads I found, is the wifi still a no go straight out of installing Ubuntu? I tried this Dual boot Surface Pro with Ubuntu? with 13.04, the wifi is sporadic in working. From that and other threads I gathered that 13.10 would include the fix, so I tossed that onto the Surface as well, with the same results. Is there anything I am missing? Is there any other distro that has better luck with the Marvell chip?

    Read the article

  • Why is multithreading often preferred for improving performance?

    - by user1849534
    I have a question, it's about why programmers seems to love concurrency and multi-threaded programs in general. I'm considering 2 main approaches here: an async approach basically based on signals, or just an async approach as called by many papers and languages like the new C# 5.0 for example, and a "companion thread" that manages the policy of your pipeline a concurrent approach or multi-threading approach I will just say that I'm thinking about the hardware here and the worst case scenario, and I have tested this 2 paradigms myself, the async paradigm is a winner at the point that I don't get why people 90% of the time talk about multi-threading when they want to speed up things or make a good use of their resources. I have tested multi-threaded programs and async program on an old machine with an Intel quad-core that doesn't offer a memory controller inside the CPU, the memory is managed entirely by the motherboard, well in this case performances are horrible with a multi-threaded application, even a relatively low number of threads like 3-4-5 can be a problem, the application is unresponsive and is just slow and unpleasant. A good async approach is, on the other hand, probably not faster but it's not worst either, my application just waits for the result and doesn't hangs, it's responsive and there is a much better scaling going on. I have also discovered that a context change in the threading world it's not that cheap in real world scenario, it's in fact quite expensive especially when you have more than 2 threads that need to cycle and swap among each other to be computed. On modern CPUs the situation it's not really that different, the memory controller it's integrated but my point is that an x86 CPUs is basically a serial machine and the memory controller works the same way as with the old machine with an external memory controller on the motherboard. The context switch is still a relevant cost in my application and the fact that the memory controller it's integrated or that the newer CPU have more than 2 core it's not bargain for me. For what i have experienced the concurrent approach is good in theory but not that good in practice, with the memory model imposed by the hardware, it's hard to make a good use of this paradigm, also it introduces a lot of issues ranging from the use of my data structures to the join of multiple threads. Also both paradigms do not offer any security abut when the task or the job will be done in a certain point in time, making them really similar from a functional point of view. According to the X86 memory model, why the majority of people suggest to use concurrency with C++ and not just an async approach ? Also why not considering the worst case scenario of a computer where the context switch is probably more expensive than the computation itself ?

    Read the article

  • Parallel processing via multithreading in Java

    - by Robz
    There are certain algorithms whose running time can decrease significantly when one divides up a task and gets each part done in parallel. One of these algorithms is merge sort, where a list is divided into infinitesimally smaller parts and then recombined in a sorted order. I decided to do an experiment to test whether or not I could I increase the speed of this sort by using multiple threads. I am running the following functions in Java on a Quad-Core Dell with Windows Vista. One function (the control case) is simply recursive: // x is an array of N elements in random order public int[] mergeSort(int[] x) { if (x.length == 1) return x; // Dividing the array in half int[] a = new int[x.length/2]; int[] b = new int[x.length/2+((x.length%2 == 1)?1:0)]; for(int i = 0; i < x.length/2; i++) a[i] = x[i]; for(int i = 0; i < x.length/2+((x.length%2 == 1)?1:0); i++) b[i] = x[i+x.length/2]; // Sending them off to continue being divided mergeSort(a); mergeSort(b); // Recombining the two arrays int ia = 0, ib = 0, i = 0; while(ia != a.length || ib != b.length) { if (ia == a.length) { x[i] = b[ib]; ib++; } else if (ib == b.length) { x[i] = a[ia]; ia++; } else if (a[ia] < b[ib]) { x[i] = a[ia]; ia++; } else { x[i] = b[ib]; ib++; } i++; } return x; } The other is in the 'run' function of a class that extends thread, and recursively creates two new threads each time it is called: public class Merger extends Thread { int[] x; boolean finished; public Merger(int[] x) { this.x = x; } public void run() { if (x.length == 1) { finished = true; return; } // Divide the array in half int[] a = new int[x.length/2]; int[] b = new int[x.length/2+((x.length%2 == 1)?1:0)]; for(int i = 0; i < x.length/2; i++) a[i] = x[i]; for(int i = 0; i < x.length/2+((x.length%2 == 1)?1:0); i++) b[i] = x[i+x.length/2]; // Begin two threads to continue to divide the array Merger ma = new Merger(a); ma.run(); Merger mb = new Merger(b); mb.run(); // Wait for the two other threads to finish while(!ma.finished || !mb.finished) ; // Recombine the two arrays int ia = 0, ib = 0, i = 0; while(ia != a.length || ib != b.length) { if (ia == a.length) { x[i] = b[ib]; ib++; } else if (ib == b.length) { x[i] = a[ia]; ia++; } else if (a[ia] < b[ib]) { x[i] = a[ia]; ia++; } else { x[i] = b[ib]; ib++; } i++; } finished = true; } } It turns out that function that does not use multithreading actually runs faster. Why? Does the operating system and the java virtual machine not "communicate" effectively enough to place the different threads on different cores? Or am I missing something obvious?

    Read the article

  • What could be a reason for cross-platform server applications developer to make his app work in multiple processes?

    - by Kabumbus
    So we consider a server app development - heavily loaded with messing with big data streams.An app will be running on one powerful server. a server app shall be developed in form of crossplatform application - so to work on Windows, Mac OS X and Linux. So same code many platforms for standing alone server architecture. We wonder what benefits does distributing applications not only over threads but over processes as wall would bring to programmers and to server end users and why? Some people sad to me that even having 48 cores, 4 process threads would be shared via OS throe all cores... is it true BTW?

    Read the article

  • What are the pros and cons of a non-fixed-interval update loop?

    - by akonsu
    I am studying various approaches to implementing a game loop and I have found this article. In the article the author implements a loop which, if the processing falls behind in time, skips frame renderings and just updates the game in a loop (the last variant called "Constant Game Speed independent of Variable FPS"). I do not understand why it is acceptable to call update_game() in a loop without making sure the update function is called at a particular interval. I do not see any value in doing this. I would think that in my game I want to be sure the game is updated periodically with a known period. So maybe it is worthwhile to have two threads, one would call update periodically, and the other one would redraw the game, also periodically? Would this be a good and practical approach? Of course I would need to synchronise the threads.

    Read the article

  • IPC linux huge transaction

    - by poly
    I'm building and application that requires huge transactions/sec of data and I need to use IPC to for the mutithreaded mutliprocceses communication, I know that there are a lot of methods to be used but not sure which one to choose for this application. This is what the application is gonna have, 4 processes, each process has 4 threads, the data chunk that needs to be transferred between two or more threads is around 400KB. I found that fifo is good choice except that it's 64K which is not that big so i'll need to modify and recompile the kernel but not sure if this is the right thing to do? Anyway, I'm open to any suggestions and I'd like to squeeze your experience in this :) and I appreciate it in advance.

    Read the article

  • What could be a reason for cross-platform server applications developer to make his app work in multiple processes?

    - by Kabumbus
    We consider a server app development - heavily loaded with messing with big data streams. An app will be running on one powerful server. A server app will be developed in form of crossplatform application - working on Windows, Mac OS X and Linux. So same code, many platforms for stand alone server architecture. We wonder what are the benefits of distributing applications not only over threads but over processes as well, for programmers and server end users? Some people said to me that even having 48 cores, 4 process threads would be shared via OS through all cores, is that true?

    Read the article

  • Performance impact of the new mtmalloc memory allocator

    - by nospam(at)example.com (Joerg Moellenkamp)
    I wrote at a number of occasions (here or here), that it could be really beneficial to use a different memory allocator for highly-threaded workloads, as the standard allocator is well ... the standard, however not very effective as soon as many threads comes into play. I didn't wrote about this as it was in my phase of silence but there was some change in the allocator area, Solaris 10 got a revamped mtmalloc allocator in version Solaris 10 08/11 (as described in "libmtmalloc Improvements"). The new memory allocator was introduced to Solaris development by the PSARC case 2010/212. But what's the effect of this new allocator and how does it works? Rickey C. Weisner wrote a nice article with "How Memory Allocation Affects Performance in Multithreaded Programs" explaining the inner mechanism of various allocators but he also publishes test results comparing Hoard, mtmalloc, umem, new mtmalloc and the libc malloc. Really interesting read and a must for people running applications on servers with a high number of threads.

    Read the article

  • Game Development Resources? [closed]

    - by stas
    I want to make an iPhone game and I was wondering if somebody could point me to some resources. Before you rage and close this, I am not looking for just a tutorial on programming, I could just google that. I need to find a tutorial on how to actually make a game, as in how to organize your code, what type of methods to run in separate threads, how to manage these threads in a game, etc. I already know everything I need; I can write a physics engine from scratch, I can write a 3D graphics pipeline from scratch, and so on. What I cannot figure out is how to combine all of this knowledge into correctly and efficiently making a game. Obviously for this one would probably go to college, but seeing as I am still in high school, that is not an option. If anyone would know some tutorials or resources, any pointers would be appreciated.

    Read the article

  • Why C++ people loves multithreading when it comes to performances?

    - by user1849534
    I have a question, it's about why programmers seems to love concurrency and multi-threaded programs in general. I'm considering 2 main approach here: an async approach basically based on signals, or just an async approach as called by many papers and languages like the new C# 5.0 for example, and a "companion thread" that maanges the policy of your pipeline a concurrent approach or multi-threading approach I will just say that I'm thinking about the hardware here and the worst case scenario, and I have tested this 2 paradigms myself, the async paradigm is a winner at the point that I don't get why people 90% of the time talk about concurrency when they wont to speed up things or make a good use of their resources. I have tested multi-threaded programs and async program on an old machine with an Intel quad-core that doesn't offer a memory controller inside the CPU, the memory is managed entirely by the motherboard, well in this case performances are horrible with a multi-threaded application, even a relatively low number of threads like 3-4-5 can be a problem, the application is unresponsive and is just slow and unpleasant. A good async approach is, on the other hand, probably not faster but it's not worst either, my application just waits for the result and doesn't hangs, it's responsive and there is a much better scaling going on. I have also discovered that a context change in the threading world it's not that cheap in real world scenario, it's infact quite expensive especially when you have more than 2 threads that need to cycle and swap among each other to be computed. On modern CPUs the situation it's not really that different, the memory controller it's integrated but my point is that an x86 CPUs is basically a serial machine and the memory controller works the same way as with the old machine with an external memory controller on the motherboard. The context switch is still a relevant cost in my application and the fact that the memory controller it's integrated or that the newer CPU have more than 2 core it's not bargain for me. For what i have experienced the concurrent approach is good in theory but not that good in practice, with the memory model imposed by the hardware, it's hard to make a good use of this paradigm, also it introduces a lot of issues ranging from the use of my data structures to the join of multiple threads. Also both paradigms do not offer any security abut when the task or the job will be done in a certain point in time, making them really similar from a functional point of view. According to the X86 memory model, why the majority of people suggest to use concurrency with C++ and not just an async aproach ? Also why not considering the worst case scenario of a computer where the context switch is probably more expensive than the computation itself ?

    Read the article

  • process monitoring in c under linux

    - by poly
    I'm trying to write a multi threaded/processes application and it need to know how to monitor a process from another process all the time. So here is what I have, I have a 2 processes, each with multiple threads that handle the network part, then another 2 process also with multiple threads that interact with DB and with the network processes, what I need to do is that if for example one of the network processes goes down the DB process start sending to the live network process until the second one is up again. I'm using fifo between the DB and the network process. I was thinking of sending messages with message passing all the time but not sure whether this is a good idea or I need to use some other IPC for this issue, or probably neither is good and I need to use entirely something else? Thanks,

    Read the article

< Previous Page | 39 40 41 42 43 44 45 46 47 48 49 50  | Next Page >