Search Results

Search found 31421 results on 1257 pages for 'software performance'.

Page 228/1257 | < Previous Page | 224 225 226 227 228 229 230 231 232 233 234 235  | Next Page >

  • How do I diagnose a bottleneck in an Intel Atom based Ubuntu server?

    - by Jon Cage
    I have a small media server at home which has software raid and a gigabit link to the rest of my network. For some reason though, I only get ~10MB/s transfers when copying to/from the server. I use software RAID5 (mdadm) over 4 1TB disks. On top of that I then use LVM to give me a huge pool of disk space which is then split up into multiple partitions which can be resized as and when they need it. I'm guessing this it most likely the cause, but I'd like to know for sure where the root cause is. So, how can I benchmark network throughput (Windows 7 desktop <- Ubuntu server) and hard disk performance to try and identify where my bottleneck might be? [Edit] If anyone's interested, the motherboard is an Intel Desktop Board D945GCLF2. So that's a 300 series Atom processor with the Intel® 945GC Express Chipset [Edit2] I feel like such a fool! I just checked my desktop and I had the slower of the two onboard NICs plugged in so the server is probably not at fault here. Transferring a copy of ubuntu off the server I get ~35-40MB/s according to Windows 7. I'll do those HD tests when I get a chance though (just for completeness).

    Read the article

  • How much does a computer actually cost?

    - by Cawas
    Ok, so when you buy a new notebook, you spend about $1 or $2 thousand with the OS included. When you make your own desktop machine you can get as low as $100 for a good one today, with not a single piece of software included. It can't be much lower, but it can go lot higher. People, including myself, tend to believe that's the price of a computer, but then there comes the softwares. I just stumbled upon a nice piece of application I could use myself, but it's very specific, very tiny, and most people would never bother about this. And it costs "just $12". That is a lot for something I may use just once or twice! OS upgrades, hardware malfunction, and your custom set of software actually raise the computer price quite a lot, thus this question: how much we pay in the end for our personal computers? I'd like to see some statistics on that. Maybe divided into 3 categories or something, but some data with averages, minimum and "maximum" costs would be very nice. Maybe a "cost per year" would be nice. Just wondering.

    Read the article

  • Logmein does not work at home?

    - by Littlet-ENG
    I've been using logmein successfully for may situations and have had very good success. Our company has an Log me in Pro account. I have used this to share my desktop with customers. At work, I have had no problem with my laptop. At home, one program (solid-works) that I need to share with my customers, will not display the active screen. I spent 45 min on the phone with both the software for the cad system and logmein support with not help. I need help in narrowing down what the problem is on my computer. The support guys at Solid-works got another remote software to work, so its not the program. I can get the logmein to work at the office so its not the settings of the logmein pro account. The LMI people say its a setting on my computer.? -internet is fast enough at home -can't narrow down the problem -changed graphical settings and that didn't work. Any Suggestions?

    Read the article

  • Raid1 with active and spare partition

    - by Daniel Baron
    I am having the following problem with a RAID1 software raid partition on my Ubuntu system (10.04 LTS, 2.6.32-24-server in case it matters). One of my disks (sdb5) reported I/O errors and was therefore marked faulty in the array. The array was then degraded with one active device. Hence, I replaced the harddisk, cloned the partition table and added all new partitions to my raid arrays. After syncing all partitions ended up fine, having 2 active devices - except one of them. The partition which reported the faulty disk before, however, did not include the new partition as an active device but as a spare disk: md3 : active raid1 sdb5[2] sda5[1] 4881344 blocks [2/1] [_U] A detailed look reveals: root@server:~# mdadm --detail /dev/md3 [...] Number Major Minor RaidDevice State 2 8 21 0 spare rebuilding /dev/sdb5 1 8 5 1 active sync /dev/sda5 So here is the question: How do I tell my raid to turn the spare disk into an active one? And why has it been added as a spare device? Recreating or reassembling the array is not an option, because it is my root partition. And I can not find any hints to that subject in the Software Raid HOWTO. Any help would be appreciated.

    Read the article

  • How do I reinitialise a failed RAID 5 drive using terminal on Ubuntu Server

    - by Stephen
    I've currently put together a new system and part of that has been creating a software RAID 5 using 'mdadm' in Ubuntu Server. I successfully got to the point where I create the array using: sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 I left it to do its thing overnight then used the following command to check on it: watch cat /proc/mdstat To which the following was returned: Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid5 sdd1[4](S) sdc1[2] sdb1[1] sda1[0](F) 5860535808 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/2] [_UU_] unused devices: <none> It appears that one has failed (and I'm not too savvy with why another is a spare). So, just to be sure that something else isn't amiss I wanted to try and re-engage the failed drive. Can someone explain how I can do that and what I should do with the spare (if anything). And also how do I know when synchronisation is complete? The tutorial I used to get this far is located here: http://sonniesedge.co.uk/2009/06/13/software-raid-5-on-ubuntu-904/ Many thanks! p.s. Here is some extra information that may help: sudo mdadm --detail /dev/md0 /dev/md0: Version : 1.2 Creation Time : Mon Jun 18 21:14:21 2012 Raid Level : raid5 Array Size : 5860535808 (5589.04 GiB 6001.19 GB) Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB) Raid Devices : 4 Total Devices : 4 Persistence : Superblock is persistent Update Time : Mon Jun 18 21:50:26 2012 State : clean, FAILED Active Devices : 2 Working Devices : 3 Failed Devices : 1 Spare Devices : 1 Layout : left-symmetric Chunk Size : 512K Name : myraidbox:0 (local to host myraidbox) UUID : a269ee94:a161600c:fb1665e7:bd2f27b3 Events : 13 Number Major Minor RaidDevice State 0 0 0 0 removed 1 8 17 1 active sync /dev/sdb1 2 8 33 2 active sync /dev/sdc1 3 0 0 3 removed 0 8 1 - faulty spare /dev/sda1 4 8 49 - spare /dev/sdd1

    Read the article

  • Sharing music on NAS with Zune and iPod?

    - by osij2is
    After being a long time iPod owner, I'm switching to the new Zune with its subscription model. I haven't bought a Zune yet but I'm planning on doing so within the next month or so. I have approximately 40GB worth of music and my girlfriend has her iPod music library around 30GB. I've been trying to figure out how to migrate all our music off of our laptops/desktops and centralize everything on my NAS. In sharing iPod music isn't too bad. Sharing from one machine to all is fairly easy within the iTunes player. As far as storing all the music on a NAS, again, iPods aren't too bad and imagine other systems aren't difficult. But I'm really new to the Zune and I'm beginning to run into some issues. My questions are: Is it possible to store all music from our iPods and Zune subscriptions and share music between the iPod/Zune within the same file share on my NAS? I'm sure it's possible to store music on a share, but I'm not sure how iTunes and the Zune software differs. Is there 3rd party software, maybe something like DoubleTwist that can sync based from NAS to multiple desktop/laptops? I've never used DoubleTwist but it's something that I found that looks close to being what I need. I've never quite done this myself so I'm trying to find a solution that can: a) store music on a network share; b) sync between different devices (Zune/iPod) seamlessly.

    Read the article

  • Creating dynamic map graphs

    - by Mehper C. Palavuzlar
    I need a software (or softwares) to achieve the following. I'm not sure if it could be done, but I'd like to hear the suggestions from super users. The data I want to use for graphing purposes include sales figures of magazines by provinces. There are 81 provinces of Turkiye, and I want the computer to automatically paint / write on a graph according to the sales magnitude of the provinces. Since there are loads of magazines with loads of issues, the process must be executed automatically just after selecting the related magazine and issue. So there will be graphs showing the sales weight of the whole country with some nice illustrations. Those graphs might be used as part of some decision support mechanism to help field teams. Is it possible? I have all the data and base maps of Turkiye to be filled/painted. I'm sure this is not easy. If there is a way to do that, it might probably include more than one software. Thanks in advance for any valuable comments and answers.

    Read the article

  • Is it possible to change the Raid5 chunk size of an existing device?

    - by AlexCombas
    I have an existing raid5 device which I created using mdadm on Linux. When I created the device I set the chunk size to 64 but I would like to test the performance of various sizes but I don't want to have to rebuild my entire system to do so. If it is not possible to do it live then is it possible to do this by booting with a rescue disk? Any advice on the steps how to do this, either live or with a rescue disk, will be greatly appreciated.

    Read the article

  • C++ custom exceptions: run time performance and passing exceptions from C++ to C

    - by skyeagle
    I am writing a custom C++ exception class (so I can pass exceptions occuring in C++ to another language via a C API). My initial plan of attack was to proceed as follows: //C++ myClass { public: myClass(); ~myClass(); void foo() // throws myException int foo(const int i, const bool b) // throws myException } * myClassPtr; // C API #ifdef __cplusplus extern "C" { #endif myClassPtr MyClass_New(); void MyClass_Destroy(myClassPtr p); void MyClass_Foo(myClassPtr p); int MyClass_FooBar(myClassPtr p, int i, bool b); #ifdef __cplusplus }; #endif I need a way to be able to pass exceptions thrown in the C++ code to the C side. The information I want to pass to the C side is the following: (a). What (b). Where (c). Simple Stack Trace (just the sequence of error messages in order they occured, no debugging info etc) I want to modify my C API, so that the API functions take a pointer to a struct ExceptionInfo, which will contain any exception info (if an exception occured) before consuming the results of the invocation. This raises two questions: Question 1 1. Implementation of each of the C++ methods exposed in the C API needs to be enclosed in a try/catch statement. The performance implications for this seem quite serious (according to this article): "It is a mistake (with high runtime cost) to use C++ exception handling for events that occur frequently, or for events that are handled near the point of detection." At the same time, I remember reading somewhere in my C++ days, that all though exception handling is expensive, it only becmes expensive when an exception actually occurs. So, which is correct?. what to do?. Is there an alternative way that I can trap errors safely and pass the resulting error info to the C API?. Or is this a minor consideration (the article after all, is quite old, and hardware have improved a bit since then). Question 2 I wuld like to modify the exception class given in that article, so that it contains a simple stack trace, and I need some help doing that. Again, in order to make the exception class 'lightweight', I think its a good idea not to include any STL classes, like string or vector (good idea/bad idea?). Which potentially leaves me with a fixed length C string (char*) which will be stack allocated. So I can maybe just keep appending messages (delimted by a unique separator [up to maximum length of buffer])... Its been a while since I did any serious C++ coding, and I will be grateful for the help. BTW, this is what I have come up with so far (I am intentionally, not deriving from std::exception because of the performance reasons mentioned in the article, and I am instead, throwing an integral exception (based on an exception enumeration): class fast_exception { public: fast_exception(int what, char const* file=0, int line=0) : what_(what), line_(line), file_(file) {/*empty*/} int what() const { return what_; } int line() const { return line_; } char const* file() const { return file_; } private: int what_; int line_; char const[MAX_BUFFER_SIZE] file_; }

    Read the article

  • push_back of STL list got bad performance?

    - by Leon Zhang
    I wrote a simple program to test STL list performance against a simple C list-like data structure. It shows bad performance at "push_back()" line. Any comments on it? $ ./test2 Build the type list : time consumed -> 0.311465 Iterate over all items: time consumed -> 0.00898 Build the simple C List: time consumed -> 0.020275 Iterate over all items: time consumed -> 0.008755 The source code is: #include <stdexcept> #include "high_resolution_timer.hpp" #include <list> #include <algorithm> #include <iostream> #define TESTNUM 1000000 /* The test struct */ struct MyType { int num; }; /* * C++ STL::list Test */ typedef struct MyType* mytype_t; void myfunction(mytype_t t) { } int test_stl_list() { std::list<mytype_t> mylist; util::high_resolution_timer t; /* * Build the type list */ t.restart(); for(int i = 0; i < TESTNUM; i++) { mytype_t aItem = (mytype_t) malloc(sizeof(struct MyType)); if(aItem == NULL) { printf("Error: while malloc\n"); return -1; } aItem->num = i; mylist.push_back(aItem); } std::cout << " Build the type list : time consumed -> " << t.elapsed() << std::endl; /* * Iterate over all item */ t.restart(); std::for_each(mylist.begin(), mylist.end(), myfunction); std::cout << " Iterate over all items: time consumed -> " << t.elapsed() << std::endl; return 0; } /* * a simple C list */ struct MyCList; struct MyCList{ struct MyType m; struct MyCList* p_next; }; int test_simple_c_list() { struct MyCList* p_list_head = NULL; util::high_resolution_timer t; /* * Build it */ t.restart(); struct MyCList* p_new_item = NULL; for(int i = 0; i < TESTNUM; i++) { p_new_item = (struct MyCList*) malloc(sizeof(struct MyCList)); if(p_new_item == NULL) { printf("ERROR : while malloc\n"); return -1; } p_new_item->m.num = i; p_new_item->p_next = p_list_head; p_list_head = p_new_item; } std::cout << " Build the simple C List: time consumed -> " << t.elapsed() << std::endl; /* * Iterate all items */ t.restart(); p_new_item = p_list_head; while(p_new_item->p_next != NULL) { p_new_item = p_new_item->p_next; } std::cout << " Iterate over all items: time consumed -> " << t.elapsed() << std::endl; return 0; } int main(int argc, char** argv) { if(test_stl_list() != 0) { printf("ERROR: error at testcase1\n"); return -1; } if(test_simple_c_list() != 0) { printf("ERROR: error at testcase2\n"); return -1; } return 0; }

    Read the article

  • Linq2SQL vs NHibernate performance (have I gone mad?)

    - by HeavyWave
    I have written the following tests to compare performance of Linq2SQL and NHibernate and I find results to be somewhat strange. Mappings are straight forward and identical for both. Both are running against a live DB. Although I'm not deleting Campaigns in case of Linq, but that shouldn't affect performance by more than 10 ms. Linq: [Test] public void Test1000ReadsWritesToAgentStateLinqPrecompiled() { Stopwatch sw = new Stopwatch(); Stopwatch swIn = new Stopwatch(); sw.Start(); for (int i = 0; i < 1000; i++) { swIn.Reset(); swIn.Start(); ReadWriteAndDeleteAgentStateWithLinqPrecompiled(); swIn.Stop(); Console.WriteLine("Run ReadWriteAndDeleteAgentState: " + swIn.ElapsedMilliseconds + " ms"); } sw.Stop(); Console.WriteLine("Total Time: " + sw.ElapsedMilliseconds + " ms"); Console.WriteLine("Average time to execute queries: " + sw.ElapsedMilliseconds / 1000 + " ms"); } private static readonly Func<AgentDesktop3DataContext, int, EntityModel.CampaignDetail> GetCampaignById = CompiledQuery.Compile<AgentDesktop3DataContext, int, EntityModel.CampaignDetail>( (ctx, sessionId) => (from cd in ctx.CampaignDetails join a in ctx.AgentCampaigns on cd.CampaignDetailId equals a.CampaignDetailId where a.AgentStateId == sessionId select cd).FirstOrDefault()); private void ReadWriteAndDeleteAgentStateWithLinqPrecompiled() { int id = 0; using (var ctx = new AgentDesktop3DataContext()) { EntityModel.AgentState agentState = new EntityModel.AgentState(); var campaign = new EntityModel.CampaignDetail { CampaignName = "Test" }; var campaignDisposition = new EntityModel.CampaignDisposition { Code = "123" }; campaignDisposition.Description = "abc"; campaign.CampaignDispositions.Add(campaignDisposition); agentState.CallState = 3; campaign.AgentCampaigns.Add(new AgentCampaign { AgentState = agentState }); ctx.CampaignDetails.InsertOnSubmit(campaign); ctx.AgentStates.InsertOnSubmit(agentState); ctx.SubmitChanges(); id = agentState.AgentStateId; } using (var ctx = new AgentDesktop3DataContext()) { var dbAgentState = ctx.GetAgentStateById(id); Assert.IsNotNull(dbAgentState); Assert.AreEqual(dbAgentState.CallState, 3); var campaignDetails = GetCampaignById(ctx, id); Assert.AreEqual(campaignDetails.CampaignDispositions[0].Description, "abc"); } using (var ctx = new AgentDesktop3DataContext()) { ctx.DeleteSessionById(id); } } NHibernate (the loop is the same): private void ReadWriteAndDeleteAgentState() { var id = WriteAgentState().Id; StartNewTransaction(); var dbAgentState = agentStateRepository.Get(id); Assert.IsNotNull(dbAgentState); Assert.AreEqual(dbAgentState.CallState, 3); Assert.AreEqual(dbAgentState.Campaigns[0].Dispositions[0].Description, "abc"); var campaignId = dbAgentState.Campaigns[0].Id; agentStateRepository.Delete(dbAgentState); NHibernateSession.Current.Transaction.Commit(); Cleanup(campaignId); NHibernateSession.Current.BeginTransaction(); } Results: NHibernate: Total Time: 9469 ms Average time to execute 13 queries: 9 ms Linq: Total Time: 127200 ms Average time to execute 13 queries: 127 ms Linq lost by 13.5 times! Event with precompiled queries (both read queries are precompiled). This can't be right, although I expected NHibernate to be faster, this is just too big of a difference, considering mappings are identical and NHibernate actually executes more queries against the DB.

    Read the article

  • UIButton performance in UITableViewCell vs UIView

    - by marcel salathe
    I'd like to add a UIButton to a custom UITableViewCell (programmatically). This is easy to do, but I'm finding that the "performance" of the button in the cell is slow - that is, when I touch the button, there is quite a bit of delay until the button visually goes into the highlighted state. The same type of button on a regular UIView is very responsive in comparison. In order to isolate the problem, I've created two views - one is a simple UIView, the other is a UITableView with only one UITableViewCell. I've added buttons to both views (the UIView and the UITableViewCell), and the performance difference is quite striking. I've searched the web and read the Apple docs but haven't really found the cause of the problem. My guess is that it somehow has to do with the responder chain, but I can't quite put my finger on it. I must be doing something wrong, and I'd appreciate any help. Thanks. Demo code: ViewController.h #import <UIKit/UIKit.h> @interface ViewController : UIViewController <UITableViewDelegate, UITableViewDataSource> @property UITableView* myTableView; @property UIView* myView; ViewController.m #import "ViewController.h" #import "CustomCell.h" @implementation ViewController @synthesize myTableView, myView; - (id)initWithNibName:(NSString *)nibNameOrNil bundle:(NSBundle *)nibBundleOrNil { self = [super initWithNibName:nibNameOrNil bundle:nibBundleOrNil]; if (self) { [self initMyView]; [self initMyTableView]; } return self; } - (void) initMyView { UIView* newView = [[UIView alloc] initWithFrame:CGRectMake(0,0,[[UIScreen mainScreen] bounds].size.width,100)]; self.myView = newView; // button on regularView UIButton* myButton = [UIButton buttonWithType:UIButtonTypeRoundedRect]; [myButton addTarget:self action:@selector(pressedMyButton) forControlEvents:UIControlEventTouchUpInside]; [myButton setTitle:@"I'm fast" forState:UIControlStateNormal]; [myButton setFrame:CGRectMake(20.0, 10.0, 160.0, 30.0)]; [[self myView] addSubview:myButton]; } - (void) initMyTableView { UITableView *newTableView = [[UITableView alloc] initWithFrame:CGRectMake(0,100,[[UIScreen mainScreen] bounds].size.width,[[UIScreen mainScreen] bounds].size.height-100) style:UITableViewStyleGrouped]; self.myTableView = newTableView; self.myTableView.delegate = self; self.myTableView.dataSource = self; } -(void) pressedMyButton { NSLog(@"pressedMyButton"); } - (void)viewDidLoad { [super viewDidLoad]; [[self view] addSubview:self.myView]; [[self view] addSubview:self.myTableView]; } - (NSInteger)numberOfSectionsInTableView:(UITableView *)tableView { return 1; } - (NSInteger)tableView:(UITableView *)tableView numberOfRowsInSection:(NSInteger)section { return 1; } - (UITableViewCell *)tableView:(UITableView *)tableView cellForRowAtIndexPath:(NSIndexPath *)indexPath { CustomCell *customCell = [tableView dequeueReusableCellWithIdentifier:@"CustomCell"]; if (customCell == nil) { customCell = [[CustomCell alloc] initWithStyle:UITableViewCellStyleSubtitle reuseIdentifier:@"CustomCell"]; } return customCell; } @end CustomCell.h #import <UIKit/UIKit.h> @interface CustomCell : UITableViewCell @property (retain, nonatomic) UIButton* cellButton; @end CustomCell.m #import "CustomCell.h" @implementation CustomCell @synthesize cellButton; - (id)initWithStyle:(UITableViewCellStyle)style reuseIdentifier:(NSString *)reuseIdentifier { self = [super initWithStyle:style reuseIdentifier:reuseIdentifier]; if (self) { // button within cell cellButton = [UIButton buttonWithType:UIButtonTypeRoundedRect]; [cellButton addTarget:self action:@selector(pressedCellButton) forControlEvents:UIControlEventTouchUpInside]; [cellButton setTitle:@"I'm sluggish" forState:UIControlStateNormal]; [cellButton setFrame:CGRectMake(20.0, 10.0, 160.0, 30.0)]; [self addSubview:cellButton]; } return self; } - (void) pressedCellButton { NSLog(@"pressedCellButton"); } @end

    Read the article

  • Free Web UI design software

    - by Pich
    Does anybody know of any free Web UI design software? EDIT: I am looking for a UI mockup tool (that is freeware) to create stuff like this: http://blogs.atlassian.com/jira/Mockups%20UI.jpg I works a developer with the task to design the UI of an application. I want to draw some examples of how the webpages can look and show it to the requirements team.

    Read the article

  • Smart client software factory view activation problem

    - by vinay
    Hi, Im using smart client software factory in our application i created one workspace dynamically,based on that i can create n numbers of views.the view is modal dialog and i have used some window workspace. but the problem is i have to implement save all fuctionalities in menu. but i dont know which view is activated/focused How to get the focused screen,is their any event will fore if on focus change. Please help on this aspect. Thanks , Vinay

    Read the article

  • What Project Management Software should I use?

    - by Vecdid
    I am looking for either an MS tool like project or an open source equivalent. Yes I could google it, but I am looking for some insight from some people whp handle the end of the software I would as a programmer. The tool has to run using IIS as the webserver. What are some of the best features of your suggestion?

    Read the article

  • Touch Typing Software recommendations

    - by Mike
    Since the keyboard is the interface we use to the computer, I've always thought touch typing should be something I should learn, but I've always been, well, lazy is the word. So, anyone recommend any good touch typing software? It's easy enough to google, but I'ld like to hear recommendations.

    Read the article

  • What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

    - by Alex Dunlop
    Which Java synchronization construct is likely to provide the best performance for a concurrent, iterative processing scenario with a fixed number of threads like the one outlined below? After experimenting on my own for a while (using ExecutorService and CyclicBarrier) and being somewhat surprised by the results, I would be grateful for some expert advice and maybe some new ideas. Existing questions here do not seem to focus primarily on performance, hence this new one. Thanks in advance! The core of the app is a simple iterative data processing algorithm, parallelized to the spread the computational load across 8 cores on a Mac Pro, running OS X 10.6 and Java 1.6.0_07. The data to be processed is split into 8 blocks and each block is fed to a Runnable to be executed by one of a fixed number of threads. Parallelizing the algorithm was fairly straightforward, and it functionally works as desired, but its performance is not yet what I think it could be. The app seems to spend a lot of time in system calls synchronizing, so after some profiling I wonder whether I selected the most appropriate synchronization mechanism(s). A key requirement of the algorithm is that it needs to proceed in stages, so the threads need to sync up at the end of each stage. The main thread prepares the work (very low overhead), passes it to the threads, lets them work on it, then proceeds when all threads are done, rearranges the work (again very low overhead) and repeats the cycle. The machine is dedicated to this task, Garbage Collection is minimized by using per-thread pools of pre-allocated items, and the number of threads can be fixed (no incoming requests or the like, just one thread per CPU core). V1 - ExecutorService My first implementation used an ExecutorService with 8 worker threads. The program creates 8 tasks holding the work and then lets them work on it, roughly like this: // create one thread per CPU executorService = Executors.newFixedThreadPool( 8 ); ... // now process data in cycles while( ...) { // package data into 8 work items ... // create one Callable task per work item ... // submit the Callables to the worker threads executorService.invokeAll( taskList ); } This works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as much as the processing algorithm would be expected to allow (some work items will finish faster than others, then idle). However, as the work items become smaller (and this is not really under the program's control), the user CPU load shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.8% 85% 1.30 64k 2.5% 77% 5.6 16k 4% 64% 22.5 4096 8% 56% 86 1024 13% 38% 227 256 17% 19% 420 64 19% 17% 948 16 19% 13% 1626 Legend: - block size = size of the work item (= computational steps) - system = system load, as shown in OS X Activity Monitor (red bar) - user = user load, as shown in OS X Activity Monitor (green bar) - cycles/sec = iterations through the main while loop, more is better The primary area of concern here is the high percentage of time spent in the system, which appears to be driven by thread synchronization calls. As expected, for smaller work items, ExecutorService.invokeAll() will require relatively more effort to sync up the threads versus the amount of work being performed in each thread. But since ExecutorService is more generic than it would need to be for this use case (it can queue tasks for threads if there are more tasks than cores), I though maybe there would be a leaner synchronization construct. V2 - CyclicBarrier The next implementation used a CyclicBarrier to sync up the threads before receiving work and after completing it, roughly as follows: main() { // create the barrier barrier = new CyclicBarrier( 8 + 1 ); // create Runable for thread, tell it about the barrier Runnable task = new WorkerThreadRunnable( barrier ); // start the threads for( int i = 0; i < 8; i++ ) { // create one thread per core new Thread( task ).start(); } while( ... ) { // tell threads about the work ... // N threads + this will call await(), then system proceeds barrier.await(); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; } public void run() { while( true ) { // wait for work barrier.await(); // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as before. However, as the work items become smaller, the load still shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.7% 78% 6.1 16k 5.5% 52% 25 4096 9% 29% 64 1024 11% 15% 117 256 12% 8% 169 64 12% 6.5% 285 16 12% 6% 377 For large work items, synchronization is negligible and the performance is identical to V1. But unexpectedly, the results of the (highly specialized) CyclicBarrier seem MUCH WORSE than those for the (generic) ExecutorService: throughput (cycles/sec) is only about 1/4th of V1. A preliminary conclusion would be that even though this seems to be the advertised ideal use case for CyclicBarrier, it performs much worse than the generic ExecutorService. V3 - Wait/Notify + CyclicBarrier It seemed worth a try to replace the first cyclic barrier await() with a simple wait/notify mechanism: main() { // create the barrier // create Runable for thread, tell it about the barrier // start the threads while( ... ) { // tell threads about the work // for each: workerThreadRunnable.setWorkItem( ... ); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; @NotNull volatile private Callable<Integer> workItem; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; this.workItem = NO_WORK; } final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { synchronized( this ) { workItem = callable; notify(); } } public void run() { while( true ) { // wait for work while( true ) { synchronized( this ) { if( workItem != NO_WORK ) break; try { wait(); } catch( InterruptedException e ) { e.printStackTrace(); } } } // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.4% 80% 6.3 16k 4.6% 60% 30.1 4096 8.6% 41% 98.5 1024 12% 23% 202 256 14% 11.6% 299 64 14% 10.0% 518 16 14.8% 8.7% 679 The throughput for small work items is still much worse than that of the ExecutorService, but about 2x that of the CyclicBarrier. Eliminating one CyclicBarrier eliminates half of the gap. V4 - Busy wait instead of wait/notify Since this app is the primary one running on the system and the cores idle anyway if they're not busy with a work item, why not try a busy wait for work items in each thread, even if that spins the CPU needlessly. The worker thread code changes as follows: class WorkerThreadRunnable implements Runnable { // as before final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { workItem = callable; } public void run() { while( true ) { // busy-wait for work while( true ) { if( workItem != NO_WORK ) break; } // do the work ... // wait for everyone else to finish barrier.await(); } } } Also works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.2% 81% 6.3 16k 4.2% 62% 33 4096 7.5% 40% 107 1024 10.4% 23% 210 256 12.0% 12.0% 310 64 11.9% 10.2% 550 16 12.2% 8.6% 741 For small work items, this increases throughput by a further 10% over the CyclicBarrier + wait/notify variant, which is not insignificant. But it is still much lower-throughput than V1 with the ExecutorService. V5 - ? So what is the best synchronization mechanism for such a (presumably not uncommon) problem? I am weary of writing my own sync mechanism to completely replace ExecutorService (assuming that it is too generic and there has to be something that can still be taken out to make it more efficient). It is not my area of expertise and I'm concerned that I'd spend a lot of time debugging it (since I'm not even sure my wait/notify and busy wait variants are correct) for uncertain gain. Any advice would be greatly appreciated.

    Read the article

< Previous Page | 224 225 226 227 228 229 230 231 232 233 234 235  | Next Page >