Search Results

Search found 14282 results on 572 pages for 'performance counter'.

Page 110/572 | < Previous Page | 106 107 108 109 110 111 112 113 114 115 116 117 | Next Page >

Performance issue between builds

- by DeadMG

I've been developing a small indie game in my spare time and have run across an inexplicable issue. Some builds of the game will randomly run several hundred frames per second slower than other builds. For example, when rendering some text and no 3D scene, I can achieve 1800FPS on my own hardware. Add one 3D sphere (10k verts, pixel shaded), achieve 1700 FPS. Add two more spheres, achieve 800 FPS. Remove all spheres, achieve 1100FPS- even though the code now renders the same scene as I previously achieved at 1800FPS, which is just the FPS counter being rendered. I've tried rebuilding and cleaning the project and rebooting the compiler. This is in Release mode and I turned on all the optimizations I could find. Any suggestions as to the cause? I ran a quick profile, and Visual Studio seems to think that over 90% of my time was spent in D3D9_43.dll, suggesting that it's not a bug in my app, which doesn't explain why it manifests in only some builds. I rebooted my machine and it's back up to 1800FPS. I think it's a bug in the DirectX SDK tools (amongst many others). Going to delete this question.

Read the article
C++ custom exceptions: run time performance and passing exceptions from C++ to C

- by skyeagle

I am writing a custom C++ exception class (so I can pass exceptions occuring in C++ to another language via a C API). My initial plan of attack was to proceed as follows: //C++ myClass { public: myClass(); ~myClass(); void foo() // throws myException int foo(const int i, const bool b) // throws myException } * myClassPtr; // C API #ifdef __cplusplus extern "C" { #endif myClassPtr MyClass_New(); void MyClass_Destroy(myClassPtr p); void MyClass_Foo(myClassPtr p); int MyClass_FooBar(myClassPtr p, int i, bool b); #ifdef __cplusplus }; #endif I need a way to be able to pass exceptions thrown in the C++ code to the C side. The information I want to pass to the C side is the following: (a). What (b). Where (c). Simple Stack Trace (just the sequence of error messages in order they occured, no debugging info etc) I want to modify my C API, so that the API functions take a pointer to a struct ExceptionInfo, which will contain any exception info (if an exception occured) before consuming the results of the invocation. This raises two questions: Question 1 1. Implementation of each of the C++ methods exposed in the C API needs to be enclosed in a try/catch statement. The performance implications for this seem quite serious (according to this article): "It is a mistake (with high runtime cost) to use C++ exception handling for events that occur frequently, or for events that are handled near the point of detection." At the same time, I remember reading somewhere in my C++ days, that all though exception handling is expensive, it only becmes expensive when an exception actually occurs. So, which is correct?. what to do?. Is there an alternative way that I can trap errors safely and pass the resulting error info to the C API?. Or is this a minor consideration (the article after all, is quite old, and hardware have improved a bit since then). Question 2 I wuld like to modify the exception class given in that article, so that it contains a simple stack trace, and I need some help doing that. Again, in order to make the exception class 'lightweight', I think its a good idea not to include any STL classes, like string or vector (good idea/bad idea?). Which potentially leaves me with a fixed length C string (char*) which will be stack allocated. So I can maybe just keep appending messages (delimted by a unique separator [up to maximum length of buffer])... Its been a while since I did any serious C++ coding, and I will be grateful for the help. BTW, this is what I have come up with so far (I am intentionally, not deriving from std::exception because of the performance reasons mentioned in the article, and I am instead, throwing an integral exception (based on an exception enumeration): class fast_exception { public: fast_exception(int what, char const* file=0, int line=0) : what_(what), line_(line), file_(file) {/*empty*/} int what() const { return what_; } int line() const { return line_; } char const* file() const { return file_; } private: int what_; int line_; char const[MAX_BUFFER_SIZE] file_; }

Read the article
push_back of STL list got bad performance?

- by Leon Zhang

I wrote a simple program to test STL list performance against a simple C list-like data structure. It shows bad performance at "push_back()" line. Any comments on it? $ ./test2 Build the type list : time consumed -> 0.311465 Iterate over all items: time consumed -> 0.00898 Build the simple C List: time consumed -> 0.020275 Iterate over all items: time consumed -> 0.008755 The source code is: #include <stdexcept> #include "high_resolution_timer.hpp" #include <list> #include <algorithm> #include <iostream> #define TESTNUM 1000000 /* The test struct */ struct MyType { int num; }; /* * C++ STL::list Test */ typedef struct MyType* mytype_t; void myfunction(mytype_t t) { } int test_stl_list() { std::list<mytype_t> mylist; util::high_resolution_timer t; /* * Build the type list */ t.restart(); for(int i = 0; i < TESTNUM; i++) { mytype_t aItem = (mytype_t) malloc(sizeof(struct MyType)); if(aItem == NULL) { printf("Error: while malloc\n"); return -1; } aItem->num = i; mylist.push_back(aItem); } std::cout << " Build the type list : time consumed -> " << t.elapsed() << std::endl; /* * Iterate over all item */ t.restart(); std::for_each(mylist.begin(), mylist.end(), myfunction); std::cout << " Iterate over all items: time consumed -> " << t.elapsed() << std::endl; return 0; } /* * a simple C list */ struct MyCList; struct MyCList{ struct MyType m; struct MyCList* p_next; }; int test_simple_c_list() { struct MyCList* p_list_head = NULL; util::high_resolution_timer t; /* * Build it */ t.restart(); struct MyCList* p_new_item = NULL; for(int i = 0; i < TESTNUM; i++) { p_new_item = (struct MyCList*) malloc(sizeof(struct MyCList)); if(p_new_item == NULL) { printf("ERROR : while malloc\n"); return -1; } p_new_item->m.num = i; p_new_item->p_next = p_list_head; p_list_head = p_new_item; } std::cout << " Build the simple C List: time consumed -> " << t.elapsed() << std::endl; /* * Iterate all items */ t.restart(); p_new_item = p_list_head; while(p_new_item->p_next != NULL) { p_new_item = p_new_item->p_next; } std::cout << " Iterate over all items: time consumed -> " << t.elapsed() << std::endl; return 0; } int main(int argc, char** argv) { if(test_stl_list() != 0) { printf("ERROR: error at testcase1\n"); return -1; } if(test_simple_c_list() != 0) { printf("ERROR: error at testcase2\n"); return -1; } return 0; }

Read the article
Linq2SQL vs NHibernate performance (have I gone mad?)

- by HeavyWave

I have written the following tests to compare performance of Linq2SQL and NHibernate and I find results to be somewhat strange. Mappings are straight forward and identical for both. Both are running against a live DB. Although I'm not deleting Campaigns in case of Linq, but that shouldn't affect performance by more than 10 ms. Linq: [Test] public void Test1000ReadsWritesToAgentStateLinqPrecompiled() { Stopwatch sw = new Stopwatch(); Stopwatch swIn = new Stopwatch(); sw.Start(); for (int i = 0; i < 1000; i++) { swIn.Reset(); swIn.Start(); ReadWriteAndDeleteAgentStateWithLinqPrecompiled(); swIn.Stop(); Console.WriteLine("Run ReadWriteAndDeleteAgentState: " + swIn.ElapsedMilliseconds + " ms"); } sw.Stop(); Console.WriteLine("Total Time: " + sw.ElapsedMilliseconds + " ms"); Console.WriteLine("Average time to execute queries: " + sw.ElapsedMilliseconds / 1000 + " ms"); } private static readonly Func<AgentDesktop3DataContext, int, EntityModel.CampaignDetail> GetCampaignById = CompiledQuery.Compile<AgentDesktop3DataContext, int, EntityModel.CampaignDetail>( (ctx, sessionId) => (from cd in ctx.CampaignDetails join a in ctx.AgentCampaigns on cd.CampaignDetailId equals a.CampaignDetailId where a.AgentStateId == sessionId select cd).FirstOrDefault()); private void ReadWriteAndDeleteAgentStateWithLinqPrecompiled() { int id = 0; using (var ctx = new AgentDesktop3DataContext()) { EntityModel.AgentState agentState = new EntityModel.AgentState(); var campaign = new EntityModel.CampaignDetail { CampaignName = "Test" }; var campaignDisposition = new EntityModel.CampaignDisposition { Code = "123" }; campaignDisposition.Description = "abc"; campaign.CampaignDispositions.Add(campaignDisposition); agentState.CallState = 3; campaign.AgentCampaigns.Add(new AgentCampaign { AgentState = agentState }); ctx.CampaignDetails.InsertOnSubmit(campaign); ctx.AgentStates.InsertOnSubmit(agentState); ctx.SubmitChanges(); id = agentState.AgentStateId; } using (var ctx = new AgentDesktop3DataContext()) { var dbAgentState = ctx.GetAgentStateById(id); Assert.IsNotNull(dbAgentState); Assert.AreEqual(dbAgentState.CallState, 3); var campaignDetails = GetCampaignById(ctx, id); Assert.AreEqual(campaignDetails.CampaignDispositions[0].Description, "abc"); } using (var ctx = new AgentDesktop3DataContext()) { ctx.DeleteSessionById(id); } } NHibernate (the loop is the same): private void ReadWriteAndDeleteAgentState() { var id = WriteAgentState().Id; StartNewTransaction(); var dbAgentState = agentStateRepository.Get(id); Assert.IsNotNull(dbAgentState); Assert.AreEqual(dbAgentState.CallState, 3); Assert.AreEqual(dbAgentState.Campaigns[0].Dispositions[0].Description, "abc"); var campaignId = dbAgentState.Campaigns[0].Id; agentStateRepository.Delete(dbAgentState); NHibernateSession.Current.Transaction.Commit(); Cleanup(campaignId); NHibernateSession.Current.BeginTransaction(); } Results: NHibernate: Total Time: 9469 ms Average time to execute 13 queries: 9 ms Linq: Total Time: 127200 ms Average time to execute 13 queries: 127 ms Linq lost by 13.5 times! Event with precompiled queries (both read queries are precompiled). This can't be right, although I expected NHibernate to be faster, this is just too big of a difference, considering mappings are identical and NHibernate actually executes more queries against the DB.

Read the article
UIButton performance in UITableViewCell vs UIView

- by marcel salathe

I'd like to add a UIButton to a custom UITableViewCell (programmatically). This is easy to do, but I'm finding that the "performance" of the button in the cell is slow - that is, when I touch the button, there is quite a bit of delay until the button visually goes into the highlighted state. The same type of button on a regular UIView is very responsive in comparison. In order to isolate the problem, I've created two views - one is a simple UIView, the other is a UITableView with only one UITableViewCell. I've added buttons to both views (the UIView and the UITableViewCell), and the performance difference is quite striking. I've searched the web and read the Apple docs but haven't really found the cause of the problem. My guess is that it somehow has to do with the responder chain, but I can't quite put my finger on it. I must be doing something wrong, and I'd appreciate any help. Thanks. Demo code: ViewController.h #import <UIKit/UIKit.h> @interface ViewController : UIViewController <UITableViewDelegate, UITableViewDataSource> @property UITableView* myTableView; @property UIView* myView; ViewController.m #import "ViewController.h" #import "CustomCell.h" @implementation ViewController @synthesize myTableView, myView; - (id)initWithNibName:(NSString *)nibNameOrNil bundle:(NSBundle *)nibBundleOrNil { self = [super initWithNibName:nibNameOrNil bundle:nibBundleOrNil]; if (self) { [self initMyView]; [self initMyTableView]; } return self; } - (void) initMyView { UIView* newView = [[UIView alloc] initWithFrame:CGRectMake(0,0,[[UIScreen mainScreen] bounds].size.width,100)]; self.myView = newView; // button on regularView UIButton* myButton = [UIButton buttonWithType:UIButtonTypeRoundedRect]; [myButton addTarget:self action:@selector(pressedMyButton) forControlEvents:UIControlEventTouchUpInside]; [myButton setTitle:@"I'm fast" forState:UIControlStateNormal]; [myButton setFrame:CGRectMake(20.0, 10.0, 160.0, 30.0)]; [[self myView] addSubview:myButton]; } - (void) initMyTableView { UITableView *newTableView = [[UITableView alloc] initWithFrame:CGRectMake(0,100,[[UIScreen mainScreen] bounds].size.width,[[UIScreen mainScreen] bounds].size.height-100) style:UITableViewStyleGrouped]; self.myTableView = newTableView; self.myTableView.delegate = self; self.myTableView.dataSource = self; } -(void) pressedMyButton { NSLog(@"pressedMyButton"); } - (void)viewDidLoad { [super viewDidLoad]; [[self view] addSubview:self.myView]; [[self view] addSubview:self.myTableView]; } - (NSInteger)numberOfSectionsInTableView:(UITableView *)tableView { return 1; } - (NSInteger)tableView:(UITableView *)tableView numberOfRowsInSection:(NSInteger)section { return 1; } - (UITableViewCell *)tableView:(UITableView *)tableView cellForRowAtIndexPath:(NSIndexPath *)indexPath { CustomCell *customCell = [tableView dequeueReusableCellWithIdentifier:@"CustomCell"]; if (customCell == nil) { customCell = [[CustomCell alloc] initWithStyle:UITableViewCellStyleSubtitle reuseIdentifier:@"CustomCell"]; } return customCell; } @end CustomCell.h #import <UIKit/UIKit.h> @interface CustomCell : UITableViewCell @property (retain, nonatomic) UIButton* cellButton; @end CustomCell.m #import "CustomCell.h" @implementation CustomCell @synthesize cellButton; - (id)initWithStyle:(UITableViewCellStyle)style reuseIdentifier:(NSString *)reuseIdentifier { self = [super initWithStyle:style reuseIdentifier:reuseIdentifier]; if (self) { // button within cell cellButton = [UIButton buttonWithType:UIButtonTypeRoundedRect]; [cellButton addTarget:self action:@selector(pressedCellButton) forControlEvents:UIControlEventTouchUpInside]; [cellButton setTitle:@"I'm sluggish" forState:UIControlStateNormal]; [cellButton setFrame:CGRectMake(20.0, 10.0, 160.0, 30.0)]; [self addSubview:cellButton]; } return self; } - (void) pressedCellButton { NSLog(@"pressedCellButton"); } @end

Read the article
What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

- by Alex Dunlop

Which Java synchronization construct is likely to provide the best performance for a concurrent, iterative processing scenario with a fixed number of threads like the one outlined below? After experimenting on my own for a while (using ExecutorService and CyclicBarrier) and being somewhat surprised by the results, I would be grateful for some expert advice and maybe some new ideas. Existing questions here do not seem to focus primarily on performance, hence this new one. Thanks in advance! The core of the app is a simple iterative data processing algorithm, parallelized to the spread the computational load across 8 cores on a Mac Pro, running OS X 10.6 and Java 1.6.0_07. The data to be processed is split into 8 blocks and each block is fed to a Runnable to be executed by one of a fixed number of threads. Parallelizing the algorithm was fairly straightforward, and it functionally works as desired, but its performance is not yet what I think it could be. The app seems to spend a lot of time in system calls synchronizing, so after some profiling I wonder whether I selected the most appropriate synchronization mechanism(s). A key requirement of the algorithm is that it needs to proceed in stages, so the threads need to sync up at the end of each stage. The main thread prepares the work (very low overhead), passes it to the threads, lets them work on it, then proceeds when all threads are done, rearranges the work (again very low overhead) and repeats the cycle. The machine is dedicated to this task, Garbage Collection is minimized by using per-thread pools of pre-allocated items, and the number of threads can be fixed (no incoming requests or the like, just one thread per CPU core). V1 - ExecutorService My first implementation used an ExecutorService with 8 worker threads. The program creates 8 tasks holding the work and then lets them work on it, roughly like this: // create one thread per CPU executorService = Executors.newFixedThreadPool( 8 ); ... // now process data in cycles while( ...) { // package data into 8 work items ... // create one Callable task per work item ... // submit the Callables to the worker threads executorService.invokeAll( taskList ); } This works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as much as the processing algorithm would be expected to allow (some work items will finish faster than others, then idle). However, as the work items become smaller (and this is not really under the program's control), the user CPU load shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.8% 85% 1.30 64k 2.5% 77% 5.6 16k 4% 64% 22.5 4096 8% 56% 86 1024 13% 38% 227 256 17% 19% 420 64 19% 17% 948 16 19% 13% 1626 Legend: - block size = size of the work item (= computational steps) - system = system load, as shown in OS X Activity Monitor (red bar) - user = user load, as shown in OS X Activity Monitor (green bar) - cycles/sec = iterations through the main while loop, more is better The primary area of concern here is the high percentage of time spent in the system, which appears to be driven by thread synchronization calls. As expected, for smaller work items, ExecutorService.invokeAll() will require relatively more effort to sync up the threads versus the amount of work being performed in each thread. But since ExecutorService is more generic than it would need to be for this use case (it can queue tasks for threads if there are more tasks than cores), I though maybe there would be a leaner synchronization construct. V2 - CyclicBarrier The next implementation used a CyclicBarrier to sync up the threads before receiving work and after completing it, roughly as follows: main() { // create the barrier barrier = new CyclicBarrier( 8 + 1 ); // create Runable for thread, tell it about the barrier Runnable task = new WorkerThreadRunnable( barrier ); // start the threads for( int i = 0; i < 8; i++ ) { // create one thread per core new Thread( task ).start(); } while( ... ) { // tell threads about the work ... // N threads + this will call await(), then system proceeds barrier.await(); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; } public void run() { while( true ) { // wait for work barrier.await(); // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as before. However, as the work items become smaller, the load still shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.7% 78% 6.1 16k 5.5% 52% 25 4096 9% 29% 64 1024 11% 15% 117 256 12% 8% 169 64 12% 6.5% 285 16 12% 6% 377 For large work items, synchronization is negligible and the performance is identical to V1. But unexpectedly, the results of the (highly specialized) CyclicBarrier seem MUCH WORSE than those for the (generic) ExecutorService: throughput (cycles/sec) is only about 1/4th of V1. A preliminary conclusion would be that even though this seems to be the advertised ideal use case for CyclicBarrier, it performs much worse than the generic ExecutorService. V3 - Wait/Notify + CyclicBarrier It seemed worth a try to replace the first cyclic barrier await() with a simple wait/notify mechanism: main() { // create the barrier // create Runable for thread, tell it about the barrier // start the threads while( ... ) { // tell threads about the work // for each: workerThreadRunnable.setWorkItem( ... ); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; @NotNull volatile private Callable<Integer> workItem; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; this.workItem = NO_WORK; } final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { synchronized( this ) { workItem = callable; notify(); } } public void run() { while( true ) { // wait for work while( true ) { synchronized( this ) { if( workItem != NO_WORK ) break; try { wait(); } catch( InterruptedException e ) { e.printStackTrace(); } } } // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.4% 80% 6.3 16k 4.6% 60% 30.1 4096 8.6% 41% 98.5 1024 12% 23% 202 256 14% 11.6% 299 64 14% 10.0% 518 16 14.8% 8.7% 679 The throughput for small work items is still much worse than that of the ExecutorService, but about 2x that of the CyclicBarrier. Eliminating one CyclicBarrier eliminates half of the gap. V4 - Busy wait instead of wait/notify Since this app is the primary one running on the system and the cores idle anyway if they're not busy with a work item, why not try a busy wait for work items in each thread, even if that spins the CPU needlessly. The worker thread code changes as follows: class WorkerThreadRunnable implements Runnable { // as before final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { workItem = callable; } public void run() { while( true ) { // busy-wait for work while( true ) { if( workItem != NO_WORK ) break; } // do the work ... // wait for everyone else to finish barrier.await(); } } } Also works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.2% 81% 6.3 16k 4.2% 62% 33 4096 7.5% 40% 107 1024 10.4% 23% 210 256 12.0% 12.0% 310 64 11.9% 10.2% 550 16 12.2% 8.6% 741 For small work items, this increases throughput by a further 10% over the CyclicBarrier + wait/notify variant, which is not insignificant. But it is still much lower-throughput than V1 with the ExecutorService. V5 - ? So what is the best synchronization mechanism for such a (presumably not uncommon) problem? I am weary of writing my own sync mechanism to completely replace ExecutorService (assuming that it is too generic and there has to be something that can still be taken out to make it more efficient). It is not my area of expertise and I'm concerned that I'd spend a lot of time debugging it (since I'm not even sure my wait/notify and busy wait variants are correct) for uncertain gain. Any advice would be greatly appreciated.

Read the article
Null-free "maps": Is a callback solution slower than tryGet()?

- by David Moles

In comments to "How to implement List, Set, and Map in null free design?", Steven Sudit and I got into a discussion about using a callback, with handlers for "found" and "not found" situations, vs. a tryGet() method, taking an out parameter and returning a boolean indicating whether the out parameter had been populated. Steven maintained that the callback approach was more complex and almost certain to be slower; I maintained that the complexity was no greater and the performance at worst the same. But code speaks louder than words, so I thought I'd implement both and see what I got. The original question was fairly theoretical with regard to language ("And for argument sake, let's say this language don't even have null") -- I've used Java here because that's what I've got handy. Java doesn't have out parameters, but it doesn't have first-class functions either, so style-wise, it should suck equally for both approaches. (Digression: As far as complexity goes: I like the callback design because it inherently forces the user of the API to handle both cases, whereas the tryGet() design requires callers to perform their own boilerplate conditional check, which they could forget or get wrong. But having now implemented both, I can see why the tryGet() design looks simpler, at least in the short term.) First, the callback example: class CallbackMap<K, V> { private final Map<K, V> backingMap; public CallbackMap(Map<K, V> backingMap) { this.backingMap = backingMap; } void lookup(K key, Callback<K, V> handler) { V val = backingMap.get(key); if (val == null) { handler.handleMissing(key); } else { handler.handleFound(key, val); } } } interface Callback<K, V> { void handleFound(K key, V value); void handleMissing(K key); } class CallbackExample { private final Map<String, String> map; private final List<String> found; private final List<String> missing; private Callback<String, String> handler; public CallbackExample(Map<String, String> map) { this.map = map; found = new ArrayList<String>(map.size()); missing = new ArrayList<String>(map.size()); handler = new Callback<String, String>() { public void handleFound(String key, String value) { found.add(key + ": " + value); } public void handleMissing(String key) { missing.add(key); } }; } void test() { CallbackMap<String, String> cbMap = new CallbackMap<String, String>(map); for (int i = 0, count = map.size(); i < count; i++) { String key = "key" + i; cbMap.lookup(key, handler); } System.out.println(found.size() + " found"); System.out.println(missing.size() + " missing"); } } Now, the tryGet() example -- as best I understand the pattern (and I might well be wrong): class TryGetMap<K, V> { private final Map<K, V> backingMap; public TryGetMap(Map<K, V> backingMap) { this.backingMap = backingMap; } boolean tryGet(K key, OutParameter<V> valueParam) { V val = backingMap.get(key); if (val == null) { return false; } valueParam.value = val; return true; } } class OutParameter<V> { V value; } class TryGetExample { private final Map<String, String> map; private final List<String> found; private final List<String> missing; public TryGetExample(Map<String, String> map) { this.map = map; found = new ArrayList<String>(map.size()); missing = new ArrayList<String>(map.size()); } void test() { TryGetMap<String, String> tgMap = new TryGetMap<String, String>(map); for (int i = 0, count = map.size(); i < count; i++) { String key = "key" + i; OutParameter<String> out = new OutParameter<String>(); if (tgMap.tryGet(key, out)) { found.add(key + ": " + out.value); } else { missing.add(key); } } System.out.println(found.size() + " found"); System.out.println(missing.size() + " missing"); } } And finally, the performance test code: public static void main(String[] args) { int size = 200000; Map<String, String> map = new HashMap<String, String>(); for (int i = 0; i < size; i++) { String val = (i % 5 == 0) ? null : "value" + i; map.put("key" + i, val); } long totalCallback = 0; long totalTryGet = 0; int iterations = 20; for (int i = 0; i < iterations; i++) { { TryGetExample tryGet = new TryGetExample(map); long tryGetStart = System.currentTimeMillis(); tryGet.test(); totalTryGet += (System.currentTimeMillis() - tryGetStart); } System.gc(); { CallbackExample callback = new CallbackExample(map); long callbackStart = System.currentTimeMillis(); callback.test(); totalCallback += (System.currentTimeMillis() - callbackStart); } System.gc(); } System.out.println("Avg. callback: " + (totalCallback / iterations)); System.out.println("Avg. tryGet(): " + (totalTryGet / iterations)); } On my first attempt, I got 50% worse performance for callback than for tryGet(), which really surprised me. But, on a hunch, I added some garbage collection, and the performance penalty vanished. This fits with my instinct, which is that we're basically talking about taking the same number of method calls, conditional checks, etc. and rearranging them. But then, I wrote the code, so I might well have written a suboptimal or subconsicously penalized tryGet() implementation. Thoughts?

Read the article
Preloading Winforms using a Stack and Hidden Form

- by msarchet

I am currently working on a project where we have a couple very control heavy user controls that are being used inside a MDI Controller. This is a Line of Business app and it is very data driven. The problem that we were facing was the aforementioned controls would load very very slowly, we dipped our toes into the waters of multi-threading for the control loading but that was not a solution for a plethora of reasons. Our solution to increasing the performance of the controls ended up being to 'pre-load' the forms onto a hidden window, create a stack of the existing forms, and pop off of the stack as the user requested a form. Now the current issue that I'm seeing that will arise as we push this 'fix' out to our testers, and the ultimately our users is this: Currently the 'hidden' window that contains the preloaded forms is visible in task manager, and can be shut down thus causing all of the controls to be lost. Then you have to create them on the fly losing the performance increase. Secondly, when the user uses up the stack we lose the performance increase (current solution to this is discussed below). For the first problem, is there a way to hide this window from task manager, perhaps by creating a parent form that encapsulates both the main form for the program and the hidden form? Our current solution to the second problem is to have an inactivity timer that when it fires checks the stacks for the forms, and loads a new form onto the stack if it isn't full. However this still has the potential of causing a hang in the UI while it creates the forms. A possible solutions for this would be to put 'used' forms back onto the stack, but I feel like there may be a better way. EDIT: For control design clarification From the comments I have realized there is a lack of clarity on what exactly the control is doing. Here is a detailed explanation of one of the controls. I have defined for this control loading time as the time it takes from when a user performs an action that would open a control, until the time a control is accessible to be edited. The control is for entering Prescriptions for a patient in the system, it has about 5 tabbed groups with a total of about 180 controls. The user selects to open a new Prescription control from inside the main program, this control is loaded into the MDI Child area of the Main Form (which is a DevExpress Ribbon Control). From the time the user clicks New (or loads an existing record) until the control is visible. The list of actions that happens in the program is this: The stack is checked for the existence of a control. If the control exists it is popped off of the stack. The control is rendered on screen. This is what takes 2 seconds The control then is populated with a blank object, or with existing data. The control is ready to use. The average percentage of loading time, across about 10 different machines, with different hardware the control rendering takes about 85 - 95 percent of the control loading time. Without using the stack the control takes about 2 seconds to load, with the stack it takes about .8 seconds, this second time is acceptable. I have looked at Henry's link and I had previously already implemented the applicable suggestions. Again I re-iterate my question as What is the best method to move controls to and from the stack with as little UI interruption as possible?

Read the article
Random Complete System Unresponsiveness Running Mathematical Functions

- by Computer Guru

I have a program that loads a file (anywhere from 10MB to 5GB) a chunk at a time (ReadFile), and for each chunk performs a set of mathematical operations (basically calculates the hash). After calculating the hash, it stores info about the chunk in an STL map (basically <chunkID, hash>) and then writes the chunk itself to another file (WriteFile). That's all it does. This program will cause certain PCs to choke and die. The mouse begins to stutter, the task manager takes 2 min to show, ctrl+alt+del is unresponsive, running programs are slow.... the works. I've done literally everything I can think of to optimize the program, and have triple-checked all objects. What I've done: Tried different (less intensive) hashing algorithms. Switched all allocations to nedmalloc instead of the default new operator Switched from stl::map to unordered_set, found the performance to still be abysmal, so I switched again to Google's dense_hash_map. Converted all objects to store pointers to objects instead of the objects themselves. Caching all Read and Write operations. Instead of reading a 16k chunk of the file and performing the math on it, I read 4MB into a buffer and read 16k chunks from there instead. Same for all write operations - they are coalesced into 4MB blocks before being written to disk. Run extensive profiling with Visual Studio 2010, AMD Code Analyst, and perfmon. Set the thread priority to THREAD_MODE_BACKGROUND_BEGIN Set the thread priority to THREAD_PRIORITY_IDLE Added a Sleep(100) call after every loop. Even after all this, the application still results in a system-wide hang on certain machines under certain circumstances. Perfmon and Process Explorer show minimal CPU usage (with the sleep), no constant reads/writes from disk, few hard pagefaults (and only ~30k pagefaults in the lifetime of the application on a 5GB input file), little virtual memory (never more than 150MB), no leaked handles, no memory leaks. The machines I've tested it on run Windows XP - Windows 7, x86 and x64 versions included. None have less than 2GB RAM, though the problem is always exacerbated under lower memory conditions. I'm at a loss as to what to do next. I don't know what's causing it - I'm torn between CPU or Memory as the culprit. CPU because without the sleep and under different thread priorities the system performances changes noticeably. Memory because there's a huge difference in how often the issue occurs when using unordered_set vs Google's dense_hash_map. What's really weird? Obviously, the NT kernel design is supposed to prevent this sort of behavior from ever occurring (a user-mode application driving the system to this sort of extreme poor performance!?)..... but when I compile the code and run it on OS X or Linux (it's fairly standard C++ throughout) it performs excellently even on poor machines with little RAM and weaker CPUs. What am I supposed to do next? How do I know what the hell it is that Windows is doing behind the scenes that's killing system performance, when all the indicators are that the application itself isn't doing anything extreme? Any advice would be most welcome.

Read the article
Why don't I just build the whole web app in Javascript and Javascript HTML Templates?

- by viatropos

I'm getting to the point on an app where I need to start caching things, and it got me thinking... In some parts of the app, I render table rows (jqGrid, slickgrid, etc.) or fancy div rows (like in the New Twitter) by grabbing pure JSON and running it through something like Mustache, jquery.tmpl, etc. In other parts of the app, I just render the info in pure HTML (server-side HAML templates), and if there's searching/paginating, I just go to a new URL and load a new HTML page. Now the problem is in caching and maintainability. On one hand I'm thinking, if everything was built using Javascript HTML Templates, then my app would serve just an HTML layout/shell, and a bunch of JSON. If you look at the Facebook and Twitter HTML source, that's basically what they're doing (95% json/javascript, 5% html). This would make it so my app only needed to cache JSON (pages, actions, and/or records). Which means you'd hit the cache no matter if you were some remote api developer accessing a JSON api, or the strait web app. That is, I don't need 2 caches, one for the JSON, one for the HTML. That seems like it'd cut my cache store down in half, and streamline things a little bit. On the other hand, I'm thinking, from what I've seen/experienced, generating static HTML server-side, and caching that, seems to be much better performance wise cross-browser; you get the graphics instantly and don't have to wait that split-second for javascript to render it. StackOverflow seems to do everything in plain HTML, and you can tell... everything appears at once. Notice how though on twitter.com, the page is blank for .5-1 seconds, and the page chunks in: the javascript has to render the json. The downside with this is that, for anything dynamic (like endless scrolling, or grids), I'd have to create javascript templates anyway... so now I have server-side HAML templates, client-side javascript templates, and a lot more to cache. My question is, is there any consensus on how to approach this? What are the benefits and drawbacks from your experience of mixing the two versus going 100% with one over the other? Update: Some reasons that factor into why I haven't yet made the decision to go with 100% javascript templating are: Performance. Haven't formally tested, but from what I've seen, raw html renders faster and more fluidly than javascript-generated html cross-browser. Plus, I'm not sure how mobile devices handle dynamic html performance-wise. Testing. I have a lot of integration tests that work well with static HTML, so switching to javascript-only would require 1) more focused pure-javascript testing (jasmine), and 2) integrating javascript into capybara integration tests. This is just a matter of time and work, but it's probably significant. Maintenance. Getting rid of HAML. I love HAML, it's so easy to write, it prints pretty HTML... It makes code clean, it makes maintenance easy. Going with javascript, there's nothing as concise. SEO. I know google handles the ajax /#!/path, but haven't grasped how this will affect other search engines and how older browsers handle it. Seems like it'd require a significant setup.

Read the article
Improve WPF Rendering Performance (WrapPanel in ItemsControl)

- by Wonko the Sane

Hello All, I have an ItemsSource that appears to have poor performance when adding even a fairly small ObservableCollection to it. The ItemsPanel is a WrapPanel, and the ItemTemplate is essentially a Border containing another Border painted with an ImageBrush. The ItemsControl is wrapped inside a ScrollViewer. After some investigation using WpfPerf, it would appear that most of the "what the heck is it doing?" time is spent on WrapPanel.Measure after creating the collection that is being bound. As I've mentioned, it's a fairly small collection - generally less than 100 items. If nothing else, I'd like to be able to put a "Please Wait" on the screen (during the collection creation portion as well), but I am not sure how to know when the rendering is complete. Any thoughts would be greatly appreciated! Thanks, wTs

Read the article
SQL - .NET - SqlParameters - AddWithValue - Are there any negative performance implications when Par

- by hamlin11

http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlparametercollection.addwithvalue.aspx I'm used to adding sql parameters to a sqlCommand using the add() function. This allows me to specify the type of the sqlParameter, but it requires another line to set the value. It's nice to use the AddWithValue function, but it skips the "specify the parameter type" step. I'm guessing this causes the parameters to be sent over as strings contained within single quotes (''), but I'm not sure. Is this the case, and does this cause significantly slower performance of the stored procedures? Note: I understand that it is nice to validate user data on the .NET side of things by specifying the data type for params -- I'm only concerned about reflection-type overhead of AddWithValue either on the .NET or SQL side.

Read the article
C# WMI, Performance Counters, & SNMP Oh My!

- by Keith

I have a C# windows service which listens to a MSMQ and sends each message out as an email. Since there's no UI, I'd like to offer an ability to monitor this service to see things such as # messages in queue, # emails sent (by message type perhaps), # of errors, etc. What is the best/recommended way to accomplish this? Is it WMI or performance counters? Is this data viewed using PerfMon or WMI CIM Studio? Does any approach allow one to monitor the service real-time as well as providing historical analysis? I can dig into the details myself but would appreciate some broad guidance to help demystify this subject.

Read the article
Need advice on speeding up tableViewCell photo loading performance

- by ambertch

I have around 20 tableview cells that each contain a number (2-5) thumbnail sized pictures (they are VERY small Facebook profile pictures, ex. http://profile.ak.fbcdn.net/hprofile-ak-sf2p/hs254.snc3/23133_201668_2989_q.jpg). Each picture is an UIImageView added to the cell's contentview. Scrolling performance is poor, and measuring the draw time I've found the UIImage rendering is the bottleneck. I've researched/thought of some solutions but as I am new to iphone development I am not sure which strategy to pursue: preload all the images and retrieve them from disk instead of URL when drawing cells (I'm not sure if cell drawing will still be slow, so I want to hold off on the time investment here) Have the cells display a placeholder image from disk, while the picture is asynchronously loaded (this seems to be the best solution, but I'm currently not sure exactly how to do best do this) There's the fast drawing recommendation from Tweetie, but I don't know that will have much affect if it turns out my overhead is in network loading (http://blog.atebits.com/2008/12/fast-scrolling-in-tweetie-with-uitableview/) Thoughts/implementation advice? Thanks!

Read the article
PhysX for massive performance via GPU ?

- by devdude

I recently compared some of the physics engine out there for simulation and game development. Some are free, some are opensource, some are commercial (1 is even very commercial $$$$). Havok, Ode, Newton (aka oxNewton), Bullet, PhysX and "raw" build-in physics in some 3D engines. At some stage I came to conclusion or question: Why should I use anything but NVidia PhysX if I can make use of its amazing performance (if I need it) due to GPU processing ? With future NVidia cards I can expect further improvement independent of the regular CPU generation steps. The SDK is free and it is available for Linux as well. Of course it is a bit of vendor lock-in and it is not opensource. Whats your view or experience ? If you would start right now with development, would you agree with the above ? cheers

Read the article
Improving code and UI Performance

- by Kobojunkie

I am dealing with a situation that I need some help with here. I need to improve performance on functionality that records and updates UI with user selection info. What my code current does is 'This is called to update the Database each time the user makes a new selection on the UI Private Sub OnFilterChanged(String newReviewValueToAdd) AddRecentViewToDB(newReviewValueToAdd) UpdateRecentViewsUI() PageReviewGrid.Rebind()'Call Grid Rebind End Sub 'This is the code that handles updating the UI with the Updated selection Private Sub UpdateRecentViewsUI() Dim rlNode As RadTreeNode = radTree.FindNodeByValue("myreviewnode") Dim Obj As Setting Dim treenode As RadTreeNode For i As Integer = 0 To Count - 1 Obj = Setting.Review.Item(i) treenode = New RadTreeNode(datetime.now.ToString,i.ToString()) treenode.ToolTip = obj.GetFilter radNode1.Nodes.Add(treenode) Next End Sub Private Sub UpdateRecentViewsUI() Dim pnlNav As RadPanelItem = rpbMyLoans.FindItemByValue("rpiMLNavTree") Dim radTree As RadTreeView = CType(pnlNav.FindControl("rtMyLoansNav"), RadTreeView) Dim rlNode As RadTreeNode = radTree.FindNodeByValue("MLRS") rlNode.Nodes.Clear() Dim objRS As SharedCode.WATSUserSettings.MyLoansView Dim objRTN As RadTreeNode For intItem As Integer = 0 To GetUserSettings.MyLoansRecentViews.Count - 1 objRS = GetUserSettings.MyLoansRecentViews.Item(intItem) objRTN = New RadTreeNode(objRS.LastUpdate.ToString, intItem.ToString) objRTN.ToolTip = objRS.getFilterString rlNode.Nodes.Add(objRTN) Next End Sub

Read the article
High-Performance In-Browser Networking

- by Jon Purdy

(Similar in spirit to but different in practice from this question.) Is there any cross-browser-compatible, in-browser technology that allows a high-performance perstistent network connection between a server application and a client written in, say, Javascript? Think XmlHttpRequest on caffeine. I am working on a visualisation system that's restricted to at most a few users at once, and the server is pretty robust, so it can handle as much as it needs to. I would like to allow the client to have access to video streamed from the server at a minimum of about 20 frames per second, regardless of what their graphics hardware capabilities are. Simply put: is this doable without resorting to Flash or Java?

Read the article
AJAX vs AHAH Is there a performance advantage?

- by LanguaFlash

My concern is performance, is there a reason to to send the client XML instead of valid HTML? Like most things, I am sure it is application dependent. My specific situation is where there is substantial content being inserted into the web page that has been pulled from a database. What are the advantages of either approach? Is the size of the content even a concern? Or, in the case of using XML, will the time for the Javascript to process the XML into HTML counterbalance the extra time that would have been required to send HTML to start with? Thanks, Jeff

Read the article
Average performance of binary search algorithm?

- by Passonate Learner

http://en.wikipedia.org/wiki/Binary_search_algorithm#Average_performance BinarySearch(int A[], int value, int low, int high) { int mid; if (high < low) return -1; mid = (low + high) / 2; if (A[mid] > value) return BinarySearch(A, value, low, mid-1); else if (A[mid] < value) return BinarySearch(A, value, mid+1, high); else return mid; } If the integer I'm trying to find is always in the array, can anyone help me write a program that can calculate the average performance of binary search algorithm?

Read the article
Audio -- How much performance improvement can I expect from from reducing function calls by using bu

- by morgancodes

I'm working on an audio-intensive app for the iPhone. I'm currently calling a number of different functions for each sample I need to calculate. For example, I have an envelope class. When I calculate a sample, I do something like: sampleValue = oscilator->tic() * envelope->tic(); But I could also do something like: for(int i = 0; i < bufferLength; i++){ buffer[i] = oscilatorBuffer[i] * evelopeBuffer[i]; } I know the second will be more efficient, but don't know by how much. Are function calls expensive enough that I'd be crazy not to use buffers if I care event a tiny bit about performance?

Read the article
FFMPEG based Theora Video Decoder performance??

- by goldenmean

Hi, I am in process of porting and optimization of the theora video decoder in the ffmpeg-0.5 package to ARM-Cortex-A8 -Neon processor @ 667 MHz. I am looking for some target estimate for frames per second the decoder library alone should achieve after full optimization (C level and Neon assembly / Intrinsics) for 720x480 Progressive content for a 2Mbps stream. I have a Real Video 9 decoder on cortex-A8 which gives around 40 fps for the same stream above.(720x480, 2Mbps) How can i extrapolate this data based on relative complexities of RV9 and Theora and get a fps estimate for theora decoder Cortex-A8? I am aware the performance depends upon the cache configuration of the h/w, etc...,but any Any pointers will help. Thanks, -AD

Read the article
Performance differences between iframe hiding methods?

- by Ender

Is there a major performance difference between the following: <iframe style="visibility:hidden" /> <iframe style="width:0px; height:0px; border:0px" /> I'm using a hidden iframe to pull down and parse some information from an external server. If the iframe actually attempts to render the page, this may suck up a lot of CPU cycles. Of course, I'd ideally just want to get the raw markup - for example, if I could prevent the iframe from loading img tags, that would be perfect.

Read the article
Toubleshooting mapkit performance

- by brettr

I'm plotting over 500 points on a map using mapkit. Zooming is a little jittery compared to the native google map app. I've discovered what is causing the slowness. I'm adding custom annotations so that I can later add different pin colors and buttons for detail views: - (MKAnnotationView *) mapView:(MKMapView *)mapView viewForAnnotation:(AddressNote *) annotation { MKPinAnnotationView *annView=[[MKPinAnnotationView alloc] initWithAnnotation:annotation reuseIdentifier:@"currentlocation"]; annView.pinColor = MKPinAnnotationColorGreen; annView.animatesDrop=TRUE; annView.canShowCallout = YES; annView.calloutOffset = CGPointMake(-5, 5); return annView; } If I comment out the above code, everything works fine. Very smooth with zooming in/out. Should I be adding annotations differently to boost performance?

Read the article
Jquery Widget Performance

- by jamie-wilson

I am working on an interface which involves ALOT of javascript. There is a calendar and blocks drawn on the calendar. The calendar is a jQuery widget, which works beautifully. The blocks drawn on top are also jQuery widgets. While it works - I am wondering, every time I create another block, is the widget fully duplicating, or is it referencing the widget? If I end up with 200 blocks on the screen, do I have 200 copies of the widget? Because if so i'm sure this will impact the performance quite heavily. Also it would determine whether I have functions inside the widget, or have them external to the widget looking in if that makes sense. Just putting some feelers out there for thoughts. I couldn't find anything by searching online.

Read the article
Web Performance testing using VS2010 "Testing a file download"

- by cheedep

Hi All, I am trying out the VS 2010 testing tools for the first time. And I tried recording a web performance test and my actions had a file download implemented as in the KB article here http://support.microsoft.com/kb/812406 by streaming chunks of 10000 bytes. However my test is failing at the download saying "The response stream has been closed". Please help me understand why it is happening this way also any suggestions how you would test such a file download. My main aim was to see how the download was performing for a load test with Intercontinental 350kbps connection on files of about 30-50 MB. Thanks.

Read the article

< Previous Page | 106 107 108 109 110 111 112 113 114 115 116 117 | Next Page >