Search Results

Search found 13403 results on 537 pages for 'epm performance tuning'.


  • Why is the Clojure Hello World program so slow compared to Java and Python?

    - by viksit
    Hi all, I'm reading "Programming Clojure" and I was comparing some languages I use for some simple code. I noticed that the Clojure implementations were the slowest in each case. For instance, Python - hello.py:
        def hello_world(name):
            print "Hello, %s" % name
        hello_world("world")
    and the result:
        $ time python hello.py
        Hello, world
        real 0m0.027s  user 0m0.013s  sys 0m0.014s
    Java - hello.java:
        import java.io.*;
        public class hello {
            public static void hello_world(String name) {
                System.out.println("Hello, " + name);
            }
            public static void main(String[] args) {
                hello_world("world");
            }
        }
    and the result:
        $ time java hello
        Hello, world
        real 0m0.324s  user 0m0.296s  sys 0m0.065s
    and finally, Clojure - hellofun.clj:
        (defn hello-world [username]
          (println (format "Hello, %s" username)))
        (hello-world "world")
    and the results:
        $ time clj hellofun.clj
        Hello, world
        real 0m1.418s  user 0m1.649s  sys 0m0.154s
    That's a whole, gargantuan 1.4 seconds! Does anyone have pointers on what the cause of this could be? Is Clojure really that slow, or are there JVM tricks et al. that need to be used in order to speed up execution? More importantly - isn't this huge difference in performance going to be an issue at some point? (I mean, let's say I was using Clojure for a production system - the gain I get from using a Lisp seems completely offset by the performance issues I can see here.) The machine used here is a 2007 MacBook Pro running Snow Leopard, with a 2.16 GHz Intel Core 2 Duo and 2 GB DDR2 SDRAM. BTW, the clj script I'm using is from here and looks like:
        #!/bin/bash
        JAVA=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin/java
        CLJ_DIR=/opt/jars
        CLOJURE=$CLJ_DIR/clojure.jar
        CONTRIB=$CLJ_DIR/clojure-contrib.jar
        JLINE=$CLJ_DIR/jline-0.9.94.jar
        CP=$PWD:$CLOJURE:$JLINE:$CONTRIB
        # Add extra jars as specified by `.clojure` file
        if [ -f .clojure ]
        then
            CP=$CP:`cat .clojure`
        fi
        if [ -z "$1" ]; then
            $JAVA -server -cp $CP \
                jline.ConsoleRunner clojure.lang.Repl
        else
            scriptname=$1
            $JAVA -server -cp $CP clojure.main $scriptname -- $*
        fi

  • List of objects or parallel arrays of properties?

    - by Headcrab
    The question is, basically: what would be preferable, both performance-wise and design-wise - to have a list of objects of a Python class, or to have several lists of numerical properties? I am writing some sort of scientific simulation which involves a rather large system of interacting particles. For simplicity, let's say we have a set of balls bouncing inside a box, so each ball has a number of numerical properties, like x-y-z coordinates, diameter, mass, velocity vector and so on. What is the better way to store the system? The two major options I can think of are:
    1) make a class "Ball" with those properties and some methods, then store a list of objects of the class, e.g. [b1, b2, b3, ...bn, ...], where for each bn we can access bn.x, bn.y, bn.mass and so on;
    2) make an array of numbers for each property, then for each i-th "ball" we can access its 'x' coordinate as xs[i], 'y' coordinate as ys[i], 'mass' as masses[i] and so on.
    To me it seems that the first option represents a better design. The second option looks somewhat uglier, but might be better in terms of performance, and it could be easier to use with numpy and scipy, which I try to use as much as I can. I am still not sure if Python will be fast enough, so it may be necessary to rewrite it in C++ or something after initial prototyping in Python. Would the choice of data representation be different for C/C++? What about a hybrid approach, e.g. Python with a C++ extension?
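    A minimal sketch of the two layouts being compared, with illustrative names (Ball, xs, masses, etc. are not from the original post) and assuming numpy is available; the second form is the one that maps directly onto numpy/scipy vectorized operations:
        import numpy as np
        # Option 1: list of objects (array-of-structs)
        class Ball(object):
            __slots__ = ("x", "y", "z", "mass")
            def __init__(self, x, y, z, mass):
                self.x, self.y, self.z, self.mass = x, y, z, mass
        balls = [Ball(float(i), 0.0, 0.0, 1.0) for i in range(1000)]
        total_mass = sum(b.mass for b in balls)   # per-object attribute access
        # Option 2: parallel arrays (struct-of-arrays), numpy-friendly
        n = 1000
        xs = np.arange(n, dtype=float)
        ys = np.zeros(n)
        zs = np.zeros(n)
        masses = np.ones(n)
        xs += 0.01 * masses                       # bulk update of every ball at once
        total_mass = masses.sum()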

  • HttpURLConnection inside a loop

    - by Carlos Garces
    Hi! I'm trying to connect to a URL that I know will exist, but I don't know when. I don't have access to the server, so I can't change anything to receive an event. The actual code is this:
        URL url = new URL(urlName);
        for (int j = 0; j < POOLING && disconnected; j++) {
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            int status = connection.getResponseCode();
            if (status == HttpURLConnection.HTTP_OK || status == HttpURLConnection.HTTP_NOT_MODIFIED) {
                // Some work
            } else {
                // wait 3s
                Thread.sleep(3000);
            }
        }
    Java is not my best skill and I'm not sure if this code is good from the point of view of performance. Am I opening a new connection every 3 seconds, or is the connection reused? If I call disconnect() I ensure that no new connections are left open in the loop, but... will that impact performance? Suggestions? What is the fastest/best way to know whether a URL exists?

  • How should I use BIT in MS SQL 2005

    - by adopilot
    This is regarding SQL performance. I have a scalar-valued function for checking a specific condition in the database. It returns a BIT value, true or false, and I do not know how I should fill the @BIT parameter. If I write
        set @bit = convert(bit, 1)
    or
        set @bit = 1
    or
        set @bit = 'true'
    the function will work either way, but I do not know which method is recommended for daily use. Another question: I have a table in my database with around 4 million records, and the daily insert is about 4K records into that table. Now I want to add a CONSTRAINT on that table with the scalar-valued function that I mentioned already, something like this:
        ALTER TABLE fin_stavke
        ADD CONSTRAINT fin_stavke_knjizenje CHECK (dbo.fn_ado_chk_fin(id) = convert(bit, 1))
    where the field "id" is the primary key of table fin_stavke, and dbo.fn_ado_chk_fin looks like:
        create FUNCTION fn_ado_chk_fin (@stavka_id int)
        RETURNS bit
        AS
        BEGIN
            declare @bit bit
            if exists (select * from fin_stavke where id = @stavka_id and doc_id is null and protocol_id is null)
            begin
                set @bit = 0
            end
            else
            begin
                set @bit = 1
            end
            return @bit;
        END
        GO
    Will this type of check constraint badly affect performance on my table and on SQL Server in general? If there is also a better way to add this control on the table, please let me know.

  • Measuring execution time of selected loops

    - by user95281
    I want to measure the running times of selected loops in a C program so as to see what percentage of the total time for executing the program (on Linux) is spent in these loops. I should be able to specify the loops for which the performance should be measured. I have tried several tools (VTune, HPCToolkit, OProfile) in the last few days and none of them seem to do this. They all find the performance bottlenecks and just show the time for those. That's because these tools only store the time taken that is above a threshold (~1 ms), so if one loop takes less time than that, its execution time won't be reported. The basic block counting feature of gprof depends on a feature of older compilers that is no longer supported. I could manually write a simple timer using gettimeofday or something like that, but in some cases it won't give accurate results. For example:
        for (i = 0; i < 1000; ++i) {
            for (j = 0; j < N; ++j) {
                // do some work here
            }
        }
    Here I want to measure the total time spent in the inner loop, so I would have to put a call to gettimeofday inside the first loop. gettimeofday itself would then get called 1000 times, which introduces its own overhead, and the result will be inaccurate.

  • Oracle: Insertion on an indexed table, avoiding duplicates. Looking for tips and advice.

    - by Tom
    Hi everyone, I'm looking for the best solution (performance-wise) to achieve this. I have to insert records into a table while avoiding duplicates. For example, take table A:
        Insert into A (
            Select DISTINCT [FIELDS] from B, C, D..
            WHERE (JOIN CONDITIONS ON B, C, D..)
            AND NOT EXISTS (
                SELECT * FROM A ATMP WHERE ATMP.SOMEKEY = A.SOMEKEY
            )
        );
    I have an index on A.SOMEKEY, just to optimize the NOT EXISTS query, but I realize that inserting into an indexed table will be a performance hit. So I was thinking of duplicating table A in a global temporary table, where I would keep the index, then removing the index from table A and executing the query, but modified:
        Insert into A (
            Select DISTINCT [FIELDS] from B, C, D..
            WHERE (JOIN CONDITIONS ON B, C, D..)
            AND NOT EXISTS (
                SELECT * FROM GLOBAL_TEMPORARY_TABLE_A ATMP WHERE ATMP.SOMEKEY = A.SOMEKEY
            )
        );
    This would solve the "inserting into an indexed table" problem, but I would have to update the global temporary copy of A with each insertion I make. I'm kind of lost here. Is there a better way to achieve this? Thanks in advance.

  • Downloading javascript Without Blocking

    - by doug
    The context: My question relates to improving web-page loading performance, and in particular the effect that JavaScript has on page loading (resources/elements below the script are blocked from downloading/rendering). This problem is usually avoided/mitigated by placing the scripts at the bottom (e.g., just before the closing </body> tag). The code I am looking at is for web analytics. Placing it at the bottom reduces its accuracy; and because this script has no effect on the page's content, i.e., it's not re-writing any part of the page, I want to move it inside the head. Just how to do that without ruining page-loading performance is the crux. From my research, I've found six techniques (with support among all or most of the major browsers) for downloading scripts so that they don't block down-page content from loading/rendering: (i) XHR + eval(); (ii) XHR + 'inject'; (iii) downloading the HTML-wrapped script in an iFrame; (iv) setting the script tag's 'async' flag to 'TRUE' (HTML 5 only); (v) setting the script tag's 'defer' attribute; and (vi) the 'Script DOM Element'. It's the last of these I don't understand. The JavaScript to implement pattern (vi) is:
        (function() {
            var q1 = document.createElement('script');
            q1.src = 'http://www.my_site.com/q1.js';
            document.documentElement.firstChild.appendChild(q1);
        })();
    Seems simple enough: inside this anonymous function, a script element is created, its 'src' attribute is set to the script's location, then the script element is added to the DOM. But while each line is clear, it's still not clear to me how exactly this pattern allows script loading without blocking down-page elements/resources from rendering/loading.

  • How to benchmark on multi-core processors

    - by Pascal Cuoq
    I am looking for ways to perform micro-benchmarks on multi-core processors. Context: At about the same time desktop processors introduced out-of-order execution that made performance hard to predict, they, perhaps not coincidentally, also introduced special instructions to get very precise timings. Examples of these instructions are rdtsc on x86 and rftb on PowerPC. These instructions gave timings that were more precise than could ever be allowed by a system call, and allowed programmers to micro-benchmark their hearts out, for better or for worse. On a yet more modern processor with several cores, some of which sleep some of the time, the counters are not synchronized between cores. We are told that rdtsc is no longer safe to use for benchmarking, but I must have been dozing off when the alternative solutions were explained. Question: Some systems may save and restore the performance counter and provide an API call to read the proper sum. If you know what this call is for any operating system, please let us know in an answer. Some systems may allow turning off cores, leaving only one running. I know Mac OS X Leopard does when the right Preference Pane is installed from the Developer Tools. Do you think this makes rdtsc safe to use again? More context: Please assume I know what I am doing when trying to do a micro-benchmark. If you are of the opinion that if an optimization's gains cannot be measured by timing the whole application, it's not worth optimizing, I agree with you, but I cannot time the whole application until the alternative data structure is finished, which will take a long time. In fact, if the micro-benchmark were not promising, I could decide to give up on the implementation now; I need figures to provide in a publication whose deadline I have no control over.

  • Converting to a column oriented array in Java

    - by halfwarp
    Although I have Java in the title, this could be for any OO language. I'd like to hear a few new ideas to improve the performance of something I'm trying to do. I have a method that constantly receives an Object[] array. I need to split the objects in this array across multiple arrays (List or something), so that I have an independent list for each column of all the arrays the method receives. Example:
        // one list per column, filled as each incoming row arrives
        List<List<Object>> columnOriented = new ArrayList<List<Object>>();
        public void newObject(Object[] obj) {
            for (int i = 0; i < obj.length; i++) {
                columnOriented.get(i).add(obj[i]);
            }
        }
    Note: for simplicity I've omitted the initialization of the objects and so on. The code I've shown above is slow, of course. I've already tried a few other things, but would like to hear some new ideas. How would you do this, knowing it's very performance sensitive?

  • What's the best way to store Logon User information for Web Application?

    - by Morgan Cheng
    I was once on a project for a web application developed on ASP.NET. For each logged-on user, an object (let's call it UserSessionObject here) is created and stored in RAM. For each HTTP request of a given user, the matching UserSessionObject instance is used to access user state information and the connection to the database. So, this UserSessionObject is pretty important. This design brings several problems that were found later:
    1) Since this UserSessionObject is cached in ASP.NET memory space, we have to configure the load balancer for sticky sessions. That is, HTTP requests in a single session are always sent to the same web server behind it. This limits scalability and maintainability.
    2) This UserSessionObject is accessed in every HTTP request. To keep it consistent, there is an exclusive lock on the UserSessionObject. Only one HTTP request can be processed at any given time because it must obtain the lock first. Performance and response time are affected.
    Now, I'm wondering whether there is a better design to handle this logged-on user case. It seems a shared-nothing architecture would help; that would mean the logon user info is retrieved from the database on each request, and I'm afraid that would hurt performance. Is there a design pattern for logged-on users in a web app? Thanks.

  • The Current State Of Serving a PHP 5.x App on the Apache, LightTPD & Nginx Web Servers?

    - by Gregory Kornblum
    Being stuck in an MS-stack architecture/development position for the last year and a half has kept me from staying on top of the recent evolution of open-source web servers as much as I would have liked. However, I am now building an open-source-stack-based application/system architecture, and sadly I do not have the time to give each of the above-mentioned web servers a thorough test of my own before deciding. So I figured I'd get input from the best development community site, and more specifically the people who make it so. This is a site that is a resource for information regarding a specific domain and target audience, with features to help users not only find the information but also interact with one another in various ways for various reasons. I chose the open-source stack for the wealth of resources it has, along with much better offerings than the MS stack (e.g. WordPress vs BlogEngine.NET). I feel Java sits more in the middle of these stacks in this regard, although I am not ruling out the possibility of using it in certain areas unrelated to the actual web app itself, such as background processes. I have already decided on PHP (using the CodeIgniter framework and APC), MySQL (InnoDB) and Memcached on CentOS. I am definitely serving static content on Nginx. However, there is no consensus on which of the three servers mentioned is best for dynamic content in regards to performance. It seems LightTPD still has the leak issue, which rules it out if true; Nginx seems not yet mature enough for this; and of course Apache tries to be everything for everybody. I am still going to compile the one chosen with as many performance tweaks as possible, such as static linking and the like. I believe I can get Apache to match the other two in regards to serving dynamic content through this process, by not having it serve anything static. However, during my research it seems the others are still worth considering. So with all things considered I would love to hear what everyone here has to say on the matter. Thanks!

  • Managing StringBuilder Resources

    - by Jim Fell
    My C# (.NET 2.0) application has a StringBuilder variable with a capacity of 2.5 MB. Obviously, I do not want to copy such a large buffer to a larger buffer space every time it fills. By that point there is so much data in the buffer anyway that removing the older data is a viable option. Can anyone see any obvious problems with how I'm doing this (i.e. am I introducing more performance problems than I'm solving), or does it look okay?
        tText_c = new StringBuilder(2500000, 2500000);
        private void AppendToText(string text)
        {
            if (tText_c.Length * 100 / tText_c.Capacity > 95)
            {
                tText_c.Remove(0, tText_c.Length / 2);
            }
            tText_c.Append(text);
        }
    EDIT: Additional information: In this application new data is received very rapidly (on the order of milliseconds) through a serial connection. I don't want to populate the multiline textbox with this new information so frequently, because that kills the performance of the application, so I'm saving it to a StringBuilder. Every so often, the application copies the contents of the StringBuilder to the textbox and wipes out the StringBuilder contents.

  • Interface Builder vs Cocos 2D - how to choose the best for your app.

    - by baDa
    Hello everyone! I was a Flash developer for 3 years, and in the last 5 months I have started iPhone development. I did two applications with Interface Builder for clients, and now I really want to make a little game. It's quite simple, a match-3! I made the engine in Interface Builder, and it seems good to me! But after reading some posts, I really wanted to try it in Cocos2D. So, in two days I rewrote my whole first engine for Cocos2D - the upside-down coordinates were very annoying, but OK, I did it! But the performance, side by side with the Interface Builder version, is really scary! Many, many slowdowns on the Cocos2D side! And the animation seems buggy to me! I'm really worried! I really don't know what the best choice for a simple game is, and I would like some opinions: Should I use Cocos2D when I need some physics? When there are many objects on screen? What performance boost do I get with Cocos2D? Is there a way to share these two applications with you guys, without needing your UIDs?

  • Very fast document similarity

    - by peyton
    Hello, I am trying to determine document similarity between a single document and each of a large number of documents (n ~= 1 million) as quickly as possible. More specifically, the documents I'm comparing are e-mails; they are grouped (i.e., there are folders or tags) and I'd like to determine which group is most appropriate for a new e-mail. Fast performance is critical. My a priori assumption is that the cosine similarity between term vectors is appropriate for this application; please comment on whether this is a good measure to use or not! I have already taken into account the following possibilities for speeding up performance:
    - Pre-normalize all the term vectors.
    - Calculate a term vector for each group (n ~= 10,000) rather than each e-mail (n ~= 1,000,000); this would probably be acceptable for my application, but if you can think of a reason not to do it, let me know!
    I have a few questions:
    - If a new e-mail has a new term never before seen in any of the previous e-mails, does that mean I need to re-compute all of my term vectors? This seems expensive.
    - Is there some clever way to only consider vectors which are likely to be close to the query document?
    - Is there some way to be more frugal about the amount of memory I'm using for all these vectors?
    Thanks!
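    A minimal sketch of the pre-normalized cosine approach against per-group centroid vectors, using plain sparse dicts (all names here are illustrative, not from the original post). Because the vectors are unit length, cosine similarity reduces to a dot product, and a never-before-seen term contributes nothing to the dot product with the stored group vectors, so they would not need to be recomputed unless the weighting scheme (e.g. IDF) changes:
        import math
        from collections import Counter
        def term_vector(text):
            # unit-length sparse term vector (raw term frequency, then normalized)
            counts = Counter(text.lower().split())
            norm = math.sqrt(sum(c * c for c in counts.values()))
            return dict((t, c / norm) for t, c in counts.items())
        def cosine(v1, v2):
            # both vectors are already normalized, so cosine == dot product
            if len(v2) < len(v1):
                v1, v2 = v2, v1
            return sum(w * v2.get(t, 0.0) for t, w in v1.items())
        def best_group(email_text, group_centroids):
            # group_centroids: {group_name: normalized term vector}
            v = term_vector(email_text)
            return max(group_centroids, key=lambda g: cosine(v, group_centroids[g]))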

  • What is the fastest (to access) struct-like object in Python?

    - by DNS
    I'm optimizing some code whose main bottleneck is running through and accessing a very large list of struct-like objects. Currently I'm using namedtuples, for readability. But some quick benchmarking using 'timeit' shows that this is really the wrong way to go where performance is a factor:
        Named tuple with a, b, c:
        >>> timeit("z = a.c", "from __main__ import a")
        0.38655471766332994
        Class using __slots__, with a, b, c:
        >>> timeit("z = b.c", "from __main__ import b")
        0.14527461047146062
        Dictionary with keys a, b, c:
        >>> timeit("z = c['c']", "from __main__ import c")
        0.11588272541098377
        Tuple with three values, using a constant key:
        >>> timeit("z = d[2]", "from __main__ import d")
        0.11106188992948773
        List with three values, using a constant key:
        >>> timeit("z = e[2]", "from __main__ import e")
        0.086038238242508669
        Tuple with three values, using a local key:
        >>> timeit("z = d[key]", "from __main__ import d, key")
        0.11187358437882722
        List with three values, using a local key:
        >>> timeit("z = e[key]", "from __main__ import e, key")
        0.088604143037173344
    First of all, is there anything about these little timeit tests that would render them invalid? I ran each several times, to make sure no random system event had thrown them off, and the results were almost identical. It would appear that dictionaries offer the best balance between performance and readability, with classes coming in second. This is unfortunate, since, for my purposes, I also need the object to be sequence-like; hence my choice of namedtuple. Lists are substantially faster, but constant keys are unmaintainable; I'd have to create a bunch of index constants, i.e. KEY_1 = 1, KEY_2 = 2, etc., which is also not ideal. Am I stuck with these choices, or is there an alternative that I've missed?
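    For reference, a sketch of how the objects compared above might be set up to reproduce the timeit runs (the original definitions are not shown in the excerpt, so the names and values here are assumptions):
        from collections import namedtuple
        from timeit import timeit
        Point = namedtuple("Point", "a b c")
        class SlotPoint(object):
            __slots__ = ("a", "b", "c")
            def __init__(self, a, b, c):
                self.a, self.b, self.c = a, b, c
        a = Point(1, 2, 3)            # namedtuple
        b = SlotPoint(1, 2, 3)        # class using __slots__
        c = {"a": 1, "b": 2, "c": 3}  # dictionary
        d = (1, 2, 3)                 # tuple
        e = [1, 2, 3]                 # list
        key = 2                       # "local" key
        print(timeit("z = a.c", "from __main__ import a"))
        print(timeit("z = e[key]", "from __main__ import e, key"))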

  • Overhead of calling tiny functions from a tight inner loop? [C++]

    - by John
    Say you see a loop like this one:
        for (int i = 0; i < thing.getParent().getObjectModel().getElements(SOME_TYPE).count(); ++i)
        {
            thing.getData().insert(
                thing.getData().count(),
                thing.getParent().getObjectModel().getElements(SOME_TYPE)[i].getName());
        }
    If this was Java I'd probably not think twice. But in performance-critical sections of C++, it makes me want to tinker with it... however, I don't know if the compiler is smart enough to make that futile. This is a made-up example, but all it's doing is inserting strings into a container. Please don't assume any of these are STL types; think in general terms about the following: Is the messy condition in the for loop going to get evaluated each time, or only once? If those get methods are simply returning references to member variables on the objects, will they be inlined away? Would you expect custom [] operators to get optimized at all? In other words, is it worth the time (in performance only, not readability) to convert it to something like:
        ElementContainer &source = thing.getParent().getObjectModel().getElements(SOME_TYPE);
        int num = source.count();
        Store &destination = thing.getData();
        for (int i = 0; i < num; ++i)
        {
            destination.insert(thing.getData().count(), source[i].getName());
        }
    Remember, this is a tight loop, called millions of times a second. What I wonder is whether all this will shave a couple of cycles per loop or something more substantial. Yes, I know the quote about "premature optimisation", and I know that profiling is important. But this is a more general question about modern compilers, Visual Studio in particular.

  • Oracle Query Optimization: Why is My Second Query Faster?

    - by Patrick Cuff
    I was having some performance issues with an Oracle query, so I downloaded a trial of the Quest SQL Optimizer for Oracle, which made some changes that dramatically improved the query's performance. I'm not exactly sure why the recommended query had such an improvement; can anyone provide an explanation? Before:
        SELECT t1.version_id, t1.id, t2.field1, t3.person_id, t2.id
          FROM table1 t1, table2 t2, table3 t3
         WHERE t1.id = t2.id
           AND t1.version_id = t2.version_id
           AND t2.id = 123
           AND t1.version_id = t3.version_id
           AND t1.VERSION_NAME <> 'AA'
         order by t1.id
        Plan Cost: 831
        Elapsed Time: 00:00:21.40
        Number of Records: 40,717
    After:
        SELECT /*+ USE_NL_WITH_INDEX(t1) */ t1.version_id, t1.id, t2.field1, t3.person_id, t2.id
          FROM table2 t2, table3 t3, table1 t1
         WHERE t1.id = t2.id + 0
           AND t1.version_id = t2.version_id + 0
           AND t2.id = 123
           AND t1.version_id = t3.version_id + 0
           AND t1.VERSION_NAME || '' <> 'AA'
           AND t3.version_id = t2.version_id + 0
         order by t1.id
        Plan Cost: 686
        Elapsed Time: 00:00:00.95
        Number of Records: 40,717
    Questions: Why does re-arranging the order of the tables in the FROM clause help? Why does adding + 0 to the WHERE clause comparisons help? Why does || '' <> 'AA' in the WHERE clause VERSION_NAME comparison help? Is this a more efficient way of handling possible nulls on this column?

  • Sync Vs. Async Sockets Performance in C#

    - by Michael Covelli
    Everything that I read about sockets in .NET says that the asynchronous pattern gives better performance (especially with the new SocketAsyncEventArgs, which saves on allocation). I think this makes sense if we're talking about a server with many client connections, where it's not possible to allocate one thread per connection. Then I can see the advantage of using the ThreadPool threads and getting async callbacks on them. But in my app, I'm the client and I just need to listen to one server sending market tick data over one TCP connection. Right now, I create a single thread, set the priority to Highest, and call Socket.Receive() with it. My thread blocks on this call and wakes up once new data arrives. If I were to switch this to an async pattern so that I get a callback when there's new data, I see two issues:
    1) The threadpool threads will have default priority, so it seems they will be strictly worse than my own thread, which has Highest priority.
    2) I'll still have to send everything through a single thread at some point. Say that I get N callbacks at almost the same time on N different threadpool threads notifying me that there's new data. The N byte arrays that they deliver can't be processed on the threadpool threads, because there's no guarantee that they represent N unique market data messages, because TCP is stream based. I'll have to lock and put the bytes into an array anyway and signal some other thread that can process what's in the array. So I'm not sure what having N threadpool threads is buying me.
    Am I thinking about this wrong? Is there a reason to use the async pattern in my specific case of one client connected to one server?

  • WPF animation/UI features performance and benchmarking

    - by Rich
    I'm working on a relatively small proof-of-concept for some line-of-business stuff with some fancy WPF UI work. Without even going too crazy, I'm already seeing some really poor performance when using a lot of the features that I thought were the main reason to consider WPF for UI building in the first place. I asked a question on here about why my animation was being stalled the first time it was run, and at the end what I found was that a very simple UserControl was taking almost half a second just to build its visual tree. I was able to get a workaround for the symptom, but the fact that it takes that long to initialize a simple control really bothers me. Now, I'm testing my animation with and without the DropShadowEffect, and the result is night and day. A subtle drop shadow makes my control look so much nicer, but it completely ruins the smoothness of the animation. Let me not even start with the font rendering either. The calculation of my animations, when the control has a bunch of gradient brushes and a drop shadow, makes the text blurry for about a full second before it slowly comes into focus. So, I guess my question is whether there are known studies, blog posts, or articles detailing which features are a hazard in the current version of WPF for business-critical applications. Are things like Effects (i.e. DropShadowEffect), gradient brushes, key frame animations, etc. going to have too much of a negative effect on render quality (or maybe the combinations of these things)? Is the final version of WPF 4.0 going to correct some of these issues? I've read that the VS2010 beta has some of these same issues and that they are supposed to be resolved by the final release. Is that because of improvements to WPF itself, or because half of the application will be rebuilt with the previous technology?

  • Compiler optimization causing the performance to slow down

    - by aJ
    I have one strange problem. I have the following piece of code:
        template<class index, class policy>
        inline int CBase<index, policy>::func(const A& test_in, int* srcPtr, int* dstPtr)
        {
            int width = test_in.width();
            int height = test_in.height();
            double d = 0.0; // here is the problem
            for (int y = 0; y < height; y++)
            {
                // Pointer initializations
                // multiplication involving y
                // ex: int z = someBigNumber*y + someOtherBigNumber;
                for (int x = 0; x < width; x++)
                {
                    // multiplication involving x
                    // ex: int z = someBigNumber*x + someOtherBigNumber;
                    if (someCondition)
                    {
                        // floating point calculations
                    }
                    *dstPtr++ = array[*srcPtr++];
                }
            }
        }
    The inner loop gets executed nearly 200,000 times and the entire function takes 100 ms to complete (profiled using AQTimer). I found an unused variable, double d = 0.0;, outside the outer loop and removed it. After this change, suddenly the method takes 500 ms for the same number of executions (5 times slower). This behavior is reproducible on different machines with different processor types (Core2, dual-core processors). I am using the VC6 compiler with optimization level O2. Following are the other compiler options used: -MD -O2 -Z7 -GR -GX -G5 -X -GF -EHa. I suspected compiler optimizations and removed the compiler optimization /O2. After that the function became normal and takes 100 ms as with the old code. Could anyone throw some light on this strange behavior? Why should compiler optimization slow down performance when I remove an unused variable? Note: The assembly code (before and after the change) looked the same.

  • Figuring out the performance limitation of an ADC on a PIC microcontroller

    - by AKE
    I'm spec-ing the suitability of a microcontroller like a PIC for an analog-to-digital application. This would be preferable to using external A/D chips. To do that, I've had to run through some computations, pulling the relevant parameters from the datasheets. I'm not sure I've got it right -- would appreciate a check! Here's the simplest example: the PIC10F220 is the simplest possible PIC with an ADC. It runs at a clock speed of 8 MHz and has an instruction cycle of 0.5 us (4 clock steps per instruction). So:
    - Taking Tacq = 6.06 us (acquisition time for ADC, assuming chip temp. = 50°C) [datasheet p34]
    - Taking Fosc = 8 MHz (? clock speed) and divisor = 4 (4 clock steps per CPU instruction)
    - This gives TAD = 0.5 us (TAD = 1/(Fosc/divisor))
    - Conversion time is 13*TAD [datasheet p31], which gives a conversion time of 6.5 us
    - ADC duration is then 12.56 us [? Tacq + 13*TAD]
    - Assuming at least 2 instructions for load/store: this is another 1 us (0.5 us per instruction), which would give a max sampling rate of 73.7 ksps (1/13.56)
    - Supposing 8 more instructions for real-time processing: this is another 4 us
    - Thus, total ADC/handling time = 17.56 us (12.56 us + 1 us + 4 us), so the expected upper sampling rate is 56.9 ksps
    - The Nyquist frequency for this sampling rate is therefore about 28 kHz
    If this is right, it suggests the (theoretical) performance suitability of this chip's A/D is for signals that are bandlimited to 28 kHz. Is this a correct interpretation of the information given in the data sheet? Any pointers would be much appreciated! AKE
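    The arithmetic above, collected into a small sketch (all figures come from the question itself; the 2-instruction and 8-instruction overheads are the poster's assumptions):
        T_ACQ = 6.06e-6              # ADC acquisition time, seconds (datasheet, 50 degrees C)
        F_OSC = 8e6                  # oscillator frequency, Hz
        DIVISOR = 4                  # clock steps per instruction
        T_INSTR = DIVISOR / F_OSC    # 0.5 us per instruction (= TAD)
        t_convert = 13 * T_INSTR     # 6.5 us conversion time
        t_adc = T_ACQ + t_convert    # 12.56 us per ADC reading
        t_total = t_adc + 2 * T_INSTR + 8 * T_INSTR   # + load/store + processing = 17.56 us
        print("max sample rate: %.1f ksps" % (1.0 / t_total / 1e3))   # ~56.9 ksps
        print("usable bandwidth: %.1f kHz" % (1.0 / t_total / 2e3))   # Nyquist, ~28.5 kHz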

  • Average performance of binary search algorithm?

    - by Passionate Learner
    http://en.wikipedia.org/wiki/Binary_search_algorithm#Average_performance
        BinarySearch(int A[], int value, int low, int high)
        {
            int mid;
            if (high < low)
                return -1;
            mid = (low + high) / 2;
            if (A[mid] > value)
                return BinarySearch(A, value, low, mid-1);
            else if (A[mid] < value)
                return BinarySearch(A, value, mid+1, high);
            else
                return mid;
        }
    If the integer I'm trying to find is always in the array, can anyone help me write a program that calculates the average performance of the binary search algorithm? I know I can do this by actually running the program and counting the number of calls, but what I'm trying to do here is to do it without calling the function. I'm not asking for the time complexity; I'm trying to calculate the average number of calls. For example, the average number of calls to find an integer in a 3-element array would be 1.67 (5/3).
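    One way to get the average without timing anything is to count the calls directly, running the search once for every element that could be the target (a sketch; the recursion mirrors the pseudocode in the question):
        def binary_search(a, value, low, high, counter):
            counter[0] += 1                      # count this call
            if high < low:
                return -1
            mid = (low + high) // 2
            if a[mid] > value:
                return binary_search(a, value, low, mid - 1, counter)
            elif a[mid] < value:
                return binary_search(a, value, mid + 1, high, counter)
            return mid
        def average_calls(n):
            a = list(range(n))                   # sorted array; every target is present
            total = 0
            for target in a:
                counter = [0]
                binary_search(a, target, 0, n - 1, counter)
                total += counter[0]
            return total / float(n)
        print(average_calls(3))      # 1.666..., the 5/3 from the question
        print(average_calls(1000))   # average calls for a 1000-element array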

  • Aggregate Pattern and Performance Issues

    - by Mosh
    Hello, I have read about the Aggregate Pattern but I'm confused about something here. The pattern states that all the objects belonging to the aggregate should be accessed via the Aggregate Root, and not directly. And I'm assuming that is the reason why they say you should have a single Repository per Aggregate. But I think this adds a noticeable overhead to the application. For example, in a typical Web-based application, what if I want to get an object belonging to an aggregate (which is NOT the aggregate root)? I'll have to call Repository.GetAggregateRootObject(), which loads the aggregate root and all its child objects, and then iterate through the child objects to find the one I'm looking for. In other words, I'm loading lots of data and throwing it all away except for the particular object I'm looking for. Is there something I'm missing here? PS: I know some of you may suggest that we can improve performance with Lazy Loading. But that's not what I'm asking here... The aggregate pattern requires that all objects belonging to the aggregate be loaded together, so we can enforce business rules.

  • performance of large number calculations in python (python 2.7.3 and .net 4.0)

    - by g36
    There are a lot of general questions about Python performance in comparison to other languages. I've got a more specific example: two simple functions written in Python and C#, both checking whether an int is prime. Python:
        import time
        def is_prime(n):
            num = n/2
            while num > 1:
                if n % num == 0:
                    return 0
                num -= 1
            return 1
        start = time.clock()
        probably_prime = is_prime(2147483629)
        elapsed = (time.clock() - start)
        print 'time : ' + str(elapsed)
    and C#:
        using System.Diagnostics;
        public static bool IsPrime(int n)
        {
            int num = n/2;
            while (num > 1)
            {
                if (n % num == 0)
                {
                    return false;
                }
                num -= 1;
            }
            return true;
        }
        Stopwatch sw = new Stopwatch();
        sw.Start();
        bool result = Functions.IsPrime(2147483629);
        sw.Stop();
        Console.WriteLine("time: {0}", sw.Elapsed);
    And the times (which are a surprise for me as a beginner in Python): Python: 121 s; C#: 6 s. Could you explain where this big difference comes from?
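    For scale, a back-of-the-envelope sketch using only the numbers quoted in the question (2147483629 is prime, so the while loop runs all the way down; the per-iteration costs are derived from the reported timings, not measured):
        n = 2147483629
        iterations = n // 2 - 1      # roughly how many times the loop body runs
        py_seconds, cs_seconds = 121.0, 6.0
        print("iterations : %d" % iterations)
        print("Python     : about %.0f ns per iteration" % (py_seconds / iterations * 1e9))
        print("C#         : about %.1f ns per iteration" % (cs_seconds / iterations * 1e9))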
