Search Results

Search found 562 results on 23 pages for 'profiling'.


C++ Accelerated Massive Parallelism

- by Daniel Moth

At AMD's Fusion conference, Herb Sutter announced in his keynote session a technology that our team has been working on, which we call C++ Accelerated Massive Parallelism (C++ AMP), and during the keynote I showed a brief demo of an app built with our technology. After the keynote, I go deeper into the technology in my breakout session. If you read both those abstracts, you'll get some information about what C++ AMP is, without being too explicit, since we published the abstracts before the technology was announced. You can find the official online announcement at Soma's blog post. Here, I just wanted to capture the key points about C++ AMP that can serve as an introduction and an FAQ. So, in no particular order…

C++ AMP
- lowers the barrier to entry for heterogeneous hardware programmability and brings performance to the mainstream, without sacrificing developer productivity or solution portability.
- is designed not only to help you address today's massively parallel hardware (i.e. GPUs and APUs), but it also future proofs your code investments with a forward looking design.
- is part of Visual C++. You don't need to use a different compiler or learn different syntax.
- is modern C++. Not C or some other derivative.
- is integrated and supported fully in Visual Studio vNext. Editing, building, debugging, profiling and all the other goodness of Visual Studio work well with C++ AMP.
- provides an STL-like library as part of the existing concurrency namespace, delivered in the new amp.h header file.
- makes it extremely easy to work with large multi-dimensional data on heterogeneous hardware, in a manner that exposes parallelization.
- introduces only one core C++ language extension.
- builds on DirectX (and DirectCompute in particular), which offers a great hardware abstraction layer that is ubiquitous and reliable. The architecture is such that this point can be thought of as an implementation detail that does not surface to the API layer.

Stay tuned on my blog for more over the coming months where I will switch from just talking about C++ AMP to showing you how to use the API with code examples… Comments about this post welcome at the original blog.

Fraud and Anomaly Detection using Oracle Data Mining YouTube-like Video

- by chberger

I've created and recorded another YouTube-like presentation and "live" demos of Oracle Advanced Analytics Option, this time focusing on Fraud and Anomaly Detection using Oracle Data Mining. [Note: It is a large MP4 file that will open and play in place. The sound quality is weak so you may need to turn up the volume.] Data is your most valuable asset. It represents the entire history of your organization and its interactions with your customers. Predictive analytics leverages data to discover patterns and relationships, and even to help you make informed predictions. Oracle Data Mining (ODM) automatically discovers relationships hidden in data. Predictive models and insights discovered with ODM address business problems such as predicting customer behavior, detecting fraud, analyzing market baskets, profiling and loyalty. Oracle Data Mining, part of the Oracle Advanced Analytics (OAA) Option to the Oracle Database EE, embeds 12 high performance data mining algorithms in the SQL kernel of the Oracle Database. This eliminates data movement, delivers scalability and maintains security. But how do you find these very important needles, the possibly fraudulent transactions, in huge haystacks of data? Oracle Data Mining’s 1-Class Support Vector Machine algorithm is specifically designed to identify rare or anomalous records. The 1-Class SVM anomaly detection algorithm trains on what it considers “normal” records and builds a descriptive and predictive model, which can then be used to flag records that, on a multi-dimensional basis, appear not to fit in or to be different. Combined with clustering techniques to sort transactions into more homogeneous sub-populations for more focused anomaly detection analysis, and with Oracle Business Intelligence, Enterprise Applications and/or real-time environments to "deploy" fraud detection, Oracle Data Mining delivers a powerful advanced analytical platform for solving important problems.
With OAA/ODM you can find suspicious expense report submissions, flag non-compliant tax submissions, fight fraud in healthcare claims and save huge amounts of money in fraudulent claims and abuse. This presentation and several brief demos will show Oracle Data Mining's fraud and anomaly detection capabilities.

Compiling for T4

- by Darryl Gove

I've recently had quite a few queries about compiling for T4 based systems. So it's probably a good time to review what I consider to be the best practices.

- Always use the latest compiler. Being in the compiler team, this is bound to be something I'd recommend. But the serious points are that (a) every release the tools get better and better, so you are going to be much more effective using the latest release, (b) every release we improve the generated code, so you will see things get better, and (c) old releases cannot know about new hardware.
- Always use optimisation. You should use at least -O to get some amount of optimisation. -xO4 is typically even better as this will add within-file inlining.
- Always generate debug information, using -g. This allows the tools to attribute information to lines of source. This is particularly important when profiling an application.
- The default target of -xtarget=generic is often sufficient. This setting is designed to produce a binary that runs well across all supported platforms. If the binary is going to be deployed on only a subset of architectures, then it is possible to produce a binary that only uses the instructions supported on these architectures, which may lead to some performance gains. I've previously discussed which chips support which architectures, and I'd recommend that you take a look at the chart that goes with the discussion.
- Crossfile optimisation (-xipo) can be very useful, particularly when the hot source code is distributed across multiple source files. If you're allowed to have something as geeky as favourite compiler optimisations, then this is mine!
- Profile feedback (-xprofile=[collect: | use:]) will help the compiler make the best code layout decisions, and is particularly effective with crossfile optimisations. But what makes this optimisation really useful is that codes that are dominated by branch instructions don't typically improve much with "traditional" compiler optimisation, but often do respond well to being built with profile feedback.
- The macro flag -fast aims to provide a one-stop "give me a fast application" flag. This usually gives a best performing binary, but with a few caveats. It assumes the build platform is also the deployment platform, it enables floating point optimisations, and it makes some relatively weak assumptions about pointer aliasing. It's worth investigating.
- SPARC64 processor, T3, and T4 implement floating point multiply accumulate instructions. These can substantially improve floating point performance. To generate them the compiler needs the flag -fma=fused and also needs an architecture that supports the instruction (at least -xarch=sparcfmaf).

The most critical advice is that anyone doing performance work should profile their application. I cannot overstate how important it is to look at where the time is going in order to determine what can be done to improve it. I also presented at Oracle OpenWorld on this topic, so it might be helpful to review those slides.

How granular should a command be in a CQ[R]S model?

- by Aaronaught

I'm considering a project to migrate part of our WCF-based SOA over to a service bus model (probably nServiceBus) and using some basic pub-sub to achieve Command-Query Separation. I'm not new to SOA, or even to service bus models, but I confess that until recently my concept of "separation" was limited to run-of-the-mill database mirroring and replication. Still, I'm attracted to the idea because it seems to provide all the benefits of an eventually-consistent system while sidestepping many of the obvious drawbacks (most notably the lack of proper transactional support). I've read a lot on the subject from Udi Dahan, who is basically the guru on ESB architectures (at least in the Microsoft world), but one thing he says really puzzles me:

    As we get larger entities with more fields on them, we also get more actors working with those same entities, and the higher the likelihood that something will touch some attribute of them at any given time, increasing the number of concurrency conflicts. [...] A core element of CQRS is rethinking the design of the user interface to enable us to capture our users’ intent such that making a customer preferred is a different unit of work for the user than indicating that the customer has moved or that they’ve gotten married. Using an Excel-like UI for data changes doesn’t capture intent, as we saw above.
    -- Udi Dahan, Clarified CQRS

From the perspective described in the quotation, it's hard to argue with that logic. But it seems to go against the grain with respect to SOAs. An SOA (and really services in general) is supposed to deal with coarse-grained messages so as to minimize network chatter, among many other benefits. I realize that network chatter is less of an issue when you've got highly-distributed systems with good message queuing and none of the baggage of RPC, but it doesn't seem wise to dismiss the issue entirely. Udi almost seems to be saying that every attribute change (i.e. field update) ought to be its own command, which is hard to imagine in the context of one user potentially updating hundreds or thousands of combined entities and attributes, as is often the case with a traditional web service. One batch update in SQL Server may take a fraction of a second given a good highly-parameterized query, table-valued parameter or bulk insert to a staging table; processing all of these updates one at a time is slow, slow, slow, and OLTP database hardware is the most expensive of all to scale up/out.

Is there some way to reconcile these competing concerns? Am I thinking about it the wrong way? Does this problem have a well-known solution in the CQS/ESB world? If not, then how does one decide what the "right level" of granularity in a Command should be? Is there some "standard" one can use as a starting point, sort of like 3NF in databases, and only deviate when careful profiling suggests a potentially significant performance benefit? Or is this possibly one of those things that, despite several strong opinions being expressed by various experts, is really just a matter of opinion?
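To make the granularity question concrete, here is a minimal Python sketch of the two styles being contrasted: intent-revealing commands in the spirit of the quotation, and a coarse-grained envelope that carries several of them in one message. Every name in it (MakeCustomerPreferred, RecordCustomerMoved, CommandBatch, handle, handle_batch) is a hypothetical illustration, not part of nServiceBus or any CQRS library, and batching is shown only as one possible reading of the trade-off, not as an established answer.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class MakeCustomerPreferred:
    """Captures one user intent: mark a customer as preferred."""
    customer_id: int


@dataclass(frozen=True)
class RecordCustomerMoved:
    """Captures a different intent: the customer has a new address."""
    customer_id: int
    new_address: str


Command = Union[MakeCustomerPreferred, RecordCustomerMoved]


@dataclass(frozen=True)
class CommandBatch:
    """Hypothetical coarse-grained message wrapping fine-grained commands."""
    commands: Tuple[Command, ...]


def handle(customers: Dict[int, dict], command: Command) -> None:
    """Apply a single command to an in-memory stand-in for the write model."""
    record = customers.setdefault(command.customer_id, {})
    if isinstance(command, MakeCustomerPreferred):
        record["preferred"] = True
    elif isinstance(command, RecordCustomerMoved):
        record["address"] = command.new_address
    else:
        raise TypeError(f"unknown command: {command!r}")


def handle_batch(customers: Dict[int, dict], batch: CommandBatch) -> int:
    """Apply every command in one message; returns how many were applied."""
    for command in batch.commands:
        handle(customers, command)
    return len(batch.commands)
```

In this sketch each command still names exactly one intent, so conflict detection can stay per-intent, while the number of messages on the wire is decided separately by how many commands go into one batch.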

extreme slowness with a remote database in Drupal

- by ceejayoz

We're attempting to scale our Drupal installations up and have decided on some dedicated MySQL boxes. Unfortunately, we're running into extreme slowness when we attempt to use the remote DB - page load times go from ~200 milliseconds to 5-10 seconds. Latency between the servers is minimal - a tenth or two of a millisecond.

    PING 10.37.66.175 (10.37.66.175) 56(84) bytes of data.
    64 bytes from 10.37.66.175: icmp_seq=1 ttl=64 time=0.145 ms
    64 bytes from 10.37.66.175: icmp_seq=2 ttl=64 time=0.157 ms
    64 bytes from 10.37.66.175: icmp_seq=3 ttl=64 time=0.157 ms
    64 bytes from 10.37.66.175: icmp_seq=4 ttl=64 time=0.144 ms
    64 bytes from 10.37.66.175: icmp_seq=5 ttl=64 time=0.121 ms
    64 bytes from 10.37.66.175: icmp_seq=6 ttl=64 time=0.122 ms
    64 bytes from 10.37.66.175: icmp_seq=7 ttl=64 time=0.163 ms
    64 bytes from 10.37.66.175: icmp_seq=8 ttl=64 time=0.115 ms
    64 bytes from 10.37.66.175: icmp_seq=9 ttl=64 time=0.484 ms
    64 bytes from 10.37.66.175: icmp_seq=10 ttl=64 time=0.156 ms

    --- 10.37.66.175 ping statistics ---
    10 packets transmitted, 10 received, 0% packet loss, time 8998ms
    rtt min/avg/max/mdev = 0.115/0.176/0.484/0.104 ms

Drupal's devel.module timers show the database queries aren't running any slower on the remote DB - about 150 microseconds whether it's the local or the remote server. Profiling with XHProf shows PHP execution times that aren't out of whack, either. The number of queries doesn't seem to make a difference - we see the same 5-10 second delay whether a page has 12 queries or 250. Any suggestions about where I should start troubleshooting here? I'm quite confused.
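A quick back-of-the-envelope check shows why the measured latency cannot account for the delay. The sketch below uses only the ping figures and query counts quoted in the post; the assumption that each query costs exactly one network round trip is a simplification introduced here, and network_wait_ms is a hypothetical helper name.

```python
# Round-trip times taken from the ping statistics in the post (milliseconds).
RTT_AVG_MS = 0.176
RTT_MAX_MS = 0.484

# The delay reported for pages served from the remote database (milliseconds).
REPORTED_DELAY_MS = 5000


def network_wait_ms(query_count: int, rtt_ms: float) -> float:
    """Total time spent on round trips, assuming one round trip per query."""
    return query_count * rtt_ms


for queries in (12, 250):
    typical = network_wait_ms(queries, RTT_AVG_MS)
    worst = network_wait_ms(queries, RTT_MAX_MS)
    print(
        f"{queries:>3} queries: about {typical:.1f} ms at the average RTT, "
        f"about {worst:.1f} ms at the maximum RTT"
    )
```

Even 250 queries at the worst observed round-trip time add up to roughly a tenth of a second under this assumption, which is consistent with the observation that the delay does not change with the number of queries.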

Performance data collection for short-running, ephemeral servers

- by ErikA

We're building a medical image processing software stack, currently hosted on various AWS resources. As part of this application, we have a handful of long-running servers (database, load balancers, web application, etc.). Collecting performance data on those servers is quite simple - my go-to recipe of Nagios (for monitoring/notifications) and Munin (for collection of performance data and displaying trends) will work just fine.

However, as part of this application, we are constantly starting up and terminating compute instances on EC2. In typical usage, these compute instances start up, configure themselves, receive a job from a message queue, and then get to work processing that job, which takes anywhere from 15 minutes to over 8 hours. After job completion, these instances get terminated, never to be heard from again.

What is a decent strategy for collecting performance data on these short-lived instances? I don't necessarily need monitoring on them - if they fail for whatever reason, our application will detect this and handle re-starting the job on another instance or raising the flag so an administrator can take a look at things. However, it still would be useful to collect information like CPU (user, idle, iowait, etc.), memory usage, network traffic, disk read/write data, etc. In our internal database, we track the instance ID of the machine that runs each job, and it would be quite helpful to be able to look up performance data for a specific instance ID for troubleshooting and profiling.

Munin doesn't seem like a great candidate, as it requires maintaining a list of munin nodes in a text file - far from ideal for an environment with a high amount of churn. And for the short amount of time each node will be running, I'd rather keep the full-resolution data indefinitely than have RRD water down the data over time. In the end, my guess is that this will require a monitoring engine that:

- uses a database (MySQL, SQLite, etc.) for configuration and data storage
- exposes an API for adding/removing hosts and services

Are there other things I should be thinking about when evaluating options? Perhaps I'm over-thinking this, though, and just ought to run sar at 1-minute intervals on these short-lived instances and collect the sar db files prior to termination.
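As a rough illustration of the "database keyed by instance ID" idea, the following Python sketch turns two readings of the aggregate cpu line from /proc/stat into percentages and stores them, at full resolution, in SQLite. It is a sketch under stated assumptions rather than a recommended tool: the table layout, the function names and the sample instance ID are all hypothetical, and only the first five /proc/stat counters (user, nice, system, idle, iowait) are used.

```python
import sqlite3
import time
from typing import Dict, Optional

# Order of the first five counters on the aggregate "cpu" line of /proc/stat.
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = (int(p) for p in parts[1:1 + len(CPU_FIELDS)])
    return dict(zip(CPU_FIELDS, values))


def read_cpu_counters(path: str = "/proc/stat") -> Dict[str, int]:
    """Read the current counters (Linux only)."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


def cpu_percentages(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Convert two counter snapshots into percentages of the elapsed ticks."""
    deltas = {name: after[name] - before[name] for name in CPU_FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in CPU_FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and if needed create) the hypothetical sample table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cpu_samples ("
        "instance_id TEXT NOT NULL, sampled_at REAL NOT NULL, "
        "user_pct REAL, nice_pct REAL, system_pct REAL, "
        "idle_pct REAL, iowait_pct REAL)"
    )
    return conn


def record_sample(conn: sqlite3.Connection, instance_id: str,
                  pct: Dict[str, float], sampled_at: Optional[float] = None) -> None:
    """Store one sample row keyed by the instance ID."""
    when = time.time() if sampled_at is None else sampled_at
    conn.execute(
        "INSERT INTO cpu_samples VALUES (?, ?, ?, ?, ?, ?, ?)",
        (instance_id, when, *(pct[name] for name in CPU_FIELDS)),
    )
    conn.commit()
```

On an instance, a loop would call read_cpu_counters() once a minute, pass consecutive snapshots to cpu_percentages(), and record the result under the instance ID already tracked in the job database; shipping the SQLite file (or the rows) off the machine before termination is left out of the sketch.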

Why is my server performance degrading to the point of stopping, periodically?

- by Pascal Aschwanden

So, once in a while, I see in Firebug that a request takes over 15 or even 60 seconds to respond, and sometimes it never responds. Here is what I've ruled out:

- It's not the CPU, because every time I check the server load it's less than 6 for all 3 numbers.
- It's not the memory, because that's fairly low too, less than 50%.
- It's not the I/O anymore, because I've seen the graphs that Joyent sent back to me when I requested them, and they show less than 3MB of I/O (mostly all read).
- It's not the SQL performance - I've profiled every last SQL command that runs, and they're all (99.9% of them anyway) running in less than 30ms; most run in less than 5ms.
- Oh, and I've been profiling all the script execution times, and even when the problem occurs, the script always manages to finish in 50ms or less (that's 1/20th of a second).

Now, I do run a lot of Ajax calls: 1 every 2 seconds per user, and I have 300+ DAU. But even if all 300 are playing simultaneously, that's still only 150 calls per second max. The only other thing I can think of is that one of my neighbors is funky. The problem is highly intermittent. 99% of the time it works perfectly and there's excellent performance, but 99%+ is not good enough. Eventually the performance gets so bad I have to restart the server, at which point everything is fine again. I've done this about 4 times now. Any ideas?

Note: this is on Joyent, VPS, intro package, 256 MB of RAM with bursting. Here is the MySQL dump info:

    Traffic                                  ø per hour
    Received                      18 MiB     29 MiB
    Sent                          134 MiB    221 MiB
    Total                         151 MiB    251 MiB

    Connections                              ø per hour   %
    max. concurrent connections   5          ---          ---
    Failed attempts               0          0.00         0.00%
    Aborted                       0          0.00         0.00%
    Total                         9,418      15.59 k      100.00%
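The load estimate in the post can be written out as a short calculation. This Python sketch reproduces the 150 calls per second figure from the numbers given (300 simultaneous users, one call every 2 seconds) and, using Little's law, estimates how many requests are in flight on average if each takes the quoted 50 ms. The function names are hypothetical, and the sketch says nothing about what the server can actually sustain.

```python
def peak_requests_per_second(concurrent_users: int, poll_interval_s: float) -> float:
    """Request rate if every user polls once per interval."""
    return concurrent_users / poll_interval_s


def average_requests_in_flight(request_rate: float, service_time_s: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return request_rate * service_time_s


rate = peak_requests_per_second(concurrent_users=300, poll_interval_s=2.0)
in_flight = average_requests_in_flight(rate, service_time_s=0.050)
print(f"peak rate: {rate:.0f} requests/s")
print(f"average requests in flight at 50 ms each: {in_flight:.1f}")
```

The second number is worth comparing against whatever limits the 256 MB plan imposes on simultaneous worker processes or connections, since a bound there would show up as intermittent queuing rather than as slow scripts or slow SQL.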

Benchmark Linq2SQL, Subsonic2, Subsonic3 - Any other ideas to make them faster ?

- by Aristos

I have been working with SubSonic 2 for more than 3 years now. After LINQ appeared, and then SubSonic 3, I started thinking about moving to the new LINQ features that are connected to SQL. I started porting my SubSonic 2 code to SubSonic 3, and very soon discovered that it was so slow that I couldn't believe it, which is what started all these tests. Then I tested Linq2Sql and also saw a delay compared with SubSonic 2.

My question, especially for Linq2Sql and the upcoming .NET version 4, is: what else can I do to speed it up? What else is there in the Linq2Sql settings or classes, other than the code I used for my measurements? I place here the project I used for the tests, and also the screenshots of the results.

How I make the tests, and the accuracy of my measurements

For this question I use only Google Chrome to measure, because it's difficult to show here the many other measurements I have made with more complex programs. This is the simplest one: I measure just the data read. How can I prove that? I add a simple Thread.Sleep(10 seconds) and check whether I see those 10 seconds in the Google Chrome measurement, and yes, I do. Here are more tests with this Thread.Sleep to see what Chrome actually reports: 10 seconds delay, 100 ms delay, zero delay. Only a small 15 ms overhead shows up in the measurement; it is so small compared with the rest of my tests that I do not care about it.

So what do I measure? I measure just the data read via each method; I do not count the data or database delay, or any disk read or anything like that. Later, on the image with the results, I show that there was no disk activity during the measurements. See this image to see what I really measure and whether it is correct.

Why I chose this kind of test

It's simple, it's real, and it's close to my real problem, where I found the SubSonic 3 delay in a real program with real data.

Now let's test the DALs

Start by looking at this image. I make 4-5 calls to every method, one after the other. The results, for a loop of 100 iterations asking for 5 rows, one of which does not exist, are approximately:

    Simple ADO.NET                          81 ms
    SubSonic 2                              210 ms
    Linq2Sql                                1.70 sec
    Linq2Sql using CompiledQuery.Compile    239 ms
    SubSonic 3                              15.00 sec (wow - extremely slow)

The project: http://www.planethost.gr/DalSpeedTests.rar

Can anyone confirm this benchmark, or make any optimizations to help me out?

Other tests

Someone posted this link here, http://ormbattle.net/ (and then removed it, I do not know why). On that page you can find really useful advanced tests for all of them, except the SubSonic 2 and SubSonic 3 that I have here!

Optimizing

What I am really asking is whether anyone knows any trick for optimizing the DALs, not by changing the test code, but by changing the code and the settings of each DAL. For example...

Optimizing Linq2SQL

I started searching for how to optimize Linq2Sql and found this article, and maybe more exist. I finally got the tricks from that page to run and optimized the code using all of them. The time went from 1.70 sec to about 1.50 sec: a big improvement, but still slow. Then I found an article with a different approach to the same idea, and wow, the speed shot up. Using this trick with CompiledQuery.Compile, the time went from 1.5 sec to 239 ms. Here is the code for the precompiled query:

    Func<DataClassesDataContext, int, IQueryable<Product>> compiledQuery =
        CompiledQuery.Compile((DataClassesDataContext meta, int IdToFind) =>
            (from myData in meta.Products
             where myData.ProductID.Equals(IdToFind)
             select myData));

    StringBuilder Test = new StringBuilder();
    int[] MiaSeira = { 5, 6, 10, 100, 7 };

    using (DataClassesDataContext context = new DataClassesDataContext())
    {
        context.ObjectTrackingEnabled = false;
        for (int i = 0; i < 100; i++)
        {
            foreach (int EnaID in MiaSeira)
            {
                var oFindThat2P = compiledQuery(context, EnaID);
                foreach (Product One in oFindThat2P)
                {
                    Test.Append("<br />");
                    Test.Append(One.ProductName);
                }
            }
        }
    }

Optimizing SubSonic 3 and problems

I did a lot of performance profiling and changed the slow points one after the other; the speed is better, but still too slow. I posted my findings to the SubSonic group, but they ignored the problem and said that everything is fast... Here are some captures of my profiling and of the delay points inside the SubSonic source code. My conclusion is that SubSonic 3 makes more calls about the structure of the database than about the data itself. It needs to reconsider the whole way of asking for data and follow the SubSonic 2 idea, if this is possible. I tried to precompile queries in SubSonic 3 like I did in Linq2Sql, but have failed for the moment...

Optimizing SubSonic 2

After I discovered that SubSonic 3 is extremely slow, I started checking SubSonic 2, which I had never done before because I believed it was fast (and it is). It turned out that some points can be faster. For example, there are many loops like these that are actually slow because of string manipulation and comparisons inside the loop. I must tell you that this code is called millions of times within a few minutes of the program asking for data. With a small number of tables and small fields this may not be a big thing for some people, but with a large number of tables the delay is even greater. So I decided to optimize SubSonic 2 myself, by replacing the string comparisons with number comparisons. Simple. I did that at almost every point that the profiler said was slow. I also changed all the small points that could be even a little faster, and disabled some things that are not used much.

The results: 5% faster on the Northwind database, and nearly 20% faster on my database with 250 tables. That amounts to 500 ms less in a 10-second process on Northwind, and 100 ms less in a 500 ms process on my database. I do not have captures to show you for that, because I made them with different code at a different time and tracked them on paper.

Anyway, this is my story and my question: what else do you know that would make them even faster?

For these measurements I used SubSonic 2.2 optimized by me, SubSonic 3.0.0.3 a little optimized by me, and .NET 3.5.

WinForms ReportViewer: slow initial rendering

- by Bryan Roth

UPDATE 2.4.2010

Yeah, this is an old question but I thought I would give an update. So, I'm working with the ReportViewer again and it's still rendering slowly on the initial load. The only difference is that the SQL database is on the reporting server.

UPDATE 3.16.2009

I have done profiling and it's not the SQL that is making the ReportViewer render slowly on the first call. On the first call, the ReportViewer control locks up the UI thread and makes the program unresponsive. After about 5 seconds the ReportViewer will unlock the UI thread and display "Report is being generated" and then finally show the report. I know 5 seconds is not much but this shouldn't be happening. My coworker does the same thing in a program of his and the ReportViewer immediately displays the "Report is being generated" upon any request. The only difference is that the reporting server is on one server and the data is on another server. However, when I am developing the reports within SSRS, there is no delay.

UPDATE

I have noticed that only the first load of the ReportViewer takes a long time; each subsequent load of the same or different reports loads fast.

I have a WinForms ReportViewer that I'm using in Remote processing mode that can take up to 30 seconds to render when the ReportViewer.RefreshReport() method is called. However, the report itself runs fast. This is the code to set up my ReportViewer:

    rvReport.ProcessingMode = ProcessingMode.Remote
    rvReport.ShowParameterPrompts = False
    rvReport.ServerReport.ReportServerUrl = New Uri(_reportServerURL)
    rvReport.ServerReport.ReportPath = _reportPath

This is where the ReportViewer can take up to 30 seconds to render:

    rvReport.RefreshReport()

MP3 Decoding on Android

- by Rob Szumlakowski

Hi. We're implementing a program for Android phones that plays audio streamed from the internet. Here's approximately what we do:

1. Download a custom encrypted format.
2. Decrypt to get chunks of regular MP3 data.
3. Decode MP3 data to raw PCM data in a memory buffer.
4. Pipe the raw PCM data to an AudioTrack.

Our target devices so far are Droid and Nexus One. Everything works great on Nexus One, but the MP3 decode is too slow on Droid. The audio playback starts to skip if we put the Droid under load. We are not permitted to decode the MP3 data to SD card, but I know that's not our problem anyways. We didn't write our own MP3 decoder, but used MPADEC (http://sourceforge.net/projects/mpadec/). It's free and was easy to integrate with our program. We compile it with the NDK. After exhaustive analysis with various profiling tools, we're convinced that it's this decoder that is falling behind.

Here are the options we're thinking about:

- Find another MP3 decoder that we can compile with the Android NDK. This MP3 decoder would have to be either optimized to run on mobile ARM devices or maybe use integer-only math or some other optimizations to increase performance.
- Since the built-in Android MediaPlayer service will take URLs, we might be able to implement a tiny HTTP server in our program and serve the MediaPlayer with the decrypted MP3s. That way we can take advantage of the built-in MP3 decoder.
- Get access to the built-in MP3 decoder through the NDK. I don't know if this is possible.

Does anyone have any suggestions on what we can do to speed up our MP3 decoding?

-- Rob Sz

Problem upgrading eclipse rcp app from 3.3 to 3.5 on Mac OS

- by Alb

I previously had an Eclipse RCP app based on Eclipse 3.3 plugins deployed on both Windows and Mac OS X 10.4. I'm now trying to port the app to Java 1.6 and Eclipse 3.5 (Build id: 20100218-1602) plugins on Mac OS X 10.5.8 (Leopard). I can launch the product from Eclipse 3.5 on Windows but not on Mac OS X. I have the 64-bit Cocoa Eclipse IDE and Java 6. In the launch configuration I set the runtime JRE to JVM 1.6.0 and added the required plugins. The plugins validate and everything else looks similar to the Windows configuration where it works, but when I launch I only get the following two lines in the console:

    2010-03-16 13:29:32.742 java[758:10b] [Java CocoaComponent compatibility mode]: Enabled
    2010-03-16 13:29:32.744 java[758:10b] [Java CocoaComponent compatibility mode]: Setting timeout for SWT to 0.100000

and then the program appears to just hang indefinitely. There is nothing written to the .log file so I'm not sure what error there is.

EDIT: Here's what YourKit profiling says before all CPU activity stops:

    Name                                                                    | Time (ms)     | Own Time (ms)
    +---<All threads>                                                       | 2,799  100 %  |
    +---org.eclipse.equinox.launcher.Main.main(String[])                    | 1,924   69 %  | 0
    +---org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run()  |   632   23 %  | 0
    +---java.lang.Thread.run()                                              |   135    5 %  | 0
    +---java.lang.ClassLoader.loadClassInternal(String)                     |   106    4 %  | 0

and this in the exceptions tab:

    Exception statistics

    Name                                    | Count
    +---java.lang.ClassNotFoundException    | 102  11 %
    +---java.net.MalformedURLException      |   4   0 %
    +---java.lang.NoSuchMethodException     |   3   0 %
    +---java.lang.NumberFormatException     |   2   0 %
    +---java.io.FileNotFoundException       |   1   0 %
    +---java.lang.UnsatisfiedLinkError      |   1   0 %

and here are more details on the ClassNotFoundExceptions mentioned above:

    java.lang.ClassNotFoundException
    Start Level Event Dispatcher   native ID: 0x8B0B   group: 'main'   78
    Thread-4                       native ID: 0x10B    group: 'main'   22
    Framework Event Dispatcher     native ID: 0xD207   group: 'main'    2

Anyone know why I don't see a trace for this in Eclipse or in any log files? Any ideas where I should look?

[Updated on: Tue, 16 March 2010 09:37]

LINQ to SQL : Too much CPU Usage: What happens when there are multiple users.

- by soldieraman

I am using LINQ to SQL and seeing my CPU usage skyrocketing. See the screenshot below. I have three questions:

1. What can I do to reduce this CPU usage? I have done profiling and basically removed everything. Will making every LINQ to SQL statement into a compiled query help?
2. I also find that even with compiled queries, simple statements like ByID() can take 3 milliseconds on a server with 3.25GB RAM and a 3.17GHz CPU; this will just become slower on a less powerful computer. Or will the compiled query get faster the more it is used?
3. The CPU usage on the local server goes to 12-15% for a single user. Will this multiply with the number of users accessing the server when the application is put on a live server, i.e. will 2 users at a time mean 15*2 = 30% CPU usage? If this is the case, is my application then limited to a maximum of 4-5 users at a time? Or doesn't LINQ to SQL in .NET share some CPU usage?

ServerIdentity memory leak with IHttpAsyncHandler

- by Anton

I have a .NET web application that consists of a single HTTP handler class that implements IHttpAsyncHandler. All requests to this handler are handled asynchronously, though some requests are short-lived and some are long-lived (nothing over a few seconds). The problem is that memory consumption grows over time as requests are handled. All profiling results point to an unbounded growth of String objects held by instances of System.Runtime.Remoting.ServerIdentity. Every String value is different, but they all look similar to: /dd41c00e_1566_4702_b660_c81cdea18a43/vigefresi5pfv8n0ekddg57z_1154.rem There is nothing in my application that uses ServerIdentity directly, and unless I am mistaken, the ServerIdentity instances are proportional to the number of incoming requests. If this is an internal .NET structure, it looks like the CLR is not cleaning up after itself. What could be causing the leak? UPDATE A little less than half of the String objects are being held by System.Runtime.Remoting. The remaining String objects are being held by System.Runtime.Serialization and look similar to: +1sgess5rjcrgbmp3kqr6bmv_3474.rem Also, the problem only seems to occur when lots of simultaneous HTTP web requests arrive.

Why is setting HTML5's CanvasPixelArray values ridiculously slow and how can I do it faster?

- by Nixuz

I am trying to do some dynamic visual effects using the HTML5 canvas's pixel manipulation, but I am running into a problem where setting pixels in the CanvasPixelArray is ridiculously slow. For example, if I have code like:

    imageData = ctx.getImageData(0, 0, 500, 500);
    // The pixel array lives in imageData.data; ImageData itself has no length.
    for (var i = 0; i < imageData.data.length; i += 4) {
        imageData.data[i] = buffer[i];
        imageData.data[i + 1] = buffer[i + 1];
        imageData.data[i + 2] = buffer[i + 2];
    }
    ctx.putImageData(imageData, 0, 0);

Profiling with Chrome reveals that it runs 44% slower than the following code, where CanvasPixelArray is not used:

    tempArray = new Array(500 * 500 * 4);
    imageData = ctx.getImageData(0, 0, 500, 500);
    for (var i = 0; i < imageData.data.length; i += 4) {
        tempArray[i] = buffer[i];
        tempArray[i + 1] = buffer[i + 1];
        tempArray[i + 2] = buffer[i + 2];
    }
    ctx.putImageData(imageData, 0, 0);

My guess is that the slowdown is due to the conversion between the JavaScript doubles and the internal unsigned 8-bit integers used by the CanvasPixelArray. Is this guess correct? Is there any way to reduce the time spent setting values in the CanvasPixelArray?

Read the article
high performance hibernate insert

- by luke

I am working on a latency-sensitive part of an application. Basically, I receive a network event, transform the data, and then insert all the data into the DB. After profiling, I see that nearly all my time is spent trying to save the data. Here is the code: private void insertAllData(Collection<Data> dataItems) { long start_time = System.currentTimeMillis(); long save_time = 0; long commit_time = 0; Transaction tx = null; try { Session s = HibernateSessionFactory.getSession(); s.setCacheMode(CacheMode.IGNORE); s.setFlushMode(FlushMode.NEVER); tx = s.beginTransaction(); for(Data data : dataItems) { s.saveOrUpdate(data); } save_time = System.currentTimeMillis(); tx.commit(); s.flush(); s.clear(); } catch(HibernateException ex) { if(tx != null) tx.rollback(); } commit_time = System.currentTimeMillis(); System.out.println("Save: " + (save_time - start_time)); System.out.println("Commit: " + (commit_time - save_time)); System.out.println(); } The size of the collection is always less than 20. Here is the timing data that I see (in milliseconds): Save: 27 Commit: 9 Save: 27 Commit: 9 Save: 26 Commit: 9 Save: 36 Commit: 9 Save: 44 Commit: 0 This is confusing to me. I figured that the save should be quick and all the time should be spent on the commit, but clearly I'm wrong. I have also tried removing the transaction (it's not really necessary), but I saw worse times. I have set hibernate.jdbc.batch_size=20. I need this operation to be as fast as possible; ideally there would be only one round trip to the database. How can I do this?

Read the article
emacs tramp performance

- by Oleg Pavliv

Is there a way to improve Emacs TRAMP performance? For me it's faster to open an external FTP client (FileZilla), transfer files to the local disk and open them in an external editor (Notepad) than to open them with Emacs. I use Emacs 23.1 under Windows XP. I tried different values of tramp-default-method (telnet, pscp, ftp); all of them have the same performance. Profiling results with elp-instrument-package are the following (I opened 3 remote files of 1.5 MB each):

    Function name                       Calls   Elapsed (s)    Average (s)
    tramp-file-name-handler             1461    350.41599999   0.2398466803
    tramp-sh-file-name-handler          1461    350.02699999   0.2395804243
    tramp-send-command                  227     179.63400000   0.7913392070
    tramp-send-command-and-check        205     177.77600000   0.8672000000
    tramp-wait-for-regexp               227     176.47800000   0.7774361233
    tramp-wait-for-output               226     176.40000000   0.7805309734
    tramp-barf-unless-okay              18      133.46699999   7.4148333333
    tramp-handle-insert-file-contents   3       132.046        44.015333333
    tramp-handle-file-local-copy        3       131.281        43.760333333
    tramp-accept-process-output         2375    112.95100000   0.0475583157

So the actual file transfer takes 132 sec, about 1/3 of the total time. Why does it spend so much time in tramp-sh-file-name-handler? I tried to advise the function tramp-sh-file-name-handler to store and return cached results, but it does not work; probably this function has some side effects. Any ideas on how to improve TRAMP performance?

Read the article
Selectively intercepting methods using autofac and dynamicproxy2

- by Mark Simpson

I'm currently doing a bit of experimenting using Autofac-1.4.5.676, Autofac Contrib and Castle DynamicProxy2. The goal is to create a coarse-grained profiler that can intercept calls to specific methods of a particular interface. The problem: I have everything working perfectly apart from the selective part. I gather that I need to marry up my interceptor with an IProxyGenerationHook implementation, but I can't figure out how to do this. My code looks something like this: The interface that is to be intercepted & profiled (note that I only care about profiling the Update() method) public interface ISomeSystemToMonitor { void Update(); // this is the one I want to profile void SomeOtherMethodWeDontCareAboutProfiling(); } Now, when I register my systems with the container, I do the following: // Register interceptor gubbins builder.RegisterModule(new FlexibleInterceptionModule()); builder.Register<PerformanceInterceptor>(); // Register systems (just one in this example) builder.Register<AudioSystem>() .As<ISomeSystemToMonitor>() .InterceptedBy(typeof(PerformanceInterceptor)); All ISomeSystemToMonitor instances pulled out of the container are intercepted and profiled as desired, except that all of their methods are intercepted, not just the Update method. Now, how can I extend this to exclude all methods other than Update()? As I said, I don't understand how I'm meant to say "for the PerformanceInterceptor, use this implementation of IProxyGenerationHook". All help appreciated, cheers! Also, please note that I can't upgrade to Autofac 2.x right now; I'm stuck with 1.

Read the article
How do people know so much about programming?

- by Luciano

I see people on these forums with a lot of points, so I assume they know about a lot of different programming stuff. When I was young I knew about BASIC (Commodore) and then Turbo Pascal (PC). Then in college I learnt about C, memory management, the x86 instruction set, loop invariants, graphs, DB query optimization, OOP, functional programming, lambda calculus, Prolog, concurrency, polymorphism, Newton's method, simplex, backtracking, dynamic programming, heuristics, NP-completeness, LR, LALR, neural networks, static & dynamic typing, Turing, Gödel, and more in between. Then in industry I started with Java several years ago and learnt about it and its variety of frameworks, and also design patterns, architecture patterns, web development, server development, mobile development, TDD, BDD, UML, use cases, bug trackers, process management, people management if you are a tech lead, profiling, security concerns, etc. I started to forget what I learnt in college... And then there is the stuff I don't know yet, like Python, .NET, Perl, and JVM stuff like Groovy or Scala. Of course Google is a must for rapid documentation access, to find out whether a problem has been solved already and how, and to keep informed about new stuff through blogs and places like this one. Either it's just too much or I just have a bad memory. How do you guys manage it?

Read the article
Oracle T4CPreparedStatement memory leaks?

- by Jay

A little background on the application that I am going to talk about in the next few lines: XYZ is a data masking workbench Eclipse RCP application. You give it a source table column and a target table column, and it applies a transformation (encryption/shuffling/etc.) and copies the row data from the source table to the target table. Now, when I mask n tables at a time, n threads are launched by this app. Here is the issue: I have run into a production issue on the first rollout of the above-mentioned app. Unfortunately, I don't have any logs to get to the root cause. However, I tried to run this app in the test region and do a stress test. When I collected .hprof files and ran them through an analyzer (YourKit), I noticed that objects of oracle.jdbc.driver.T4CPreparedStatement were retaining heap. The analysis also tells me that one of my classes is holding a reference to this PreparedStatement object, and therefore n threads have n such objects. T4CPreparedStatement seemed to have two character arrays, lastBoundChars and bindChars, each of size char[300000]. So I researched a bit (Google!), obtained ojdbc6.jar and tried decompiling T4CPreparedStatement. I see that T4CPreparedStatement extends OraclePreparedStatement, which dynamically manages the array size of lastBoundChars and bindChars. So, my questions here are: Have you ever run into an issue like this? Do you know the significance of lastBoundChars / bindChars? I am new to profiling, so do you think I am not doing it correctly? (I also ran the hprofs through MAT, and this was the main identified issue, so I don't really think I could be wrong?) I have found something similar on the web here: http://forums.oracle.com/forums/thread.jspa?messageID=2860681 I would appreciate your suggestions / advice.

Read the article
Linq to sql DataContext cannot set load options after results been returned

- by David Liddle

I have two tables, A and B, with a one-to-many relationship from A to B. On some pages I would like to get a list of A objects only. On other pages I would like to load A with its B objects attached. This can be handled by setting the load options: DataLoadOptions options = new DataLoadOptions(); options.LoadWith<A>(a => a.B); dataContext.LoadOptions = options; The trouble occurs when I first view all A's with load options, then go to edit a single A (without load options), and after the edit return to the previous page. I understand why the error is occurring, but I am not sure how best to get round this problem. I would like the DataContext to be loaded up per request. I thought I was achieving this by using StructureMap to load up my DataContext on a per-request basis. This is all part of an n-tier application where my Controllers call Services, which in turn call Repositories. ForRequestedType<MyDataContext>() .CacheBy(InstanceScope.PerRequest) .TheDefault.Is.Object(new MyDataContext()); ForRequestedType<IAService>() .TheDefault.Is.OfConcreteType<AService>(); ForRequestedType<IARepository>() .TheDefault.Is.OfConcreteType<ARepository>(); Here is a brief outline of my Repository: public class ARepository : IARepository { private MyDataContext db; public ARepository(MyDataContext context) { db = context; } public void SetLoadOptions(DataLoadOptions options) { db.LoadOptions = options; } public IQueryable<A> Get() { return from a in db.A select a; } } So my ServiceLayer, on View All, sets the load options and then gets all A's. On editing A, my ServiceLayer should spin up a new DataContext and just fetch a list of A's. When profiling the SQL, I can see that when I go to the Edit page it is requesting A with B objects.

Read the article
Bubble sort algorithm implementations (Haskell vs. C)

- by kingping

Hello. I have written two implementations of the bubble sort algorithm, one in C and one in Haskell. Haskell implementation:

    module Main where

    main = do
        contents <- readFile "./data"
        print "Data loaded. Sorting.."
        let newcontents = bubblesort contents
        writeFile "./data_new_ghc" newcontents
        print "Sorting done"

    bubblesort list = sort list [] False

    rev  = reverse -- separated. To see
    rev2 = reverse -- who calls the routine

    sort (x1:x2:xs) acc _ | x1 > x2 = sort (x1:xs) (x2:acc) True
    sort (x1:xs) acc flag = sort xs (x1:acc) flag
    sort [] acc True = sort (rev acc) [] False
    sort _ acc _ = rev2 acc

I've compared these two implementations by running both on a file of size 20 KiB. The C implementation took about a second; the Haskell one took about 1 min 10 sec. I have also profiled the Haskell application:

    Compile for profiling: C:\Temp> ghc -prof -auto-all -O --make Main
    Profile:               C:\Temp> Main.exe +RTS -p

and got these results. This is the pseudocode of the algorithm:

    procedure bubbleSort( A : list of sortable items ) defined as:
      do
        swapped := false
        for each i in 0 to length(A) - 2 inclusive do:
          if A[i] > A[i+1] then
            swap( A[i], A[i+1] )
            swapped := true
          end if
        end for
      while swapped
    end procedure

I wonder if it's possible to make the Haskell implementation work faster without changing the algorithm (there are actually a few tricks to make it work faster, but neither implementation has these optimizations).
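For readers who want to run the pseudocode above, here is a minimal Java sketch of it. Java is used purely for illustration; the class and method names are hypothetical, and this is not the asker's C implementation, which the excerpt does not show. It follows the pseudocode step for step and applies none of the usual optimizations (such as shrinking the scanned range after each pass).

```java
import java.util.Arrays;

class BubbleSortSketch {
    // Direct transcription of the pseudocode: repeat full passes
    // over the array until a pass completes without any swap.
    static void bubbleSort(int[] a) {
        boolean swapped;
        do {
            swapped = false;
            // i runs from 0 to length - 2 inclusive, as in the pseudocode
            for (int i = 0; i <= a.length - 2; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
        } while (swapped);
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        bubbleSort(data);
        System.out.println(Arrays.toString(data)); // prints [1, 2, 4, 5, 8]
    }
}
```

The array is swapped in place, which is the part a linked-list version has to emulate by rebuilding and reversing an accumulator on every pass.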

Read the article
Why is my multithreaded Java program not maxing out all my cores on my machine?

- by James B

Hi, I have a program that starts up and creates an in-memory data model and then creates a (command-line-specified) number of threads to run several string-checking algorithms against an input set and that data model. The work is divided amongst the threads along the input set of strings, and then each thread iterates over the same in-memory data model instance (which is never updated again, so there are no synchronization issues). I'm running this on a Windows 2003 64-bit server with 2 quad-core processors, and from looking at Windows Task Manager the cores aren't being maxed out (nor do they look particularly taxed) when I run with 10 threads. Is this normal behaviour? It appears that 7 threads all complete a similar amount of work in a similar amount of time, so would you recommend running with 7 threads instead? Should I run it with more threads? I assume this could be detrimental, as the JVM will do more context switching between the threads. Alternatively, should I run it with fewer threads? Or what would be the best tool I could use to measure this? Would a profiling tool help me out here, and is one of the several profilers better at detecting bottlenecks (assuming I have one here) than the rest? Note that the server is also running SQL Server 2005 (this may or may not be relevant), but nothing much is happening on that database when I am running my program. Note also that the threads are only doing string matching; they aren't doing any I/O or database work or anything else they may need to wait on. Thanks in advance, -James
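One way to approach the "how many threads?" part of this question empirically is to time the same workload at several thread counts. The following is only a sketch under stated assumptions: the `countMatches` workload is a hypothetical stand-in for the asker's string-checking algorithms, the class and method names are invented for illustration, and wall-clock timing of a single run is a rough indicator rather than a proper benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadScalingSketch {
    // Hypothetical stand-in for the real work: counts inputs containing "ab".
    static int countMatches(List<String> inputs, int from, int to) {
        int hits = 0;
        for (int i = from; i < to; i++) {
            if (inputs.get(i).contains("ab")) hits++;
        }
        return hits;
    }

    // Splits the inputs into `threads` contiguous slices, processes each slice
    // on a fixed-size pool, and sums the per-slice counts.
    static int runWith(int threads, List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> parts = new ArrayList<>();
            int chunk = (inputs.size() + threads - 1) / threads;
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(inputs.size(), t * chunk);
                final int to = Math.min(inputs.size(), from + chunk);
                Callable<Integer> task = () -> countMatches(inputs, from, to);
                parts.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : parts) total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> inputs = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) inputs.add(Integer.toString(i, 16));
        int cores = Runtime.getRuntime().availableProcessors();
        // Try 1, 2, 4, ... threads up to twice the reported core count.
        for (int threads = 1; threads <= cores * 2; threads *= 2) {
            long start = System.nanoTime();
            int hits = runWith(threads, inputs);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " threads: " + hits + " hits in " + ms + " ms");
        }
    }
}
```

If the elapsed time stops improving well before the thread count reaches `availableProcessors()`, that suggests the work is limited by something other than CPU, such as memory bandwidth, allocation and garbage collection, or uneven slices, and a profiler is the natural next step.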

Read the article
Which LINQ expression is faster

- by Vlad Bezden

Hi all. In the following code: public class Person { public string Name { get; set; } public uint Age { get; set; } public Person(string name, uint age) { Name = name; Age = age; } } void Main() { var data = new List<Person>{ new Person("Bill Gates", 55), new Person("Steve Ballmer", 54), new Person("Steve Jobs", 55), new Person("Scott Gu", 35)}; // 1st approach data.Where (x => x.Age > 40).ToList().ForEach(x => x.Age++); // 2nd approach data.ForEach(x => { if (x.Age > 40) x.Age++; }); data.ForEach(x => Console.WriteLine(x)); } In my understanding, the 2nd approach should be faster, since it iterates through each item once, while the 1st approach iterates twice: once for the Where clause, and once for the ForEach over the subset of items that the Where clause returns. However, it might be that the compiler internally translates the 1st approach into the 2nd approach anyway, so that they have the same performance. Any suggestions or ideas? I could do profiling as suggested, but I want to understand what is going on at the compiler level: are those two lines of code the same to the compiler, or will the compiler treat them literally? Thanks in advance for your help.

Read the article
How to profile Doctrine in Zend Framework

- by David Zapata

Good day. I'm using Doctrine as the ORM for my Zend Framework project. This is the first time I have used it. I've followed the ZendCasts Doctrine chapters, and everything works for me, but I needed to perform some profiling. There is a Doctrine_Connection_Profiler class that should be used to profile the Doctrine Model internal queries, but I've tried to use it without success. I always get a "PDOException: You cannot serialize or unserialize PDOStatement instances" exception when I run my unit tests. Here is an example: $conn = Doctrine_Manager::connection($doctrineConfig['dsn'], $dbconfname); ... if( APPLICATION_ENV != 'production'){ $obj_doctrine_profiler = new Doctrine_Connection_Profiler(); $conn->setListener($obj_doctrine_profiler); } All of my unit tests work if I comment out or delete the $conn->setListener($obj_doctrine_profiler); line. This code block is located in my Bootstrap.php class; the weird thing is that the web application works just fine even with the mentioned line of code. Thank you so much for your help. Please excuse me if my English is not the best.

Read the article
Are there any prototype-based languages with a whole development cycle?

- by Kaveh Shahbazian

Are there any real-world prototype-based programming languages with a whole development cycle? By "a whole development cycle" I mean something like Ruby and Python: web frameworks, scripting/interacting with the system, tools for debugging, profiling, etc. Thank you. A brief note on PBPLs (let's call these languages PBPLs: prototype-based programming languages): There are some PBPLs out there. Some are widely used, like JavaScript (which Node.js may bring into the field - or may not!). Another is ActionScript, which is also a PBPL but is tightly bound to the Flash VM (is it correct to say so?). Among the less known ones, I can speak of Lua, which has a strong reputation in game development (mostly spread by WoW) but never took off as a full language. Lua has a table concept which can provide you with some sort of prototype-based programming facility. There is also JScript (the Windows scripting tool), which has already been made pointless by the newcomer PowerShell (I have used JScript to manipulate IIS, but I never understood what JScript is!). Others can be named, like Io (indeed very, very neat, you will fall in love with it; absolutely impossible to use), REBOL (What is this all about? A proprietary scripting tool? You must be kidding!) and newLISP (which is actually a full language, but no one has ever heard about it). For sure there are many more to list here, but either I do not remember them or I did not understand them as a real-world thing (like Self).

Read the article
