Search Results

Search found 13206 results on 529 pages for 'performance measurement'.

Page 24/529 | < Previous Page | 20 21 22 23 24 25 26 27 28 29 30 31  | Next Page >

  • Lucene (.NET) Document structure and performance suggestions.

    - by Josh Handel
    Hello, I am indexing about 100M documents that consist of a few string identifiers and a hundred or so numeric terms. I won't be doing range queries, so I haven't dug too deep into NumericField, but I don't think it's the right choice here. My problem is that query performance degrades quickly when I start adding OR criteria to my query. All my queries are on specific numeric terms, so a document looks like StringField:[someString] and N DataField:[someNumber]. I then query it with something like DataField:((+1 +(2 3)) (+75 +(3 5 52)) (+99 +88 +(102 155 199))). Currently these queries take about 7 to 16 seconds to run on my laptop. I would like to make sure that's really the best they can do. I am open to suggestions on field structure and query structure :-). Thanks, Josh. PS: I have already read over all the other Lucene performance discussions on here, on the Lucene wiki, and at Lucid Imagination. I'm a bit further down the rabbit hole than that.


  • Testing performance of queries in mysql

    - by Unreason
    I am trying to set up a script that tests the performance of queries on a development MySQL server. Here are more details: I have root access; I am the only user accessing the server; I am mostly interested in InnoDB performance; the queries I am optimizing are mostly search queries (SELECT ... LIKE '%xy%'). What I want to do is create a reliable testing environment for measuring the speed of a single query, free from dependencies on other variables. Until now I have been using SQL_NO_CACHE, but sometimes the results of such tests also show caching behaviour: the first run takes much longer to execute and subsequent runs take less time. If someone can explain this behaviour in full detail I might stick to using SQL_NO_CACHE; I believe it might be due to the file system cache and/or caching of the indexes used to execute the query, as this post explains. It is not clear to me when the Buffer Pool and Key Buffer get invalidated or how they might interfere with testing. So, short of restarting the MySQL server, how would you recommend setting up an environment that reliably determines whether one query performs better than another?


  • Performance Difference between HttpContext user and Thread user

    - by atrueresistance
    I am wondering what the difference is between HttpContext.Current.User.Identity.Name.ToString.ToLower and Thread.CurrentPrincipal.Identity.Name.ToString.ToLower. Both methods grab the username in my asp.net 3.5 web service. I decided to figure out if there was any difference in performance using a little program, running from full Stop to Start Debugging in every run.

        Dim st As DateTime = DateAndTime.Now
        Try
            'user = HttpContext.Current.User.Identity.Name.ToString.ToLower
            user = Thread.CurrentPrincipal.Identity.Name.ToString.ToLower
            Dim dif As TimeSpan = Now.Subtract(st)
            Dim break As String = "nothing"
        Catch ex As Exception
            user = "Undefined"
        End Try

    I set a breakpoint on break to read the value of dif. The results were the same for both methods:

        dif.Milliseconds    0    Integer
        dif.Ticks           0    Long

    Using a longer duration (a loop of 5,000 iterations) gives these figures:

        Method        Run    dif.Milliseconds    dif.Ticks
        Thread        1      125                 1250000
        Thread        2      0                   0
        Thread        3      0                   0
        HttpContext   1      15                  156250
        HttpContext   2      156                 1562500
        HttpContext   3      0                   0

    So I guess the question is which is preferred, or more compliant with web service standards? If there is some type of performance advantage, I can't really tell. Which one scales to larger environments more easily?
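
    The non-zero tick counts reported in this post (156250, 1250000, 1562500) are all whole multiples of 156250 ticks, i.e. about 15.6 ms, which suggests the clock's resolution is being measured and not the code. A minimal sketch of the usual workaround, in Python and not the poster's VB, is to run the call many times per sample on a high-resolution timer and keep the best sample. The function get_user_name is a hypothetical stand-in for the property chain being compared.

```python
import timeit


def get_user_name():
    # Hypothetical stand-in for the property chain being timed.
    return "DOMAIN\\SomeUser".lower()


# Run the call many times per sample so each sample is far longer than the
# clock's resolution, then repeat the sample to expose run-to-run noise.
CALLS_PER_SAMPLE = 100_000
samples = timeit.repeat(get_user_name, number=CALLS_PER_SAMPLE, repeat=5)

# The minimum is the least disturbed sample; divide to get a per-call cost.
per_call_seconds = min(samples) / CALLS_PER_SAMPLE
print(f"best of {len(samples)}: {per_call_seconds * 1e9:.1f} ns per call")
```

    Taking the minimum of several samples is a design choice: slower samples mostly reflect interference from the rest of the machine, such as the first-run penalty visible in run 1 of the figures quoted in the post.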


  • Divide and conquer of large objects for GC performance

    - by Aperion
    At my work we're discussing different approaches to cleaning up a large amount (~50-100MB) of managed memory. There are two approaches on the table (read: two senior devs can't agree), and, not having the experience, the rest of the team is unsure which approach is more desirable in terms of performance or maintainability. The data being collected consists of many small items (~30,000), which in turn contain other items; all objects are managed. There are a lot of references between these objects, including event handlers, but none to outside objects. We'll treat this large group of objects and references as a single entity called a blob. Approach #1: Make sure all references to objects in the blob are severed and let the GC handle the blob and all the connections. Approach #2: Implement IDisposable on these objects, then call Dispose on them, set references to Nothing and remove handlers. The theory behind the second approach is that large, longer-lived objects take longer for the GC to clean up. So, by cutting the large objects into smaller bite-size morsels, the garbage collector will process them faster, giving a performance gain. So I think the basic question is this: Does breaking apart large groups of interconnected objects optimize data for garbage collection, or is it better to keep them together and rely on the garbage collection algorithms to process the data for you? I feel this is a case of premature optimization, but I do not know enough about the GC to know what helps or hinders it.


  • SQL Server performance issue.

    - by Jit
    Hi Friends, I have been trying to analyze a performance issue with SQL Server 2005. We have 30 jobs, one for each database (30 databases, one per client). The jobs run in the early morning at intervals of 5 minutes. When I run a job individually for testing, for most of the databases it finishes in 7 to 9 minutes. But when these jobs run in the early morning, I see a few jobs taking 2 to 3 hours to finish, while the same jobs take a few minutes, as mentioned above, if run independently. We don't have any other job scheduled during that time, other than these 30 jobs. If we restart the server, then for 2 or so days all the jobs finish in a few minutes, but over time (suddenly, from the 3rd day), a few jobs start taking hours to finish. What could be the reason for performance degradation over time? I verified all the SPs; we use temp tables, and I made sure that no temp table is left undropped at the end of an SP. Let me know what the possible reasons for such behavior are. I appreciate your time and help. Thanks


  • Poor performance using RMI-proxies with Swing components

    - by Patrick
    I'm having huge performance issues when I add RMI proxy references to a Java Swing JList component. I'm retrieving a list of user Profiles with RMI from a server. The retrieval itself takes just a second or so, so that's acceptable under the circumstances. However, when I try to add these proxies to a JList, with the help of a custom ListModel and a CellRenderer, it takes between 30 and 60 seconds to add about 180 objects. Since it is a list of users' names, it's preferable to present them alphabetically. The biggest performance hit is when I sort the elements as they get added to the ListModel. Since the list will always be sorted, I opted to use the built-in Collections.binarySearch() to find the correct position for the next element to be added, and the comparator uses two methods that are defined by the Profile interface, namely getFirstName() and getLastName(). Is there any way to speed this process up, or am I simply implementing it the wrong way? Or is this a "feature" of RMI? I'd really love to be able to cache some of the data of the remote objects locally, to minimize the remote method calls.
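
    The local caching the poster asks about can be sketched as follows, in Python and not Java, and under the assumption that every getter on a proxy is a network round trip, so a comparator that calls getFirstName() and getLastName() pays for remote calls on every comparison. RemoteProfile and SortedProfileList are hypothetical names; the idea is only to fetch each name once, keep it locally, and run the binary search over the local copies.

```python
import bisect


class RemoteProfile:
    """Hypothetical stand-in for an RMI proxy; every getter is a round trip."""

    calls = 0  # counts simulated remote calls across all instances

    def __init__(self, first, last):
        self._first, self._last = first, last

    def get_first_name(self):
        RemoteProfile.calls += 1
        return self._first

    def get_last_name(self):
        RemoteProfile.calls += 1
        return self._last


class SortedProfileList:
    """Keeps profiles sorted by (last, first); fetches each name only once."""

    def __init__(self):
        self._keys = []      # local copies of the names, kept sorted
        self._profiles = []  # proxies, in the same order as _keys

    def add(self, profile):
        # Two remote calls per profile, made once at insertion time.
        key = (profile.get_last_name(), profile.get_first_name())
        index = bisect.bisect_right(self._keys, key)  # compares local tuples only
        self._keys.insert(index, key)
        self._profiles.insert(index, profile)

    def names(self):
        return [f"{first} {last}" for last, first in self._keys]


profiles = SortedProfileList()
for first, last in [("Carol", "Young"), ("Alice", "Zimmer"), ("Bob", "Adams")]:
    profiles.add(RemoteProfile(first, last))

print(profiles.names(), RemoteProfile.calls)
```

    With this layout the number of remote calls grows linearly with the number of profiles and no longer depends on how many comparisons the sort performs.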


  • Google app engine: Poor Performance with JDO + Datastore

    - by Bosh
    I have a simple data model that includes two kinds: USERS, which store basic information (key, name, phone # etc), and RELATIONS, which describe e.g. a friendship between two users (supplying a relationship_type + two user keys). I'm getting very poor performance, for instance, if I try to print the first names of all of a user's friends. Say the user has 500 friends: I can fetch the list of friend user_ids very easily in a single query. But then, to pull out first names, I have to do 500 back-and-forth trips to the Datastore, each of which seems to take on the order of 30 ms. If this were SQL, I'd just do a JOIN and get the answer out fast. I understand there are rudimentary facilities for performing joins across un-owned relations in a relaxed implementation of JDO (as described at http://gae-java-persistence.blogspot.com) but they sound experimental and non-standard (e.g. my code won't work in any other JDO implementation). Is this really my best bet? Otherwise, how do people extract satisfactory performance from JDO/Datastore in this kind of (very common) situation? -Bosh
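
    The cost described here is about 500 x 30 ms = 15 s of pure round-trip latency. A minimal sketch of why a batched lookup helps, using a hypothetical in-memory FakeDatastore that only counts round trips; whether the poster's JDO setup exposes an equivalent batch call is not stated in the post, so get_many is an assumption made for illustration.

```python
class FakeDatastore:
    """Hypothetical in-memory stand-in that counts round trips."""

    def __init__(self, users):
        self._users = users
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1  # one request per key
        return self._users[key]

    def get_many(self, keys):
        self.round_trips += 1  # one request carries all keys
        return [self._users[k] for k in keys]


users = {i: {"first_name": f"user{i}"} for i in range(500)}
friend_ids = list(range(500))

one_by_one = FakeDatastore(users)
names_slow = [one_by_one.get(k)["first_name"] for k in friend_ids]

batched = FakeDatastore(users)
names_fast = [u["first_name"] for u in batched.get_many(friend_ids)]

print(one_by_one.round_trips, batched.round_trips)  # prints "500 1"
```

    Both paths return the same names; only the number of trips differs, which is the quantity that dominates when each trip costs tens of milliseconds.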


  • Silverlight performance with many loaded controls

    - by gius
    I have a SL application with many DataGrids (from Silverlight Toolkit), each on its own view. If several DataGrids are opened, changing between views (TabItems, for example) takes a long time (a few seconds) and it freezes the whole application (UI thread). The more DataGrids are loaded, the longer the change takes. The DataGrids that slow the UI change might be in other places in the app and not even visible at that moment. But once they are opened (and loaded with data), they slow down the showing of other DataGrids. Note that the DataGrids are NOT disposed and then recreated; they remain in memory, and only their parent control is hidden and made visible again. I have profiled the application. It shows that agcore.dll's SetValue function is the bottleneck. Unfortunately, debug symbols are not available for this native Silverlight library responsible for drawing. The problem is not in the DataGrid control: I tried to replace it with XCeed's grid and the performance when changing views is even worse. Do you have any idea how to solve this problem? Why do more opened controls slow down other controls? I have created a sample that shows this issue: http://cenud.cz/PerfTest.zip UPDATE: Using the VS11 profiler on the sample provided suggests that the problem could be in MeasureOverride being called many times (for each DataGridCell, I guess). But still, why is it slower as more controls are loaded elsewhere? Is there a way to improve the performance?


  • Mysql Performance Question - Essentially about normalizing efficiency

    - by freqmode
    Hi there. Just a quick question about database performance. I'll outline my site purpose below as background. I'm creating a dictionary site that saves the words users define to a database. What I'm wondering is whether to create a words table for each user or to keep one massive words table. This site will be used for entire schools so the single words table would be massive! The database structure is as follows:

    A user table with: User_ID (PRIMARY KEY), Username, First, Last, Password, Email, Country, Research, Standings, SendInfo, Donated, JoinedOn, LastLogin, Logins, Correct, Attempts, Admin, Active

    And one word table with: User_ID (PRIMARY KEY), Word, Vocab, Spell, Defined, DefinedAttempted, Spelled, SpelledAttempted, Sentenced, SentencedAttempted

    So what I'm asking is, performance-wise, should I create a new table for each user when they join the site (each user could have hundreds or thousands of words over time)? Or is it better to have one massive table with thousands and thousands of records and filter by User_ID? I don't think I'll perform many table joins. My gut feeling is to create a new table for each user, but I thought I'd ask for expert advice! Thanks in advance.
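
    The single-table option can be sketched as below, using Python's built-in sqlite3 as a stand-in for MySQL, so the details of the query plan are SQLite's and not MySQL's. One assumption differs from the schema in the post: with User_ID alone as the primary key of the word table, each user could store only one word, so this sketch uses a composite key of (user_id, word). The column subset and the generated data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE words (
        user_id INTEGER NOT NULL,
        word    TEXT    NOT NULL,
        defined INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (user_id, word)  -- composite key: one row per user per word
    )
    """
)

# 100 hypothetical users with 50 words each, all in one table.
rows = [(u, f"word{w}", 0) for u in range(100) for w in range(50)]
conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)

user_words = conn.execute(
    "SELECT word FROM words WHERE user_id = ? ORDER BY word", (42,)
).fetchall()

# Ask the engine how it would find one user's rows.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT word FROM words WHERE user_id = ?", (42,)
).fetchall()
print(len(user_words), plan[0][-1])
```

    Because user_id is the leading column of the key, the lookup for one user is an index search and not a scan of every user's rows, which is the property that lets one shared table stand in for many per-user tables.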


  • Reduce durability in MySQL for performance

    - by Paul Prescod
    My site occasionally has fairly predictable bursts of traffic that increase the throughput by 100 times more than normal. For example, we are going to be featured on a television show, and I expect in the hour after the show, I'll get more than 100 times more traffic than normal. My understanding is that MySQL (InnoDB) generally keeps my data in a bunch of different places: RAM buffers; the commitlog; the binary log; the actual tables; and all of the above places on my DB slave. This is too much "durability" given that I'm on an EC2 node and most of the stuff goes across the same network pipe (file systems are network attached). Plus the drives are just slow. The data is not high value and I'd rather take a small chance of a few minutes of data loss than have a high probability of an outage when the crowd arrives. During these traffic bursts I would like to do all of that I/O only if I can afford it. I'd like to just keep as much in RAM as possible (I have a fair chunk of RAM compared to the data size that would be touched over an hour). If buffers get scarce, or the I/O channel is not too overloaded, then sure, I'd like things to go to the commitlog or binary log to be sent to the slave. If, and only if, the I/O channel is not overloaded, I'd like to write back to the actual tables. In other words, I'd like MySQL/InnoDB to use a "write back" cache algorithm rather than a "write through" cache algorithm. Can I convince it to do that? If this is not possible, I am interested in general MySQL write-performance optimization tips. Most of the docs are about optimizing read performance, but when I get a crowd of users, I am creating accounts for all of them, so that's a write-heavy workload.


  • Displaying performance metrics in a modern web app?

    - by Charles
    We're updating our ancient internal PHP application at work. Right now, we gather extensive performance measurements on every pageview, and log them to the database. Additionally, users requested that some of the metrics be displayed at the bottom of the page. This worked out pretty well for us, because the last thing that the application does on every request is include the file containing the HTML footer. The updated parts of the application use an MVC framework and a Dispatch/Request/Response loop. The page footer is no longer the last thing done. In fact, it could very well be the first thing done, before the rest of the page is created. Because we can grab the Response before it's returned to the user, we could try to include placeholders for the performance metrics in the footer and simply replace them with the actual numbers, but this strikes me as a bad idea somehow. How do you handle this in your modern web app? While we're using PHP, I'm curious how it's done in a Ruby/Rails app, and in your favorite Python framework.
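
    The placeholder idea described in this post can be sketched in a few lines. This is an illustration in Python and not the poster's PHP, and it takes no position on whether the approach is a good one; PLACEHOLDER, render_page and handle_request are hypothetical names.

```python
import time

PLACEHOLDER = "<!--PERF_METRICS-->"  # hypothetical marker emitted by the footer template


def render_page():
    # Hypothetical view: the footer may be rendered before the work is finished,
    # so it carries only the marker, not the numbers.
    footer = f"<footer>{PLACEHOLDER}</footer>"
    body = "<main>" + ", ".join(str(i * i) for i in range(1000)) + "</main>"
    return body + footer


def handle_request():
    started = time.perf_counter()
    html = render_page()
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    # Last step before the response leaves: swap the marker for real numbers.
    return html.replace(PLACEHOLDER, f"Generated in {elapsed_ms:.2f} ms", 1)


response = handle_request()
print(response[-60:])
```

    The replacement runs once over the finished response, so the order in which the footer and the rest of the page are rendered no longer matters.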


  • PHP: Opening/closing tags & performance?

    - by Tom
    Hi, This may be a silly question, but as someone relatively new to PHP, I'm wondering if there are any performance-related issues with frequently opening and closing PHP tags in HTML template code, and if so, what might be best practices in terms of working with PHP tags? My question is not about the importance/correctness of closing tags, or about which type of code is more readable than another, but rather about how the document gets parsed/executed and what impact it might have on performance. To illustrate, consider the following two extremes:

    Mixing PHP and HTML tags:

        <?php
        echo '<tr>
            <td>'.$variable1.'</td>
            <td>'.$variable2.'</td>
            <td>'.$variable3.'</td>
            <td>'.$variable4.'</td>
            <td>'.$variable5.'</td>
        </tr>'
        ?>
        // PHP tag opened once

    Separating PHP and HTML tags:

        <tr>
            <td><?php echo $variable1 ?></td>
            <td><?php echo $variable2 ?></td>
            <td><?php echo $variable3 ?></td>
            <td><?php echo $variable4 ?></td>
            <td><?php echo $variable5 ?></td>
        </tr>
        // PHP tag opened five times

    Would be interested in hearing some views on this, even if it's just to hear that it makes no difference. Thanks.


  • Performance Problems with Django's F() Object

    - by JayhawksFan93
    Has anyone else noticed performance issues using Django's F() object? I am running Windows XP SP3 and developing against the Django trunk. A snippet of the models I'm using and the query I'm building are below. When I have the F() object in place, each call to a QuerySet method (e.g. filter, exclude, order_by, distinct, etc.) takes approximately 2 seconds, but when I comment out the F() clause the calls are sub-second. I had a co-worker test it on his Ubuntu machine, and he is not experiencing the same performance issues I am with the F() clause. Anyone else seeing this behavior?

        class Move(models.Model):
            state_meaning = models.CharField(
                max_length=16, db_index=True, blank=True, default='')
            drop = models.ForeignKey(
                Org, db_index=True, null=False, default=1,
                related_name='as_move_drop')

        class Split(models.Model):
            state_meaning = models.CharField(
                max_length=16, db_index=True, blank=True, default='')
            move = models.ForeignKey(Move, related_name='splits')
            pickup = models.ForeignKey(
                Org, db_index=True, null=False, default=1,
                related_name='as_split_pickup')
            pickup_date = models.DateField(null=True, default=None)
            drop = models.ForeignKey(
                Org, db_index=True, null=False, default=1,
                related_name='as_split_drop')
            drop_date = models.DateField(null=True, default=None, db_index=True)

        def get_splits(begin_date, end_date):
            start = timer.clock()
            qs = Split.objects \
                .filter(state_meaning__in=['INPROGRESS', 'FULFILLED'],
                        drop=F('move__drop'),  # <<< the line in question
                        pickup_date__lte=end_date)
            elapsed = timer.clock() - start
            print 'qs1 took %.3f' % elapsed

            start = timer.clock()
            qs = qs.filter(Q(drop_date__gte=begin_date) | Q(drop_date__isnull=True))
            elapsed = timer.clock() - start
            print 'qs2 took %.3f' % elapsed

            start = timer.clock()
            qs = qs.exclude(move__state_meaning='UNFULFILLED')
            elapsed = timer.clock() - start
            print 'qs3 took %.3f' % elapsed

            start = timer.clock()
            qs = qs.order_by('pickup_date', 'drop_date')
            elapsed = timer.clock() - start
            print 'qs7 took %.3f' % elapsed

            start = timer.clock()
            qs = qs.distinct()
            elapsed = timer.clock() - start
            print 'qs8 took %.3f' % elapsed


  • performance issue: difference between select s.* vs select *

    - by kamil
    Recently I had a problem with the performance of my query. It is described here: poor Hibernate select performance comparing to running directly - how debug? After a long struggle, I've finally discovered that a query with a select prefix like: select sth.* from Something as sth... is 300 times slower than a query started this way: select * from Something as sth.. Could somebody help me and answer why that is so? Some external documents on this would be really useful. The tables used for testing were: SALES_UNIT, which contains some basic info about a sales unit node, such as its name. Its only association is to the table SALES_UNIT_TYPE, as ManyToOne. The primary key is ID plus the field VALID_FROM_DTTM, which is a date. SALES_UNIT_RELATION contains the PARENT-CHILD relation between sales unit nodes. It consists of SALES_UNIT_PARENT_ID, SALES_UNIT_CHILD_ID and VALID_TO_DTTM/VALID_FROM_DTTM, and has no association with any other table. The PK here is ..PARENT_ID, ..CHILD_ID and VALID_FROM_DTTM. The actual queries I ran were:

        select s.* from sales_unit s left join sales_unit_relation r on (s.sales_unit_id = r.sales_unit_child_id) where r.sales_unit_child_id is null

        select * from sales_unit s left join sales_unit_relation r on (s.sales_unit_id = r.sales_unit_child_id) where r.sales_unit_child_id is null

    It is the same query; both use a left join, and the only difference is the select list.


  • How can I improve performance over SMB/CIFS for an application that has poor write speeds?

    - by Jeremy
    I have a third party application that reads several large files and generates a third large file. Its performance is quite good when the generated file is stored on "local storage", i.e. either a direct attached or iSCSI-based disk. The source files that are read can be stored remotely on our NAS and accessed via SMB with little effect on performance. However, if we attempt to write the target file to any kind of SMB/CIFS share (Samba or Windows Server) the performance drops almost ten-fold. This is unacceptably slow in our case. Writing files to network shares is not otherwise slow. I can copy large files to SMB shares and get great performance - near what I would expect is possible given the disks and network in question. I have a theory that this application's problem with SMB shares has something to do with a lack of write caching over the share and perhaps lots of network roundtrips. Is this possible and is there anything that can be done about it?
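
    The theory in this post (many small writes, each becoming its own network round trip) is unverified for the application in question, but the mechanism itself is easy to illustrate. In this minimal Python sketch, CountingSink is a hypothetical stand-in for a remote file that counts write requests; the same 100-byte records are written once directly and once through an in-memory buffer.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical stand-in for a remote file; counts write requests."""

    def __init__(self):
        super().__init__()
        self.requests = 0
        self.bytes_written = 0

    def writable(self):
        return True

    def write(self, data):
        self.requests += 1
        self.bytes_written += len(data)
        return len(data)


record = b"x" * 100

direct = CountingSink()
for _ in range(10_000):
    direct.write(record)  # every small write becomes its own request

sink = CountingSink()
with io.BufferedWriter(sink, buffer_size=1024 * 1024) as buffered:
    for _ in range(10_000):
        buffered.write(record)  # coalesced in memory until the buffer fills

print(direct.requests, sink.requests)
```

    Both sinks receive the same number of bytes; the buffered path simply issues far fewer requests. Since the application here is third party, this only shows what to look for in a network trace, not something that can be changed in its code.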


  • Get The Most From MySQL Database With MySQL Performance Tuning Training

    - by Antoinette O'Sullivan
    Get the most from MySQL Server's top-level performance by improving your understanding of performance tuning techniques. MySQL Performance Tuning Class: In this 4-day class, you'll learn practical, safe, highly efficient ways to optimize performance for the MySQL Server. You can take this class as: Training-on-Demand: Start training within 24 hours of registering and follow the instructor-led lecture material through streaming video at your own pace. Schedule lab time to perform the hands-on exercises at your convenience. Live-Virtual Class: Follow the live instructor-led class from your own desk - no travel required. There is already a range of events on the schedule to suit different timezones, with delivery in languages including English and German. In-Class Event: Travel to a training center to follow this class. For more information on this class, to see the schedule or register interest in additional events, go to http://oracle.com/education/mysql Troubleshooting MySQL Performance with Sveta Smirnova: During this one-day, live-virtual event, you get a unique opportunity to hear Sveta Smirnova, author of MySQL Troubleshooting, share her in-depth experience of identifying and solving performance problems with a MySQL Database. And you can benefit from this opportunity without incurring any travel costs! Dimitri's Blog: If MySQL Performance is a topic that interests you, then you should be following Dimitri Kravtchuk's blog. For more information on any aspect of the Authentic MySQL Curriculum, go to http://oracle.com/education/mysql.


  • Use CompiledQuery.Compile to improve LINQ to SQL performance

    - by Michael Freidgeim
    After reading DLinq (Linq to SQL) Performance, and in particular Part 4, I had a few questions. If CompiledQuery.Compile gives so many benefits, why not do it for all Linq to SQL queries? Are there any essential disadvantages to compiling all select queries? Under what conditions does compiling make performance worse, and by what percentage? Would it be good to have a default at the application config level or at the DBML level to specify whether all select queries are to be compiled? And the same questions apply to the Entity Framework CompiledQuery Class. However, in the comments I've found an answer from the author (ricom, 6 Jul 2007 3:08 AM): Compiling the query makes it durable. There is no need for this, nor is there any desire, unless you intend to run that same query many times. SQL provides regular select statements, prepared select statements, and stored procedures for a reason. Linq now has analogs. Also from 10 Tips to Improve your LINQ to SQL Application Performance: If you are using CompiledQuery make sure that you are using it more than once as it is more costly than normal querying for the first time. The resulting function coming as a CompiledQuery is an object, having the SQL statement and the delegate to apply it. And your delegate has the ability to replace the variables (or parameters) in the resulting query. However, I feel that many developers are not informed enough about the benefits of Compile. I think that tools like FxCop and Resharper should check the queries and suggest when compiling is recommended. Related Articles for LINQ to SQL: MSDN How to: Store and Reuse Queries (LINQ to SQL); 10 Tips to Improve your LINQ to SQL Application Performance. Related Articles for Entity Framework: MSDN: CompiledQuery Class; Exploring the Performance of the ADO.NET Entity Framework - Part 1; Exploring the Performance of the ADO.NET Entity Framework – Part 2; ADO.NET Entity Framework 4.0: Making it fast through Compiled Query


  • Tips / techniques for high-performance C# server sockets

    - by McKenzieG1
    I have a .NET 2.0 server that seems to be running into scaling problems, probably due to poor design of the socket-handling code, and I am looking for guidance on how I might redesign it to improve performance. Usage scenario: 50 - 150 clients, high rate (up to 100s / second) of small messages (10s of bytes each) to / from each client. Client connections are long-lived - typically hours. (The server is part of a trading system. The client messages are aggregated into groups to send to an exchange over a smaller number of 'outbound' socket connections, and acknowledgment messages are sent back to the clients as each group is processed by the exchange.) OS is Windows Server 2003, hardware is 2 x 4-core X5355. Current client socket design: A TcpListener spawns a thread to read each client socket as clients connect. The threads block on Socket.Receive, parsing incoming messages and inserting them into a set of queues for processing by the core server logic. Acknowledgment messages are sent back out over the client sockets using async Socket.BeginSend calls from the threads that talk to the exchange side. Observed problems: As the client count has grown (now 60-70), we have started to see intermittent delays of up to 100s of milliseconds while sending and receiving data to/from the clients. (We log timestamps for each acknowledgment message, and we can see occasional long gaps in the timestamp sequence for bunches of acks from the same group that normally go out in a few ms total.) Overall system CPU usage is low (< 10%), there is plenty of free RAM, and the core logic and the outbound (exchange-facing) side are performing fine, so the problem seems to be isolated to the client-facing socket code. There is ample network bandwidth between the server and clients (gigabit LAN), and we have ruled out network or hardware-layer problems. Any suggestions or pointers to useful resources would be greatly appreciated. 
If anyone has any diagnostic or debugging tips for figuring out exactly what is going wrong, those would be great as well. Note: I have the MSDN Magazine article Winsock: Get Closer to the Wire with High-Performance Sockets in .NET, and I have glanced at the Kodart "XF.Server" component - it looks sketchy at best.


  • Performance Testing a .NET Smart Client Application (.NET ClickOnce technology)

    - by jn29098
    Has anyone ever had to run performance tests on a ClickOnce application? I have engaged with a vendor who had trouble setting up their toolset with our software because it is Smart Client based. They are understandably more geared toward purely browser-based applications. I wonder if anyone has had to tackle this before and, if so, whether you would recommend any vendors who use industry-standard tools such as LoadRunner (which I assume can handle the smart client)? Thanks


  • Jython webapp performance

    - by DrPep
    I'm currently building a Jython web app but am concerned about Jython application performance. I take some comfort in the fact that I can write any compute-intensive tasks in a separate Java jar and invoke them from Jython. Has anyone had problems doing this, or does anyone foresee issues with such a setup?


  • OTRS slow performance main queue listing

    - by mrml
    We have a problem with a very slow queue listing. If there are more than 15 tickets in a single queue, it takes up to 4-5 seconds to create the view. This problem has occurred since we started using OTRS 3.1. We are running OTRS 3.1.4 with the KIX4OTRS extension on a virtualized Ubuntu 10.04 LTS. What we have tried so far: the known performance tweaks provided in the manuals; creating extra database indexes; installation on physical machines with Ubuntu 12.04 / 12.04.1 (no positive effects). Any ideas?


  • Wpf: Performance Issue

    - by viky
    I am working on a WPF application that uses a TreeView. Each node represents a different datatype; these datatypes have properties defined, and data templates are used to show those properties. My application reads from XML and creates the tree accordingly. My problem is that loading is too slow. I want to know about tricks that will help me improve the performance of my (or any) WPF application.


  • Star Schema vs Snowflake Schema performance

    - by Megawolt
    Hi... I'm beginning to develop a social sharing website, so I'm curious about database schema design. In data mining the star schema is the best one, but how about for a social sharing website? By the nature of social sharing websites there will be (I hope :)) many users at the same time. Which is better for performance under heavy use?


  • High performance web (-services) applications

    - by User Friendly
    Hi, I'd like to become a guru in high performance web & web-services applications. What technologies/patterns/skills do you recommend looking at? Basically, I have good skills in ASP.NET/.NET based web development, but I'd like to know how big things are built (on any platform, not depending on the .NET technology stack). Thank you.


  • Modeling distribution of performance measurements

    - by peterchen
    How would you mathematically model the distribution of repeated real life performance measurements - "Real life" meaning you are not just looping over the code in question, but it is just a short snippet within a large application running in a typical user scenario? My experience shows that you usually have a peak around the average execution time that can be modeled adequately with a Gaussian distribution. In addition, there's a "long tail" containing outliers - often with a multiple of the average time. (The behavior is understandable considering the factors contributing to first execution penalty). My goal is to model aggregate values that reasonably reflect this, and can be calculated from aggregate values (like for the Gaussian, calculate mu and sigma from N, sum of values and sum of squares). In other terms, number of repetitions is unlimited, but memory and calculation requirements should be minimized. A normal Gaussian distribution can't model the long tail appropriately and will have the average biased strongly even by a very small percentage of outliers. I am looking for ideas, especially if this has been attempted/analysed before. I've checked various distributions models, and I think I could work out something, but my statistics is rusty and I might end up with an overblown solution. Oh, a complete shrink-wrapped solution would be fine, too ;) Other aspects / ideas: Sometimes you get "two humps" distributions, which would be acceptable in my scenario with a single mu/sigma covering both, but ideally would be identified separately. Extrapolating this, another approach would be a "floating probability density calculation" that uses only a limited buffer and adjusts automatically to the range (due to the long tail, bins may not be spaced evenly) - haven't found anything, but with some assumptions about the distribution it should be possible in principle. 
Why (since it was asked) - For a complex process we need to make guarantees such as "only 0.1% of runs exceed a limit of 3 seconds, and the average processing time is 2.8 seconds". The performance of an isolated piece of code can be very different from a normal run-time environment involving varying levels of disk and network access, background services, scheduled events that occur within a day, etc. This can be solved trivially by accumulating all data. However, to accumulate this data in production, the data produced needs to be limited. For analysis of isolated pieces of code, a gaussian deviation plus first run penalty is ok. That doesn't work anymore for the distributions found above. [edit] I've already got very good answers (and finally - maybe - some time to work on this). I'm starting a bounty to look for more input / ideas.
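
    One way to read the requirements in this post (unlimited repetitions, bounded memory, a long tail, and guarantees such as "only 0.1% of runs exceed a limit") is sketched below. It is an assumption-laden illustration and not a recommended model: it keeps N, the sum and the sum of squares for mu and sigma, as the post describes, and adds a fixed log-spaced histogram so that tail quantiles can be estimated without storing samples. The class name, the bin range and the bin density are all hypothetical choices.

```python
import math


class StreamingStats:
    """Constant-memory summary: moments plus a log-spaced histogram."""

    def __init__(self, min_value=1e-3, max_value=1e3, bins_per_decade=20):
        # Bin range and density are assumptions; pick them to suit the data.
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0
        self._min = min_value
        self._per_decade = bins_per_decade
        decades = math.log10(max_value / min_value)
        self._bins = [0] * (int(math.ceil(decades * bins_per_decade)) + 1)

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x
        clamped = max(x, self._min)  # values below the range land in bin 0
        index = int(math.log10(clamped / self._min) * self._per_decade)
        self._bins[min(index, len(self._bins) - 1)] += 1

    def mean(self):
        return self.total / self.n

    def sigma(self):
        # Population standard deviation from N, sum and sum of squares.
        variance = self.total_sq / self.n - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))

    def quantile(self, q):
        # Upper edge of the bin in which the cumulative count reaches q * N.
        target = q * self.n
        seen = 0
        for index, count in enumerate(self._bins):
            seen += count
            if seen >= target:
                return self._min * 10 ** ((index + 1) / self._per_decade)
        return self._min * 10 ** (len(self._bins) / self._per_decade)


stats = StreamingStats()
for seconds in [2.0] * 990 + [20.0] * 10:  # 1% of runs are 10x outliers
    stats.add(seconds)

print(stats.mean(), stats.sigma(), stats.quantile(0.5), stats.quantile(0.999))
```

    In this made-up data set the mean (2.18) is pulled above the typical run time of 2.0 by only 1% of outliers, which is the bias the post describes, while the histogram-based median stays near 2. With 20 bins per decade each quantile is reported as a bin's upper edge, so it can overstate the true value by up to about 12%.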

