Search Results

Search found 13403 results on 537 pages for '2 epm performance tuning'.

Page 38/537 | < Previous Page | 34 35 36 37 38 39 40 41 42 43 44 45 | Next Page >

Performance of tokenizing CSS in PHP

- by Boldewyn

This is a noob question from someone who hasn't written a parser/lexer ever before. I'm writing a tokenizer/parser for CSS in PHP (please don't repeat with 'OMG, why in PHP?'). The syntax is written down by the W3C neatly here (CSS2.1) and here (CSS3, draft). It's a list of 21 possible tokens, that all (but two) cannot be represented as static strings. My current approach is to loop through an array containing the 21 patterns over and over again, do an if (preg_match()) and reduce the source string match by match. In principle this works really good. However, for a 1000 lines CSS string this takes something between 2 and 8 seconds, which is too much for my project. Now I'm banging my head how other parsers tokenize and parse CSS in fractions of seconds. OK, C is always faster than PHP, but nonetheless, are there any obvious D'Oh! s that I fell into? I made some optimizations, like checking for '@', '#' or '"' as the first char of the remaining string and applying only the relevant regexp then, but this hadn't brought any great performance boosts. My code (snippet) so far: $TOKENS = array( 'IDENT' => '...regexp...', 'ATKEYWORD' => '@...regexp...', 'String' => '"...regexp..."|\'...regexp...\'', //... ); $string = '...CSS source string...'; $stream = array(); // we reduce $string token by token while ($string != '') { $string = ltrim($string, " \t\r\n\f"); // unconsumed whitespace at the // start is insignificant but doing a trim reduces exec time by 25% $matches = array(); // loop through all possible tokens foreach ($TOKENS as $t => $p) { // The '&' is used as delimiter, because it isn't used anywhere in // the token regexps if (preg_match('&^'.$p.'&Su', $string, $matches)) { $stream[] = array($t, $matches[0]); $string = substr($string, strlen($matches[0])); // Yay! We found one that matches! continue 2; } } // if we come here, we have a syntax error and handle it somehow } // result: an array $stream consisting of arrays with // 0 => type of token // 1 => token content

Read the article
Performance Optimization for Matrix Rotation

- by Summer_More_More_Tea

Hello everyone: I'm now trapped by a performance optimization lab in the book "Computer System from a Programmer's Perspective" described as following: In a N*N matrix M, where N is multiple of 32, the rotate operation can be represented as: Transpose: interchange elements M(i,j) and M(j,i) Exchange rows: Row i is exchanged with row N-1-i A example for matrix rotation(N is 3 instead of 32 for simplicity): ------- ------- |1|2|3| |3|6|9| ------- ------- |4|5|6| after rotate is |2|5|8| ------- ------- |7|8|9| |1|4|7| ------- ------- A naive implementation is: #define RIDX(i,j,n) ((i)*(n)+(j)) void naive_rotate(int dim, pixel *src, pixel *dst) { int i, j; for (i = 0; i < dim; i++) for (j = 0; j < dim; j++) dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)]; } I come up with an idea by inner-loop-unroll. The result is: Code Version Speed Up original 1x unrolled by 2 1.33x unrolled by 4 1.33x unrolled by 8 1.55x unrolled by 16 1.67x unrolled by 32 1.61x I also get a code snippet from pastebin.com that seems can solve this problem: void rotate(int dim, pixel *src, pixel *dst) { int stride = 32; int count = dim >> 5; src += dim - 1; int a1 = count; do { int a2 = dim; do { int a3 = stride; do { *dst++ = *src; src += dim; } while(--a3); src -= dim * stride + 1; dst += dim - stride; } while(--a2); src += dim * (stride + 1); dst -= dim * dim - stride; } while(--a1); } After carefully read the code, I think main idea of this solution is treat 32 rows as a data zone, and perform the rotating operation respectively. Speed up of this version is 1.85x, overwhelming all the loop-unroll version. Here are the questions: In the inner-loop-unroll version, why does increment slow down if the unrolling factor increase, especially change the unrolling factor from 8 to 16, which does not effect the same when switch from 4 to 8? Does the result have some relationship with depth of the CPU pipeline? If the answer is yes, could the degrade of increment reflect pipeline length? What is the probable reason for the optimization of data-zone version? It seems that there is no too much essential difference from the original naive version. EDIT: My test environment is Intel Centrino Duo processor and the verion of gcc is 4.4 Any advice will be highly appreciated! Kind regards!

Read the article
Neo4j 1.9.4 (REST Server,CYPHER) performance issue

- by user2968943

I have Neo4j 1.9.4 installed on 24 core 24Gb ram (centos) machine and for most queries CPU usage spikes goes to 200% with only few concurrent requests. Domain: some sort of social application where few types of nodes(profiles) with 3-30 text/array properties and 36 relationship types with at least 3 properties. Most of nodes currently has ~300-500 relationships. Current data set footprint(from console): LogicalLogSize=4294907 (32MB) ArrayStoreSize=1675520 (12MB) NodeStoreSize=1342170 (10MB) PropertyStoreSize=1739548 (13MB) RelationshipStoreSize=6395202 (48MB) StringStoreSize=1478400 (11MB) which is IMHO really small. most queries looks like this one(with more or less WITH .. MATCH .. statements and few queries with variable length relations but the often fast): START targetUser=node({id}), currentUser=node({current}) MATCH targetUser-[contact:InContactsRelation]->n, n-[:InLocationRelation]->l, n-[:InCategoryRelation]->c WITH currentUser, targetUser,n, l,c, contact.fav is not null as inFavorites MATCH n<-[followers?:InContactsRelation]-() WITH currentUser, targetUser,n, l,c,inFavorites, COUNT(followers) as numFollowers RETURN id(n) as id, n.name? as name, n.title? as title, n._class as _class, n.avatar? as avatar, n.avatar_type? as avatar_type, l.name as location__name, c.name as category__name, true as isInContacts, inFavorites as isInFavorites, numFollowers it runs in ~1s-3s(for first run) and ~1s-70ms (for consecutive and it depends on query) and there is about 5-10 queries runs for each impression. Another interesting behavior is when i try run query from console(neo4j) on my local machine many consecutive times(just press ctrl+enter for few seconds) it has almost constant execution time but when i do it on server it goes slower exponentially and i guess it somehow related with my problem. Problem: So my problem is that neo4j is very CPU greedy(for 24 core machine its may be not an issue but its obviously overkill for small project). First time i used AWS EC2 m1.large instance but over all performance was bad, during testing, CPU always was over 100%. Some relevant parts of configuration: neostore.nodestore.db.mapped_memory=1280M wrapper.java.maxmemory=8192 note: I already tried configuration where all memory related parameters where HIGH and it didn't worked(no change at all). Question: Where to digg? configuration? scheme? queries? what i'm doing wrong? if need more info(logs, configs) just ask ;)

Read the article
Improving performance on data pasting 2000 rows with validations

- by Lohit

I have N rows (which could be nothing less than 1000) on an excel spreadsheet. And in this sheet our project has 150 columns like this: Now, our application needs data to be copied (using normal Ctrl+C) and pasted (using Ctrl+V) from the excel file sheet on our GUI sheet. Copy pasting 1000 records takes around 5-6 seconds which is okay for our requirement, but the problem is when we need to make sure the data entered is valid. So we have to validate data in each row generate appropriate error messages and format the data as per requirement. So we need to at runtime parse and evaluate data in each row. Now all the formatting of data and validations come from the back-end database and we have it in a data-table (dtValidateAndFormatConditions). The conditions would be around 50. So you can see how slow this whole process becomes since N X 150 X 50 operations are required to complete this whole process. Initially it took approximately 2-3 minutes but now i have reduced it to 20 - 30 seconds. However i have increased the speed by making an expression parser of my own - and not by any algorithm, is there any other way i can improve performance, by using Divide and Conquer or some other mechanism. Currently i am not really sure how to go about this. Here is what part of my code looks like: public virtual void ValidateAndFormatOnCopyPaste(DataTable DtCopied, int CurRow) { foreach (DataRow dRow in dtValidateAndFormatConditions.Rows) { string Condition = dRow["Condition"]; string FormatValue = Value = dRow["Value"]; GetValidatedFormattedData(DtCopied,ref Condition, ref FormatValue ,iRowIndex); Condition = Parse(Condition); dRow["Condition"] = Condition; FormatValue = Parse(FormatValue ); dRow["Value"] = FormatValue; } } The above code gets called row-wise like this: public override void ValidateAndFormat(DataTable dtChangedRecords, CellRange cr) { int iRowStart = cr.Row, iRowEnd = cr.Row + cr.RowCount; for (int iRow = iRowStart; iRow < iRowEnd; iRow++) { ValidateAndFormatOnCopyPaste(dtChangedRecords,iRow); } } Please know my question needs a more algorithmic solution than code optimization, however any answers containing code related optimizations will be appreciated as well. (Tagged Linq because although not seen i have been using linq in some parts of my code).

Read the article
Is Linq Faster, Slower or the same?

- by Vaccano

Is this: Box boxToFind = AllBoxes.Where(box => box.BoxNumber == boxToMatchTo.BagNumber); Faster or slower than this: Box boxToFind ; foreach (Box box in AllBoxes) { if (box.BoxNumber == boxToMatchTo.BoxNumber) { boxToFind = box; } } Both give me the result I am looking for (boxToFind). This is going to run on a mobile device that I need to be performance conscientious of.

Read the article
SQL SERVER – Faster SQL Server Databases and Applications – Power and Control with SafePeak Caching Options

- by Pinal Dave

Update: This blog post is written based on the SafePeak, which is available for free download. Today, I’d like to examine more closely one of my preferred technologies for accelerating SQL Server databases, SafePeak. Safepeak’s software provides a variety of advanced data caching options, techniques and tools to accelerate the performance and scalability of SQL Server databases and applications. I’d like to look more closely at some of these options, as some of these capabilities could help you address lagging database and performance on your systems. To better understand the available options, it is best to start by understanding the difference between the usual “Basic Caching” vs. SafePeak’s “Dynamic Caching”. Basic Caching Basic Caching (or the stale and static cache) is an ability to put the results from a query into cache for a certain period of time. It is based on TTL, or Time-to-live, and is designed to stay in cache no matter what happens to the data. For example, although the actual data can be modified due to DML commands (update/insert/delete), the cache will still hold the same obsolete query data. Meaning that with the Basic Caching is really static / stale cache. As you can tell, this approach has its limitations. Dynamic Caching Dynamic Caching (or the non-stale cache) is an ability to put the results from a query into cache while maintaining the cache transaction awareness looking for possible data modifications. The modifications can come as a result of: DML commands (update/insert/delete), indirect modifications due to triggers on other tables, executions of stored procedures with internal DML commands complex cases of stored procedures with multiple levels of internal stored procedures logic. When data modification commands arrive, the caching system identifies the related cache items and evicts them from cache immediately. In the dynamic caching option the TTL setting still exists, although its importance is reduced, since the main factor for cache invalidation (or cache eviction) become the actual data updates commands. Now that we have a basic understanding of the differences between “basic” and “dynamic” caching, let’s dive in deeper. SafePeak: A comprehensive and versatile caching platform SafePeak comes with a wide range of caching options. Some of SafePeak’s caching options are automated, while others require manual configuration. Together they provide a complete solution for IT and Data managers to reach excellent performance acceleration and application scalability for a wide range of business cases and applications. Automated caching of SQL Queries: Fully/semi-automated caching of all “read” SQL queries, containing any types of data, including Blobs, XMLs, Texts as well as all other standard data types. SafePeak automatically analyzes the incoming queries, categorizes them into SQL Patterns, identifying directly and indirectly accessed tables, views, functions and stored procedures; Automated caching of Stored Procedures: Fully or semi-automated caching of all read” stored procedures, including procedures with complex sub-procedure logic as well as procedures with complex dynamic SQL code. All procedures are analyzed in advance by SafePeak’s Metadata-Learning process, their SQL schemas are parsed – resulting with a full understanding of the underlying code, objects dependencies (tables, views, functions, sub-procedures) enabling automated or semi-automated (manually review and activate by a mouse-click) cache activation, with full understanding of the transaction logic for cache real-time invalidation; Transaction aware cache: Automated cache awareness for SQL transactions (SQL and in-procs); Dynamic SQL Caching: Procedures with dynamic SQL are pre-parsed, enabling easy cache configuration, eliminating SQL Server load for parsing time and delivering high response time value even in most complicated use-cases; Fully Automated Caching: SQL Patterns (including SQL queries and stored procedures) that are categorized by SafePeak as “read and deterministic” are automatically activated for caching; Semi-Automated Caching: SQL Patterns categorized as “Read and Non deterministic” are patterns of SQL queries and stored procedures that contain reference to non-deterministic functions, like getdate(). Such SQL Patterns are reviewed by the SafePeak administrator and in usually most of them are activated manually for caching (point and click activation); Fully Dynamic Caching: Automated detection of all dependent tables in each SQL Pattern, with automated real-time eviction of the relevant cache items in the event of “write” commands (a DML or a stored procedure) to one of relevant tables. A default setting; Semi Dynamic Caching: A manual cache configuration option enabling reducing the sensitivity of specific SQL Patterns to “write” commands to certain tables/views. An optimization technique relevant for cases when the query data is either known to be static (like archive order details), or when the application sensitivity to fresh data is not critical and can be stale for short period of time (gaining better performance and reduced load); Scheduled Cache Eviction: A manual cache configuration option enabling scheduling SQL Pattern cache eviction based on certain time(s) during a day. A very useful optimization technique when (for example) certain SQL Patterns can be cached but are time sensitive. Example: “select customers that today is their birthday”, an SQL with getdate() function, which can and should be cached, but the data stays relevant only until 00:00 (midnight); Parsing Exceptions Management: Stored procedures that were not fully parsed by SafePeak (due to too complex dynamic SQL or unfamiliar syntax), are signed as “Dynamic Objects” with highest transaction safety settings (such as: Full global cache eviction, DDL Check = lock cache and check for schema changes, and more). The SafePeak solution points the user to the Dynamic Objects that are important for cache effectiveness, provides easy configuration interface, allowing you to improve cache hits and reduce cache global evictions. Usually this is the first configuration in a deployment; Overriding Settings of Stored Procedures: Override the settings of stored procedures (or other object types) for cache optimization. For example, in case a stored procedure SP1 has an “insert” into table T1, it will not be allowed to be cached. However, it is possible that T1 is just a “logging or instrumentation” table left by developers. By overriding the settings a user can allow caching of the problematic stored procedure; Advanced Cache Warm-Up: Creating an XML-based list of queries and stored procedure (with lists of parameters) for periodically automated pre-fetching and caching. An advanced tool allowing you to handle more rare but very performance sensitive queries pre-fetch them into cache allowing high performance for users’ data access; Configuration Driven by Deep SQL Analytics: All SQL queries are continuously logged and analyzed, providing users with deep SQL Analytics and Performance Monitoring. Reduce troubleshooting from days to minutes with database objects and SQL Patterns heat-map. The performance driven configuration helps you to focus on the most important settings that bring you the highest performance gains. Use of SafePeak SQL Analytics allows continuous performance monitoring and analysis, easy identification of bottlenecks of both real-time and historical data; Cloud Ready: Available for instant deployment on Amazon Web Services (AWS). As you can see, there are many options to configure SafePeak’s SQL Server database and application acceleration caching technology to best fit a lot of situations. If you’re not familiar with their technology, they offer free-trial software you can download that comes with a free “help session” to help get you started. You can access the free trial here. Also, SafePeak is available to use on Amazon Cloud. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Loading PNGs into OpenGL performance issues - Java & JOGL much slower than C# & Tao.OpenGL

- by Edward Cresswell

I am noticing a large performance difference between Java & JOGL and C# & Tao.OpenGL when both loading PNGs from storage into memory, and when loading that BufferedImage (java) or Bitmap (C# - both are PNGs on hard drive) 'into' OpenGL. This difference is quite large, so I assumed I was doing something wrong, however after quite a lot of searching and trying different loading techniques I've been unable to reduce this difference. With Java I get an image loaded in 248ms and loaded into OpenGL in 728ms The same on C# takes 54ms to load the image, and 34ms to load/create texture. The image in question above is a PNG containing transparency, sized 7200x255, used for a 2D animated sprite. I realise the size is really quite ridiculous and am considering cutting up the sprite, however the large difference is still there (and confusing). On the Java side the code looks like this: BufferedImage image = ImageIO.read(new File(fileName)); texture = TextureIO.newTexture(image, false); texture.setTexParameteri(GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR); texture.setTexParameteri(GL.GL_TEXTURE_MAG_FILTER, GL.GL_LINEAR); The C# code uses: Bitmap t = new Bitmap(fileName); t.RotateFlip(RotateFlipType.RotateNoneFlipY); Rectangle r = new Rectangle(0, 0, t.Width, t.Height); BitmapData bd = t.LockBits(r, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb); Gl.glBindTexture(Gl.GL_TEXTURE_2D, tID); Gl.glTexImage2D(Gl.GL_TEXTURE_2D, 0, Gl.GL_RGBA, t.Width, t.Height, 0, Gl.GL_BGRA, Gl.GL_UNSIGNED_BYTE, bd.Scan0); Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MIN_FILTER, Gl.GL_LINEAR); Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MAG_FILTER, Gl.GL_LINEAR); t.UnlockBits(bd); t.Dispose(); After quite a lot of testing I can only come to the conclusion that Java/JOGL is just slower here - PNG reading might not be as quick, or that I'm still doing something wrong. Thanks. Edit2: I have found that creating a new BufferedImage with format TYPE_INT_ARGB_PRE decreases OpenGL texture load time by almost half - this includes having to create the new BufferedImage, getting the Graphics2D from it and then rendering the previously loaded image to it. Edit3: Benchmark results for 5 variations. I wrote a small benchmarking tool, the following results come from loading a set of 33 pngs, most are very wide, 5 times. testStart: ImageIO.read(file) -> TextureIO.newTexture(image) result: avg = 10250ms, total = 51251 testStart: ImageIO.read(bis) -> TextureIO.newTexture(image) result: avg = 10029ms, total = 50147 testStart: ImageIO.read(file) -> TextureIO.newTexture(argbImage) result: avg = 5343ms, total = 26717 testStart: ImageIO.read(bis) -> TextureIO.newTexture(argbImage) result: avg = 5534ms, total = 27673 testStart: TextureIO.newTexture(file) result: avg = 10395ms, total = 51979 ImageIO.read(bis) refers to the technique described in James Branigan's answer below. argbImage refers to the technique described in my previous edit: img = ImageIO.read(file); argbImg = new BufferedImage(img.getWidth(), img.getHeight(), TYPE_INT_ARGB_PRE); g = argbImg.createGraphics(); g.drawImage(img, 0, 0, null); texture = TextureIO.newTexture(argbImg, false); Any more methods of loading (either images from file, or images to OpenGL) would be appreciated, I will update these benchmarks.

Read the article
Performance issues with repeatable loops as control part

- by djerry

Hey guys, In my application, i need to show made calls to the user. The user can arrange some filters, according to what they want to see. The problem is that i find it quite hard to filter the calls without losing performance. This is what i am using now : private void ProcessFilterChoice() { _filteredCalls = ServiceConnector.ServiceConnector.SingletonServiceConnector.Proxy.GetAllCalls().ToList(); if (cboOutgoingIncoming.SelectedIndex > -1) GetFilterPartOutgoingIncoming(); if (cboInternExtern.SelectedIndex > -1) GetFilterPartInternExtern(); if (cboDateFilter.SelectedIndex > -1) GetFilteredCallsByDate(); wbPdf.Source = null; btnPrint.Content = "Pdf preview"; } private void GetFilterPartOutgoingIncoming() { if (cboOutgoingIncoming.SelectedItem.ToString().Equals("Outgoing")) for (int i = _filteredCalls.Count - 1; i > -1; i--) { if (_filteredCalls[i].Caller.E164.Length > 4 || _filteredCalls[i].Caller.E164.Equals("0")) _filteredCalls.RemoveAt(i); } else if (cboOutgoingIncoming.SelectedItem.ToString().Equals("Incoming")) for (int i = _filteredCalls.Count - 1; i > -1; i--) { if (_filteredCalls[i].Called.E164.Length > 4 || _filteredCalls[i].Called.E164.Equals("0")) _filteredCalls.RemoveAt(i); } } private void GetFilterPartInternExtern() { if (cboInternExtern.SelectedItem.ToString().Equals("Intern")) for (int i = _filteredCalls.Count - 1; i > -1; i--) { if (_filteredCalls[i].Called.E164.Length > 4 || _filteredCalls[i].Caller.E164.Length > 4 || _filteredCalls[i].Caller.E164.Equals("0")) _filteredCalls.RemoveAt(i); } else if (cboInternExtern.SelectedItem.ToString().Equals("Extern")) for (int i = _filteredCalls.Count - 1; i > -1; i--) { if ((_filteredCalls[i].Called.E164.Length < 5 && _filteredCalls[i].Caller.E164.Length < 5) || _filteredCalls[i].Called.E164.Equals("0")) _filteredCalls.RemoveAt(i); } } private void GetFilteredCallsByDate() { DateTime period = DateTime.Now; switch (cboDateFilter.SelectedItem.ToString()) { case "Today": period = DateTime.Today; break; case "Last week": period = DateTime.Today.Subtract(new TimeSpan(7, 0, 0, 0)); break; case "Last month": period = DateTime.Today.AddMonths(-1); break; case "Last year": period = DateTime.Today.AddYears(-1); break; default: return; } for (int i = _filteredCalls.Count - 1; i > -1; i--) { if (_filteredCalls[i].Start < period) _filteredCalls.RemoveAt(i); } } _filtered calls is a list of "calls". Calls is a class that looks like this : [DataContract] public class Call { private User caller, called; private DateTime start, end; private string conferenceId; private int id; private bool isNew = false; [DataMember] public bool IsNew { get { return isNew; } set { isNew = value; } } [DataMember] public int Id { get { return id; } set { id = value; } } [DataMember] public string ConferenceId { get { return conferenceId; } set { conferenceId = value; } } [DataMember] public DateTime End { get { return end; } set { end = value; } } [DataMember] public DateTime Start { get { return start; } set { start = value; } } [DataMember] public User Called { get { return called; } set { called = value; } } [DataMember] public User Caller { get { return caller; } set { caller = value; } } Can anyone direct me to a better solution or make some suggestions.

Read the article
SQL query performance optimization (TimesTen)

- by Sergey Mikhanov

Hi community, I need some help with TimesTen DB query optimization. I made some measures with Java profiler and found the code section that takes most of the time (this code section executes the SQL query). What is strange that this query becomes expensive only for some specific input data. Here’s the example. We have two tables that we are querying, one represents the objects we want to fetch (T_PROFILEGROUP), another represents the many-to-many link from some other table (T_PROFILECONTEXT_PROFILEGROUPS). We are not querying linked table. These are the queries that I executed with DB profiler running (they are the same except for the ID): Command> select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; < 1169655247309537280 > < 1169655249792565248 > < 1464837997699399681 > 3 rows found. Command> select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928; < 1169655247309537280 > 1 row found. This is what I have in the profiler: 12:14:31.147 1 SQL 2L 6C 10825P Preparing: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272 12:14:31.147 2 SQL 4L 6C 10825P sbSqlCmdCompile ()(E): (Found already compiled version: refCount:01, bucket:47) cmdType:100, cmdNum:1146695. 12:14:31.147 3 SQL 4L 6C 10825P Opening: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; 12:14:31.147 4 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; 12:14:31.148 5 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; 12:14:31.148 6 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; 12:14:31.228 7 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; 12:14:31.228 8 SQL 4L 6C 10825P Closing: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; 12:14:35.243 9 SQL 2L 6C 10825P Preparing: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928 12:14:35.243 10 SQL 4L 6C 10825P sbSqlCmdCompile ()(E): (Found already compiled version: refCount:01, bucket:44) cmdType:100, cmdNum:1146697. 12:14:35.243 11 SQL 4L 6C 10825P Opening: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928; 12:14:35.243 12 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928; 12:14:35.243 13 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928; 12:14:35.243 14 SQL 4L 6C 10825P Closing: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928; It’s clear that the first query took almost 100ms, while the second was executed instantly. It’s not about queries precompilation (the first one is precompiled too, as same queries happened earlier). We have DB indices for all columns used here: T_PROFILEGROUP.M_ID, T_PROFILECONTEXT_PROFILEGROUPS.M_ID_OID and T_PROFILECONTEXT_PROFILEGROUPS.M_ID_EID. My questions are: Why querying the same set of tables yields such a different performance for different parameters? Which indices are involved here? Is there any way to improve this simple query and/or the DB to make it faster? UPDATE: to give the feeling of size: Command> select count(*) from T_PROFILEGROUP; < 183840 > 1 row found. Command> select count(*) from T_PROFILECONTEXT_PROFILEGROUPS; < 2279104 > 1 row found.

Read the article
Performance experiences for running Windows 7 on a Thin-Client?

- by Peter Bernier

Has anyone else tried installing Windows 7 on thin-client hardware? I'd be very interested to hear about other people's experiences and what sort of hardware tweaks they had to do to get it to work. (Yes, I realize this is completely unsupported.. half the fun of playing with machines and beta/RC versions is trying out unsupported scenarios. :) ) I managed to get Windows 7 installed on a modified Wyse 9450 Thin-Client and while the performance isn't great, it is usable, particularly as an RDP workstation. Before installing 7, I added another 256Mb of ram (512 total), a 60G laptop hard-drive and a PCI videocard to the 9450 (this was in order to increase the supported screen resolution). I basically did this in order to see whether or not it was possible to get 7 installed on such minimal hardware, and see what the performance would be. For a 550Mhz processor, I was reasonably impressed. I've been using the machine for RDP for the last couple of days and it actually seems slightly snappier than the default Windows XP embedded install (although this is more likely the result of the extra hardware). I'll be running some more tests later on as I'm curious to see particularl whether the streaming video performance will improve. I'd love to hear about anyone's experiences getting 7 to work on extremely low-powered hardware. Particularly any sort of tweaks that you've discovered in order to increase performance..

Read the article
To what extent is size a factor in SSD performance?

- by artif

To what extent is the size of an SSD a factor in its performance? In my mind, correct me if I'm wrong, a bigger SSD should be, everything else being equal, faster than a smaller one. A bigger SSD would have more erase blocks and thus more leeway for the FTL (flash translation layer) to do garbage collection optimization. Also there would be more time before TRIM became necessary. I see on Wikipedia that it remarks that "The performance of the SSD can scale with the number of parallel NAND flash chips used in the device" so it seems throughput also increases significantly. Also many SSDs contain internal caches of some sort and presumably those caches are larger for correspondingly large SSDs. But supposing this effect exists, I would like a quantitative analysis. Does throughput increase linearly? How much is garbage collection impacted, if at all? Does latency stay the same? And so on. Would the performance of a 8 GB SSD be significantly different from, for example, an 80 GB SSD assuming both used high quality chips, controllers, etc? Are there any resources (webpages, research papers, presentations, books, etc) that discuss correlations between SSD performance (4 KB random write speed, latency, maximum sequential throughput, etc) and size? I realize this does not really sound like a programming question but it is relevant for what I'm working on (using flash for caching hard drive data) which does involve programming. If there is a better place to ask this question, eg a more hardware oriented site, what would that be? Something like the equivalent of stack overflow (or perhaps a forum) for in-depth questions on hardware interfaces, internals, etc would be appreciated.

Read the article
Performance experiences for running Windows 7 on a Thin-Client?

- by Peter Bernier

Has anyone else tried installing Windows 7 on thin-client hardware? I'd be very interested to hear about other people's experiences and what sort of hardware tweaks they had to do to get it to work. (Yes, I realize this is completely unsupported.. half the fun of playing with machines and beta/RC versions is trying out unsupported scenarios. :) ) I managed to get Windows 7 installed on a modified Wyse 9450 Thin-Client and while the performance isn't great, it is usable, particularly as an RDP workstation. Before installing 7, I added another 256Mb of ram (512 total), a 60G laptop hard-drive and a PCI videocard to the 9450 (this was in order to increase the supported screen resolution). I basically did this in order to see whether or not it was possible to get 7 installed on such minimal hardware, and see what the performance would be. For a 550Mhz processor, I was reasonably impressed. I've been using the machine for RDP for the last couple of days and it actually seems slightly snappier than the default Windows XP embedded install (although this is more likely the result of the extra hardware). I'll be running some more tests later on as I'm curious to see particularl whether the streaming video performance will improve. I'd love to hear about anyone's experiences getting 7 to work on extremely low-powered hardware. Particularly any sort of tweaks that you've discovered in order to increase performance..

Read the article
LINQ To objects: Quicker ideas?

- by SDReyes

Do you see a better approach to obtain and concatenate item.Number in a single string? Current: var numbers = new StringBuilder( ); // group is the result of a previous group by var basenumbers = group.Select( item => item.Number ); basenumbers.Aggregate ( numbers, ( res, element ) => res.AppendFormat( "{0:00}", element ) );

Read the article
Data in two databases, eager spool resulting in query

- by Valkyrie

I have two databases in SQL2k5: one that holds a large amount of static data (SQL Database 1) (never updated but frequently inserted into) and one that holds relational data (SQL Database 2) related to the static data. They're separated mainly because of corporate guidelines and business requirements: assume for the following problem that combining them is not practical. There are places in SQLDB2 that PKs in SQLDB1 are referenced; triggers control the referential integrity, since cross-database relationships are troublesome in SQL Server. BUT, because of the large amount of data in SQLDB1, I'm getting eager spools on queries that join from the Id in SQLDB2 that references the data in SQLDB1. (With me so far? Maybe an example will help:) SELECT t.Id, t.Name, t2.Company FROM SQLDB1.table t INNER JOIN SQLDB2.table t2 ON t.Id = t2.FKId This query results in a eager spool that's 84% of the load of the query; the table in SQLDB1 has 35M rows, so it's completely choking this query. I can't create a view on the table in SQLDB1 and use that as my FK/index; it doesn't want me to create a constraint based on a view. Anyone have any idea how I can fix this huge bottleneck? (Short of putting the static data in the first db: believe me, I've argued that one until I'm blue in the face to no avail.) Thanks! valkyrie Edit: also can't create an indexed view because you can't put schemabinding on a view that references a table outside the database where the view resides. Dang it.

Read the article
Oracle T4CPreparedStatement memory leaks?

- by Jay

A little background on the application that I am gonna talk about in the next few lines: XYZ is a data masking workbench eclipse RCP application: You give it a source table column, and a target table column, it would apply a trasformation (encryption/shuffling/etc) and copy the row data from source table to target table. Now, when I mask n tables at a time, n threads are launched by this app. Here is the issue: I have run into a production issue on first roll out of the above said app. Unfortunately, I don't have any logs to get to the root. However, I tried to run this app in test region and do a stress test. When I collected .hprof files and ran 'em through an analyzer (yourKit), I noticed that objects of oracle.jdbc.driver.T4CPreparedStatement was retaining heap. The analysis also tells me that one of my classes is holding a reference to this preparedstatement object and thereby, n threads have n such objects. T4CPreparedStatement seemed to have character arrays: lastBoundChars and bindChars each of size char[300000]. So, I researched a bit (google!), obtained ojdbc6.jar and tried decompiling T4CPreparedStatement. I see that T4CPreparedStatement extends OraclePreparedStatement, which dynamically manages array size of lastBoundChars and bindChars. So, my questions here are: Have you ever run into an issue like this? Do you know the significance of lastBoundChars / bindChars? I am new to profiling, so do you think I am not doing it correct? (I also ran the hprofs through MAT - and this was the main identified issue - so, I don't really think I could be wrong?) I have found something similar on the web here: http://forums.oracle.com/forums/thread.jspa?messageID=2860681 Appreciate your suggestions / advice.

Read the article
Best Practise for Stopwatch in multi processors machine?

- by Ahmed Said

I found a good question for measuring function performance, and the answers recommend to use Stopwatch as follows Stopwatch sw = new Stopwatch(); sw.Start(); //DoWork sw.Stop(); //take sw.Elapsed But is this valid if you are running under multi processors machine? the thread can be switched to another processor, can it? Also the same thing should be in Enviroment.TickCount. If the answer is yes should I wrap my code inside BeginThreadAffinity as follows Thread.BeginThreadAffinity(); Stopwatch sw = new Stopwatch(); sw.Start(); //DoWork sw.Stop(); //take sw.Elapsed Thread.EndThreadAffinity(); P.S The switching can occur over the thread level not only the processor level, for example if the function is running in another thread so the system can switch it to another processor, if that happens, will the Stopwatch be valid after this switching? I am not using Stopwatch for perfromance measurement only but also to simulate timer function using Thread.Sleep (to prevent call overlapping)

Read the article
How to scale MySQL with multiple machines?

- by erotsppa

I have a web app running LAMP. We recently have an increase in load and is now looking at solutions to scale. Scaling apache is pretty easy we are just going to have multiple multiple machines hosting it and round robin the incoming traffic. However, each instance of apache will talk with MySQL and eventually MySQL will be overloaded. How to scale MySQL across multiple machines in this setup? I have already looked at this but specifically we need the updates from the DB available immediately so I don't think replication is a good strategy here? Also hopefully this can be done with minimal code change. PS. We have around a 1:1 read-write ratio.

Read the article
Slow rendering of pages due to User Controls.

- by Rahul Soni

How would you troubleshoot a page that is rendering slowly in ASP.NET? This issue is happening on only specific pages with a few user controls. Other pages work fine. Tracing has clarified that the issue is happening between "Begin Render" and "End Render".

Read the article
Missing 'DomContentLoaded' and 'load' time information in Firebug's Net Panel.

- by stony_dreams

Hello, Firebug is awesome in reporting the relative time when an HTTP request was made with respect to the 'DomContentLoaded' and 'load' time. However, once the 'load' event occurs (seen by the red line on the timeline), the requests thereafter do not have any information about how later they occurred with respect to the two events. To confuse things, these requests (usually at the bottom of the timeline) appear to have started right at the beginning of the page load. Could somebody shed some light on what should i infer when i see such entries in the timeline which do not have information about the 'DomContentLoaded' and 'load' event times and appear to have occurred after the page load event, still net panel shows that they started at the beginning? Thanks!

Read the article
Very different I/O performance in C++ on Windows

- by Mr.Gate

Hi all, I'm a new user and my english is not so good so I hope to be clear. We're facing a performance problem using large files (1GB or more) expecially (as it seems) when you try to grow them in size. Anyway... to verify our sensations we tryed the following (on Win 7 64Bit, 4core, 8GB Ram, 32 bit code compiled with VC2008) a) Open an unexisting file. Write it from the beginning up to 1Gb in 1Mb slots. Now you have a 1Gb file. Now randomize 10000 positions within that file, seek to that position and write 50 bytes in each position, no matter what you write. Close the file and look at the results. Time to create the file is quite fast (about 0.3"), time to write 10000 times is fast all the same (about 0.03"). Very good, this is the beginnig. Now try something else... b) Open an unexisting file, seek to 1Gb-1byte and write just 1 byte. Now you have another 1Gb file. Follow the next steps exactly same way of case 'a', close the file and look at the results. Time to create the file is the faster you can imagine (about 0.00009") but write time is something you can't believe.... about 90"!!!!! b.1) Open an unexisting file, don't write any byte. Act as before, ramdomizing, seeking and writing, close the file and look at the result. Time to write is long all the same: about 90"!!!!! Ok... this is quite amazing. But there's more! c) Open again the file you crated in case 'a', don't truncate it... randomize again 10000 positions and act as before. You're fast as before, about 0,03" to write 10000 times. This sounds Ok... try another step. d) Now open the file you created in case 'b', don't truncate it... randomize again 10000 positions and act as before. You're slow again and again, but the time is reduced to... 45"!! Maybe, trying again, the time will reduce. I actually wonder why... Any Idea? The following is part of the code I used to test what I told in previuos cases (you'll have to change someting in order to have a clean compilation, I just cut & paste from some source code, sorry). The sample can read and write, in random, ordered or reverse ordered mode, but write only in random order is the clearest test. We tryed using std::fstream but also using directly CreateFile(), WriteFile() and so on the results are the same (even if std::fstream is actually a little slower). Parameters for case 'a' = -f_tempdir_\casea.dat -n10000 -t -p -w Parameters for case 'b' = -f_tempdir_\caseb.dat -n10000 -t -v -w Parameters for case 'b.1' = -f_tempdir_\caseb.dat -n10000 -t -w Parameters for case 'c' = -f_tempdir_\casea.dat -n10000 -w Parameters for case 'd' = -f_tempdir_\caseb.dat -n10000 -w Run the test (and even others) and see... // iotest.cpp : Defines the entry point for the console application. // #include <windows.h> #include <iostream> #include <set> #include <vector> #include "stdafx.h" double RealTime_Microsecs() { LARGE_INTEGER fr = {0, 0}; LARGE_INTEGER ti = {0, 0}; double time = 0.0; QueryPerformanceCounter(&ti); QueryPerformanceFrequency(&fr); time = (double) ti.QuadPart / (double) fr.QuadPart; return time; } int main(int argc, char* argv[]) { std::string sFileName ; size_t stSize, stTimes, stBytes ; int retval = 0 ; char *p = NULL ; char *pPattern = NULL ; char *pReadBuf = NULL ; try { // Default stSize = 1<<30 ; // 1Gb stTimes = 1000 ; stBytes = 50 ; bool bTruncate = false ; bool bPre = false ; bool bPreFast = false ; bool bOrdered = false ; bool bReverse = false ; bool bWriteOnly = false ; // Comsumo i parametri for(int index=1; index < argc; ++index) { if ( '-' != argv[index][0] ) throw ; switch(argv[index][1]) { case 'f': sFileName = argv[index]+2 ; break ; case 's': stSize = xw::str::strtol(argv[index]+2) ; break ; case 'n': stTimes = xw::str::strtol(argv[index]+2) ; break ; case 'b':stBytes = xw::str::strtol(argv[index]+2) ; break ; case 't': bTruncate = true ; break ; case 'p' : bPre = true, bPreFast = false ; break ; case 'v' : bPreFast = true, bPre = false ; break ; case 'o' : bOrdered = true, bReverse = false ; break ; case 'r' : bReverse = true, bOrdered = false ; break ; case 'w' : bWriteOnly = true ; break ; default: throw ; break ; } } if ( sFileName.empty() ) { std::cout << "Usage: -f<File Name> -s<File Size> -n<Number of Reads and Writes> -b<Bytes per Read and Write> -t -p -v -o -r -w" << std::endl ; std::cout << "-t truncates the file, -p pre load the file, -v pre load 'veloce', -o writes in order mode, -r write in reverse order mode, -w Write Only" << std::endl ; std::cout << "Default: 1Gb, 1000 times, 50 bytes" << std::endl ; throw ; } if ( !stSize || !stTimes || !stBytes ) { std::cout << "Invalid Parameters" << std::endl ; return -1 ; } size_t stBestSize = 0x00100000 ; std::fstream fFile ; fFile.open(sFileName.c_str(), std::ios_base::binary|std::ios_base::out|std::ios_base::in|(bTruncate?std::ios_base::trunc:0)) ; p = new char[stBestSize] ; pPattern = new char[stBytes] ; pReadBuf = new char[stBytes] ; memset(p, 0, stBestSize) ; memset(pPattern, (int)(stBytes&0x000000ff), stBytes) ; double dTime = RealTime_Microsecs() ; size_t stCopySize, stSizeToCopy = stSize ; if ( bPre ) { do { stCopySize = std::min(stSizeToCopy, stBestSize) ; fFile.write(p, stCopySize) ; stSizeToCopy -= stCopySize ; } while (stSizeToCopy) ; std::cout << "Creating time is: " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; } else if ( bPreFast ) { fFile.seekp(stSize-1) ; fFile.write(p, 1) ; std::cout << "Creating Fast time is: " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; } size_t stPos ; ::srand((unsigned int)dTime) ; double dReadTime, dWriteTime ; stCopySize = stTimes ; std::vector<size_t> inVect ; std::vector<size_t> outVect ; std::set<size_t> outSet ; std::set<size_t> inSet ; // Prepare vector and set do { stPos = (size_t)(::rand()<<16) % stSize ; outVect.push_back(stPos) ; outSet.insert(stPos) ; stPos = (size_t)(::rand()<<16) % stSize ; inVect.push_back(stPos) ; inSet.insert(stPos) ; } while (--stCopySize) ; // Write & read using vectors if ( !bReverse && !bOrdered ) { std::vector<size_t>::iterator outI, inI ; outI = outVect.begin() ; inI = inVect.begin() ; stCopySize = stTimes ; dReadTime = 0.0 ; dWriteTime = 0.0 ; do { dTime = RealTime_Microsecs() ; fFile.seekp(*outI) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++outI ; if ( !bWriteOnly ) { dTime = RealTime_Microsecs() ; fFile.seekg(*inI) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++inI ; } } while (--stCopySize) ; std::cout << "Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " (Ave: " << xw::str::itoa(dWriteTime/stTimes, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { std::cout << "Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " (Ave: " << xw::str::itoa(dReadTime/stTimes, 10, 'f') << ")" << std::endl ; } } // End // Write in order if ( bOrdered ) { std::set<size_t>::iterator i = outSet.begin() ; dWriteTime = 0.0 ; stCopySize = 0 ; for(; i != outSet.end(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekp(stPos) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Ordered Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dWriteTime/stCopySize, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { i = inSet.begin() ; dReadTime = 0.0 ; stCopySize = 0 ; for(; i != inSet.end(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekg(stPos) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Ordered Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dReadTime/stCopySize, 10, 'f') << ")" << std::endl ; } }// End // Write in reverse order if ( bReverse ) { std::set<size_t>::reverse_iterator i = outSet.rbegin() ; dWriteTime = 0.0 ; stCopySize = 0 ; for(; i != outSet.rend(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekp(stPos) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Reverse ordered Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dWriteTime/stCopySize, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { i = inSet.rbegin() ; dReadTime = 0.0 ; stCopySize = 0 ; for(; i != inSet.rend(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekg(stPos) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Reverse ordered Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dReadTime/stCopySize, 10, 'f') << ")" << std::endl ; } }// End dTime = RealTime_Microsecs() ; fFile.close() ; std::cout << "Flush/Close Time is " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; std::cout << "Program Terminated" << std::endl ; } catch(...) { std::cout << "Something wrong or wrong parameters" << std::endl ; retval = -1 ; } if ( p ) delete []p ; if ( pPattern ) delete []pPattern ; if ( pReadBuf ) delete []pReadBuf ; return retval ; }

Read the article
How well does Scala Perform Comapred to Java?

- by Teja Kantamneni

The Question actually says it all. The reason behind this question is I am about to start a small side project and want to do it in Scala. I am learning scala for the past one month and now I am comfortable working with it. The scala compiler itself is pretty slow (unless you use fsc). So how well does it perform on JVM? I previously worked on groovy and I had seen sometimes over performed than java. My Question is how well scala perform on JVM compared to Java. I know scala has some very good features(FP, dynamic lang, statically typed...) but end of the day we need the performance...

Read the article
Approximate timings for various operations on a "typical desktop PC" anno 2010

- by knorv

In the article "Teach Yourself Programming in Ten Years" Peter Norvig (Director of Research, Google) gives the following approximate timings for various operations on a typical 1GHz PC back in 2001: execute single instruction = 1 nanosec = (1/1,000,000,000) sec fetch word from L1 cache memory = 2 nanosec fetch word from main memory = 10 nanosec fetch word from consecutive disk location = 200 nanosec fetch word from new disk location (seek) = 8,000,000 nanosec = 8 millisec What would the corresponding timings be for your definition of a typical PC desktop anno 2010?

Read the article
Very different IO performance in C/C++

- by Roberto Tirabassi

Hi all, I'm a new user and my english is not so good so I hope to be clear. We're facing a performance problem using large files (1GB or more) expecially (as it seems) when you try to grow them in size. Anyway... to verify our sensations we tryed the following (on Win 7 64Bit, 4core, 8GB Ram, 32 bit code compiled with VC2008) a) Open an unexisting file. Write it from the beginning up to 1Gb in 1Mb slots. Now you have a 1Gb file. Now randomize 10000 positions within that file, seek to that position and write 50 bytes in each position, no matter what you write. Close the file and look at the results. Time to create the file is quite fast (about 0.3"), time to write 10000 times is fast all the same (about 0.03"). Very good, this is the beginnig. Now try something else... b) Open an unexisting file, seek to 1Gb-1byte and write just 1 byte. Now you have another 1Gb file. Follow the next steps exactly same way of case 'a', close the file and look at the results. Time to create the file is the faster you can imagine (about 0.00009") but write time is something you can't believe.... about 90"!!!!! b.1) Open an unexisting file, don't write any byte. Act as before, ramdomizing, seeking and writing, close the file and look at the result. Time to write is long all the same: about 90"!!!!! Ok... this is quite amazing. But there's more! c) Open again the file you crated in case 'a', don't truncate it... randomize again 10000 positions and act as before. You're fast as before, about 0,03" to write 10000 times. This sounds Ok... try another step. d) Now open the file you created in case 'b', don't truncate it... randomize again 10000 positions and act as before. You're slow again and again, but the time is reduced to... 45"!! Maybe, trying again, the time will reduce. I actually wonder why... Any Idea? The following is part of the code I used to test what I told in previuos cases (you'll have to change someting in order to have a clean compilation, I just cut & paste from some source code, sorry). The sample can read and write, in random, ordered or reverse ordered mode, but write only in random order is the clearest test. We tryed using std::fstream but also using directly CreateFile(), WriteFile() and so on the results are the same (even if std::fstream is actually a little slower). Parameters for case 'a' = -f_tempdir_\casea.dat -n10000 -t -p -w Parameters for case 'b' = -f_tempdir_\caseb.dat -n10000 -t -v -w Parameters for case 'b.1' = -f_tempdir_\caseb.dat -n10000 -t -w Parameters for case 'c' = -f_tempdir_\casea.dat -n10000 -w Parameters for case 'd' = -f_tempdir_\caseb.dat -n10000 -w Run the test (and even others) and see... // iotest.cpp : Defines the entry point for the console application. // #include <windows.h> #include <iostream> #include <set> #include <vector> #include "stdafx.h" double RealTime_Microsecs() { LARGE_INTEGER fr = {0, 0}; LARGE_INTEGER ti = {0, 0}; double time = 0.0; QueryPerformanceCounter(&ti); QueryPerformanceFrequency(&fr); time = (double) ti.QuadPart / (double) fr.QuadPart; return time; } int main(int argc, char* argv[]) { std::string sFileName ; size_t stSize, stTimes, stBytes ; int retval = 0 ; char *p = NULL ; char *pPattern = NULL ; char *pReadBuf = NULL ; try { // Default stSize = 1<<30 ; // 1Gb stTimes = 1000 ; stBytes = 50 ; bool bTruncate = false ; bool bPre = false ; bool bPreFast = false ; bool bOrdered = false ; bool bReverse = false ; bool bWriteOnly = false ; // Comsumo i parametri for(int index=1; index < argc; ++index) { if ( '-' != argv[index][0] ) throw ; switch(argv[index][1]) { case 'f': sFileName = argv[index]+2 ; break ; case 's': stSize = xw::str::strtol(argv[index]+2) ; break ; case 'n': stTimes = xw::str::strtol(argv[index]+2) ; break ; case 'b':stBytes = xw::str::strtol(argv[index]+2) ; break ; case 't': bTruncate = true ; break ; case 'p' : bPre = true, bPreFast = false ; break ; case 'v' : bPreFast = true, bPre = false ; break ; case 'o' : bOrdered = true, bReverse = false ; break ; case 'r' : bReverse = true, bOrdered = false ; break ; case 'w' : bWriteOnly = true ; break ; default: throw ; break ; } } if ( sFileName.empty() ) { std::cout << "Usage: -f<File Name> -s<File Size> -n<Number of Reads and Writes> -b<Bytes per Read and Write> -t -p -v -o -r -w" << std::endl ; std::cout << "-t truncates the file, -p pre load the file, -v pre load 'veloce', -o writes in order mode, -r write in reverse order mode, -w Write Only" << std::endl ; std::cout << "Default: 1Gb, 1000 times, 50 bytes" << std::endl ; throw ; } if ( !stSize || !stTimes || !stBytes ) { std::cout << "Invalid Parameters" << std::endl ; return -1 ; } size_t stBestSize = 0x00100000 ; std::fstream fFile ; fFile.open(sFileName.c_str(), std::ios_base::binary|std::ios_base::out|std::ios_base::in|(bTruncate?std::ios_base::trunc:0)) ; p = new char[stBestSize] ; pPattern = new char[stBytes] ; pReadBuf = new char[stBytes] ; memset(p, 0, stBestSize) ; memset(pPattern, (int)(stBytes&0x000000ff), stBytes) ; double dTime = RealTime_Microsecs() ; size_t stCopySize, stSizeToCopy = stSize ; if ( bPre ) { do { stCopySize = std::min(stSizeToCopy, stBestSize) ; fFile.write(p, stCopySize) ; stSizeToCopy -= stCopySize ; } while (stSizeToCopy) ; std::cout << "Creating time is: " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; } else if ( bPreFast ) { fFile.seekp(stSize-1) ; fFile.write(p, 1) ; std::cout << "Creating Fast time is: " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; } size_t stPos ; ::srand((unsigned int)dTime) ; double dReadTime, dWriteTime ; stCopySize = stTimes ; std::vector<size_t> inVect ; std::vector<size_t> outVect ; std::set<size_t> outSet ; std::set<size_t> inSet ; // Prepare vector and set do { stPos = (size_t)(::rand()<<16) % stSize ; outVect.push_back(stPos) ; outSet.insert(stPos) ; stPos = (size_t)(::rand()<<16) % stSize ; inVect.push_back(stPos) ; inSet.insert(stPos) ; } while (--stCopySize) ; // Write & read using vectors if ( !bReverse && !bOrdered ) { std::vector<size_t>::iterator outI, inI ; outI = outVect.begin() ; inI = inVect.begin() ; stCopySize = stTimes ; dReadTime = 0.0 ; dWriteTime = 0.0 ; do { dTime = RealTime_Microsecs() ; fFile.seekp(*outI) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++outI ; if ( !bWriteOnly ) { dTime = RealTime_Microsecs() ; fFile.seekg(*inI) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++inI ; } } while (--stCopySize) ; std::cout << "Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " (Ave: " << xw::str::itoa(dWriteTime/stTimes, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { std::cout << "Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " (Ave: " << xw::str::itoa(dReadTime/stTimes, 10, 'f') << ")" << std::endl ; } } // End // Write in order if ( bOrdered ) { std::set<size_t>::iterator i = outSet.begin() ; dWriteTime = 0.0 ; stCopySize = 0 ; for(; i != outSet.end(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekp(stPos) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Ordered Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dWriteTime/stCopySize, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { i = inSet.begin() ; dReadTime = 0.0 ; stCopySize = 0 ; for(; i != inSet.end(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekg(stPos) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Ordered Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dReadTime/stCopySize, 10, 'f') << ")" << std::endl ; } }// End // Write in reverse order if ( bReverse ) { std::set<size_t>::reverse_iterator i = outSet.rbegin() ; dWriteTime = 0.0 ; stCopySize = 0 ; for(; i != outSet.rend(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekp(stPos) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Reverse ordered Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dWriteTime/stCopySize, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { i = inSet.rbegin() ; dReadTime = 0.0 ; stCopySize = 0 ; for(; i != inSet.rend(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekg(stPos) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Reverse ordered Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dReadTime/stCopySize, 10, 'f') << ")" << std::endl ; } }// End dTime = RealTime_Microsecs() ; fFile.close() ; std::cout << "Flush/Close Time is " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; std::cout << "Program Terminated" << std::endl ; } catch(...) { std::cout << "Something wrong or wrong parameters" << std::endl ; retval = -1 ; } if ( p ) delete []p ; if ( pPattern ) delete []pPattern ; if ( pReadBuf ) delete []pReadBuf ; return retval ; }

Read the article
Is count(*) really expensive ?

- by Anil Namde

I have a page where I have 4 tabs displaying 4 different reports based off different tables. I obtain the row count of each table using a select count(*) from <table> query and display number of rows available in each table on the tabs. As a result, each page postback causes 5 count(*) queries to be executed (4 to get counts and 1 for pagination) and 1 query for getting the report content. Now my question is: are count(*) queries really expensive -- should I keep the row counts (at least those that are displayed on the tab) in the view state of page instead of querying multiple times? How expensive are COUNT(*) queries ?

Read the article
How optimize queries with fully qualified names in t-sql?

- by tomaszs

Whe I call: select * from Database.dbo.Table where NAME = 'cat' It takes: 200 ms And when I change database to Database in Management Studio and call it without fully qualified name it's much faster: select * from Table where NAME = 'cat' It takes: 17 ms Is there any way to make fully qualified queries faster without changing database?

Read the article

< Previous Page | 34 35 36 37 38 39 40 41 42 43 44 45 | Next Page >