Search Results

Search found 13206 results on 529 pages for 'performance measurement'.

Page 31/529 | < Previous Page | 27 28 29 30 31 32 33 34 35 36 37 38 | Next Page >

Performance of inter-database query (between linked servers)

- by Swoosh

I have an import between 2 linked servers. I basically got to get the data from a multiple join into a table on my side. The current query is something like this: select a.* from db1.dbo.tbl1 a inner join db1.dbo.tbl2 on ... inner join db1.dbo.tbl3 on ... inner join db1.dbo.tbl4 on ... inner join db2.dbo.myside on ... db1 = linked server db2 = my own database After this one, I am using an insert into + select to add this data in my table which is located in db2. (usually few hundred records - this import running once a minute) My question is related to performance. The tables on the linked server (tbl1, tbl2, tbl3, tbl4) are huge tables, with millions of records, and it is slowing down the import process. I was told that, if I do the join on the "other" side (db1 - linked server) for example in a stored procedure, than, even if the query looks the same, it would run faster. Is that right? This is kinda hard to test. Note that the join contains a table from my database too. Also. are there other "tricks" I could use in order to make this run faster? Thanks

Read the article
Performance when querying a View

- by Nate Bross

I'm wondering if this is a bad practice or if in general this is the correct approach. Lets say that I've created a view that combines a few attributes from a few tables. My question, what do I need to do so I can query against this view as if it were a table without worrying about performance? All attributes in the original tables are indexed, my concern is that the result view will have hundreds of thousands of records, which I will want to narrow down quite a bit based on user input. What I'd like to avoid, is having multiple versions of the code that generates this view floating around with a few extra "where" conditions to facilitate the user input filtering. For example, assume my view has this header VIEW(Name, Type, DateEntered) this may have 100,000+ rows (possibly millions). I'd like to be able to make this view in SQL Server, and then in my application write querlies like this: SELECT Name, Type, DateEntered FROM MyView WHERE DateEntered BETWEEN @date1 and @date2; Basically, I am denormalizing my data for a series of reports that need to be run, and I'd like to centralize where I pull the data from, maybe I'm not looking at this problem from the right angle though, so I'm open to alternative ways to attack this.

Read the article
Determining Best Table Structure for MySQL Performance

- by Joe Majewski

I'm working on a browser-based RPG for one of my websites, and right now I'm trying to determine the best way to organize my SQL tables for performance and maintenance. Here's my question: Does the number of columns in an SQL table affect the speed in which it can be queried? I am not a newbie when it comes to PHP or MySQL. I used to develop things with the common goal of getting them to work, but I've recently advanced to the stage where a functional program is not good enough unless it's fast and reliable. Anyways, right now I have a members table that has around 15 columns. It contains information such as the player's username, password, email, logins, page views, etcetera. It doesn't contain any information on the player's progress in the game, however. If I added columns for things such as army size, gold, turns, and whatnot, then it could easily rise to around 40 or 50 total columns. Oh, and my database structure IS normalized. Will a table with 50 columns that gets constantly queried be a bad idea? Should I split it into two tables; one for the user's general information and one for the user's game statistics? I know I could check the query time myself, but I haven't actually created the tables yet and I think I'd be better off with some professional advice on this important decision for my game. Thank you for your time! :)

Read the article
Performance issues when using SSD for a developer notebook (WAMP/LAMP stack)?

- by András Szepesházi

I'm a web application developer using my notebook as a standalone development environment (WAMP stack). I just switched from a Core2-duo Vista 32 bit notebook with 2Gb RAM and SATA HDD, to an i5-2520M Win7 64 bit with 4Gb RAM and 128 GB SDD (Corsair P3 128). My initial experience was what I expected, fast boot, quick load of all the applications (Eclipse takes now 5 seconds as opposed to 30s on my old notebook), overall great experience. Then I started to build up my development stack, both as LAMP (using VirtualBox with a debian guest) and WAMP (windows native apache + mysql + php). I wanted to compare those two. This still all worked great out, then I started to pull in my projects to these stacks. And here came the nasty surprise, one of those projects produced a lot worse response times than on my old notebook (that was true for both the VirtualBox and WAMP stack). Apache, php and mysql configurations were practically identical in all environments. I started to do a lot of benchmarking and profiling, and here is what I've found: All general benchmarks (Performance Test 7.0, HDTune Pro, wPrime2 and some more) gave a big advantage to the new notebook. Nothing surprising here. Disc specific tests showed that read/write operations peaked around 380M/160M for the SSD, and all the different sized block operations also performed very well. Started apache performance benchmarking with Apache Benchmark for a small static html file (10 concurrent threads, 500 iterations). Old notebook: min 47ms, median 111ms, max 156ms New WAMP stack: min 71ms, median 135ms, max 296ms New LAMP stack (in VirtualBox): min 6ms, median 46ms, max 175ms Right here I don't get why the native WAMP stack performed so bad, but at least the LAMP environment brought the expected speed. Apache performance measurement for non-cached php content. The php runs a loop of 1000 and generates sha1(uniqid()) inisde. Again, 10 concurrent threads, 500 iterations were used for the benchmark. Old notebook: min 0ms, median 39ms, max 218ms New WAMP stack: min 20ms, median 61ms, max 186ms New LAMP stack (in VirtualBox): min 124ms, median 704ms, max 2463ms What the hell? The new LAMP performed miserably, and even the new native WAMP was outperformed by the old notebook. php + mysql test. The test consists of connecting to a database and reading a single record form a table using INNER JOIN on 3 more (indexed) tables, repeated 100 times within a loop. Databases were identical. 10 concurrent threads, 100 iterations were used for the benchmark. Old notebook: min 1201ms, median 1734ms, max 3728ms New WAMP stack: min 367ms, median 675ms, max 1893ms New LAMP stack (in VirtualBox): min 1410ms, median 3659ms, max 5045ms And the same test with concurrency set to 1 (instead of 10): Old notebook: min 1201ms, median 1261ms, max 1357ms New WAMP stack: min 399ms, median 483ms, max 539ms New LAMP stack (in VirtualBox): min 285ms, median 348ms, max 444ms Strictly for my purposes, as I'm using a self contained development environment (= low concurrency) I could be satisfied with the second test's result. Though I have no idea why the VirtualBox environment performed so bad with higher concurrency. Finally I performed a test of including many php files. The application that I mentioned at the beginning, the one that was performing so bad, has a heavy bootstrap, loads hundreds of small library and configuration files while initializing. So this test does nothing else just includes about 100 files. Concurrency set to 1, 100 iterations: Old notebook: min 140ms, median 168ms, max 406ms New WAMP stack: min 434ms, median 488ms, max 604ms New LAMP stack (in VirtualBox): min 413ms, median 1040ms, max 1921ms Even if I consider that VirtualBox reached those files via shared folders, and that slows things down a bit, I still don't see how could the old notebook outperform so heavily both new configurations. And I think this is the real root of the slow performance, as the application uses even more includes, and the whole bootstrap will occur several times within a page request (for each ajax call, for example). To sum it up, here I am with a brand new high-performance notebook that loads the same page in 20 seconds, that my old notebook can do in 5-7 seconds. Needless to say, I'm not a very happy person right now. Why do you think I experience these poor performance values? What are my options to remedy this situation?

Read the article
How can you know what is w3wp.exe doing? (or how to diagnose a performance problem)

- by Daniel Magliola

I'm having a performance problem in a site we've made, and I'm not exactly sure how to start diagnosing it. The short description is: We have a very small site (http://hearablog.com) with very little traffic, in a crappy dedicated server, CPU is always very high, sometimes it stays at 100% for minutes, and w3wp.exe is taking most of it. A typical scenario is w3wp.exe takes 60%, and SQL Server takes about 30%. Our DB is pretty small too. Long description and more details: The site is hosted in a very crappy server by Cari.Net. From the beginning we had the feeling that the server didn't quite behave correctly, like some things would take just too long, so this could be a configuration problem from the get go. It may also be that we are getting a virtual server while we're supposed to have a dedicated one, although we have no evidence that'd indicate this, except for the fact that the server tends to be quite slow. The server is Windows 2008 Standard 64-bit, with SQL 2008 Express Hardware is a Celeron 2.80 GHz, 1Gb RAM The website is developed in ASP.Net MVC, using Entity Framework for data access. Now, this is pretty crappy hardware, but i've had other servers with these guys, with equivalent (or worse) HW, and performance is much better than this one. That said, the other servers have W2003 and SQL2005, and I'm using ASP.Net "WebForms" 2.0, no MVC, no LINQ, no EF; so I'm not sure whether going to 2008 / the other stuff means a big performance penalty is expected. I'm serving MP3 files (5-20 Mb) regularly, which is a slightly unusual load, maybe that is causing some kind of problems? Would that cause w3wp to use a lot of CPU? Disk usage seems very low. Memory is usually around 90%, but disk usage seems to indicate it's not paging much. I get tons of e-mails every day about SQL timeouts, for queries taking over 30 seconds, although all our queries are pretty straightforward (or should be, but EF may be screwing it up). This is what resource monitor looks like in one of these "sprints" of 100% CPU, in case there's anything useful there. And a snapshot of some performance counters: Now, what confuses me very much is that CPU usage of w3wp is just so high. It shouldn't be doing much really... So my questions are... Is there any way of finding out "what" it is doing? Maybe even profile it? Any performance counters I should be looking at? Is this to be expected given this hardware/software configuration? Is this could be cause by some kind of configuration failure, where would you start looking? Thank you VERY much. Daniel Magliola

Read the article
Is this slow WPF TextBlock performance expected?

- by Ben Schoepke

Hi, I am doing some benchmarking to determine if I can use WPF for a new product. However, early performance results are disappointing. I made a quick app that uses data binding to display a bunch of random text inside of a list box every 100 ms and it was eating up ~15% CPU. So I made another quick app that skipped the data binding/data template scheme and does nothing but update 10 TextBlocks that are inside of a ListBox every 100 ms (the actual product wouldn't require 100 ms updates, more like 500 ms max, but this is a stress test). I'm still seeing ~10-15% CPU usage. Why is this so high? Is it because of all the garbage strings? Here's the XAML: <Grid> <ListBox x:Name="numericsListBox"> <ListBox.Resources> <Style TargetType="TextBlock"> <Setter Property="FontSize" Value="48"/> <Setter Property="Width" Value="300"/> </Style> </ListBox.Resources> <TextBlock/> <TextBlock/> <TextBlock/> <TextBlock/> <TextBlock/> <TextBlock/> <TextBlock/> <TextBlock/> <TextBlock/> <TextBlock/> </ListBox> </Grid> Here's the code behind: public partial class Window1 : Window { private int _count = 0; public Window1() { InitializeComponent(); } private void OnLoad(object sender, RoutedEventArgs e) { var t = new DispatcherTimer(TimeSpan.FromSeconds(0.1), DispatcherPriority.Normal, UpdateNumerics, Dispatcher); t.Start(); } private void UpdateNumerics(object sender, EventArgs e) { ++_count; foreach (object textBlock in numericsListBox.Items) { var t = textBlock as TextBlock; if (t != null) t.Text = _count.ToString(); } } } Any ideas for a better way to quickly render text? My computer: XP SP3, 2.26 GHz Core 2 Duo, 4 GB RAM, Intel 4500 HD integrated graphics. And that is an order of magnitude beefier than the hardware I'd need to develop for in the real product.

Read the article
agent-based simulation: performance issue: Python vs NetLogo & Repast

- by max

I'm replicating a small piece of Sugarscape agent simulation model in Python 3. I found the performance of my code is ~3 times slower than that of NetLogo. Is it likely the problem with my code, or can it be the inherent limitation of Python? Obviously, this is just a fragment of the code, but that's where Python spends two-thirds of the run-time. I hope if I wrote something really inefficient it might show up in this fragment: UP = (0, -1) RIGHT = (1, 0) DOWN = (0, 1) LEFT = (-1, 0) all_directions = [UP, DOWN, RIGHT, LEFT] # point is just a tuple (x, y) def look_around(self): max_sugar_point = self.point max_sugar = self.world.sugar_map[self.point].level min_range = 0 random.shuffle(self.all_directions) for r in range(1, self.vision+1): for d in self.all_directions: p = ((self.point[0] + r * d[0]) % self.world.surface.length, (self.point[1] + r * d[1]) % self.world.surface.height) if self.world.occupied(p): # checks if p is in a lookup table (dict) continue if self.world.sugar_map[p].level > max_sugar: max_sugar = self.world.sugar_map[p].level max_sugar_point = p if max_sugar_point is not self.point: self.move(max_sugar_point) Roughly equivalent code in NetLogo (this fragment does a bit more than the Python function above): ; -- The SugarScape growth and motion procedures. -- to M ; Motion rule (page 25) locals [ps p v d] set ps (patches at-points neighborhood) with [count turtles-here = 0] if (count ps > 0) [ set v psugar-of max-one-of ps [psugar] ; v is max sugar w/in vision set ps ps with [psugar = v] ; ps is legal sites w/ v sugar set d distance min-one-of ps [distance myself] ; d is min dist from me to ps agents set p random-one-of ps with [distance myself = d] ; p is one of the min dist patches if (psugar >= v and includeMyPatch?) [set p patch-here] setxy pxcor-of p pycor-of p ; jump to p set sugar sugar + psugar-of p ; consume its sugar ask p [setpsugar 0] ; .. setting its sugar to 0 ] set sugar sugar - metabolism ; eat sugar (metabolism) set age age + 1 end On my computer, the Python code takes 15.5 sec to run 1000 steps; on the same laptop, the NetLogo simulation running in Java inside the browser finishes 1000 steps in less than 6 sec. EDIT: Just checked Repast, using Java implementation. And it's also about the same as NetLogo at 5.4 sec. Recent comparisons between Java and Python suggest no advantage to Java, so I guess it's just my code that's to blame? EDIT: I understand MASON is supposed to be even faster than Repast, and yet it still runs Java in the end.

Read the article
Performance of tokenizing CSS in PHP

- by Boldewyn

This is a noob question from someone who hasn't written a parser/lexer ever before. I'm writing a tokenizer/parser for CSS in PHP (please don't repeat with 'OMG, why in PHP?'). The syntax is written down by the W3C neatly here (CSS2.1) and here (CSS3, draft). It's a list of 21 possible tokens, that all (but two) cannot be represented as static strings. My current approach is to loop through an array containing the 21 patterns over and over again, do an if (preg_match()) and reduce the source string match by match. In principle this works really good. However, for a 1000 lines CSS string this takes something between 2 and 8 seconds, which is too much for my project. Now I'm banging my head how other parsers tokenize and parse CSS in fractions of seconds. OK, C is always faster than PHP, but nonetheless, are there any obvious D'Oh! s that I fell into? I made some optimizations, like checking for '@', '#' or '"' as the first char of the remaining string and applying only the relevant regexp then, but this hadn't brought any great performance boosts. My code (snippet) so far: $TOKENS = array( 'IDENT' => '...regexp...', 'ATKEYWORD' => '@...regexp...', 'String' => '"...regexp..."|\'...regexp...\'', //... ); $string = '...CSS source string...'; $stream = array(); // we reduce $string token by token while ($string != '') { $string = ltrim($string, " \t\r\n\f"); // unconsumed whitespace at the // start is insignificant but doing a trim reduces exec time by 25% $matches = array(); // loop through all possible tokens foreach ($TOKENS as $t => $p) { // The '&' is used as delimiter, because it isn't used anywhere in // the token regexps if (preg_match('&^'.$p.'&Su', $string, $matches)) { $stream[] = array($t, $matches[0]); $string = substr($string, strlen($matches[0])); // Yay! We found one that matches! continue 2; } } // if we come here, we have a syntax error and handle it somehow } // result: an array $stream consisting of arrays with // 0 => type of token // 1 => token content

Read the article
Performance Optimization for Matrix Rotation

- by Summer_More_More_Tea

Hello everyone: I'm now trapped by a performance optimization lab in the book "Computer System from a Programmer's Perspective" described as following: In a N*N matrix M, where N is multiple of 32, the rotate operation can be represented as: Transpose: interchange elements M(i,j) and M(j,i) Exchange rows: Row i is exchanged with row N-1-i A example for matrix rotation(N is 3 instead of 32 for simplicity): ------- ------- |1|2|3| |3|6|9| ------- ------- |4|5|6| after rotate is |2|5|8| ------- ------- |7|8|9| |1|4|7| ------- ------- A naive implementation is: #define RIDX(i,j,n) ((i)*(n)+(j)) void naive_rotate(int dim, pixel *src, pixel *dst) { int i, j; for (i = 0; i < dim; i++) for (j = 0; j < dim; j++) dst[RIDX(dim-1-j, i, dim)] = src[RIDX(i, j, dim)]; } I come up with an idea by inner-loop-unroll. The result is: Code Version Speed Up original 1x unrolled by 2 1.33x unrolled by 4 1.33x unrolled by 8 1.55x unrolled by 16 1.67x unrolled by 32 1.61x I also get a code snippet from pastebin.com that seems can solve this problem: void rotate(int dim, pixel *src, pixel *dst) { int stride = 32; int count = dim >> 5; src += dim - 1; int a1 = count; do { int a2 = dim; do { int a3 = stride; do { *dst++ = *src; src += dim; } while(--a3); src -= dim * stride + 1; dst += dim - stride; } while(--a2); src += dim * (stride + 1); dst -= dim * dim - stride; } while(--a1); } After carefully read the code, I think main idea of this solution is treat 32 rows as a data zone, and perform the rotating operation respectively. Speed up of this version is 1.85x, overwhelming all the loop-unroll version. Here are the questions: In the inner-loop-unroll version, why does increment slow down if the unrolling factor increase, especially change the unrolling factor from 8 to 16, which does not effect the same when switch from 4 to 8? Does the result have some relationship with depth of the CPU pipeline? If the answer is yes, could the degrade of increment reflect pipeline length? What is the probable reason for the optimization of data-zone version? It seems that there is no too much essential difference from the original naive version. EDIT: My test environment is Intel Centrino Duo processor and the verion of gcc is 4.4 Any advice will be highly appreciated! Kind regards!

Read the article
Neo4j 1.9.4 (REST Server,CYPHER) performance issue

- by user2968943

I have Neo4j 1.9.4 installed on 24 core 24Gb ram (centos) machine and for most queries CPU usage spikes goes to 200% with only few concurrent requests. Domain: some sort of social application where few types of nodes(profiles) with 3-30 text/array properties and 36 relationship types with at least 3 properties. Most of nodes currently has ~300-500 relationships. Current data set footprint(from console): LogicalLogSize=4294907 (32MB) ArrayStoreSize=1675520 (12MB) NodeStoreSize=1342170 (10MB) PropertyStoreSize=1739548 (13MB) RelationshipStoreSize=6395202 (48MB) StringStoreSize=1478400 (11MB) which is IMHO really small. most queries looks like this one(with more or less WITH .. MATCH .. statements and few queries with variable length relations but the often fast): START targetUser=node({id}), currentUser=node({current}) MATCH targetUser-[contact:InContactsRelation]->n, n-[:InLocationRelation]->l, n-[:InCategoryRelation]->c WITH currentUser, targetUser,n, l,c, contact.fav is not null as inFavorites MATCH n<-[followers?:InContactsRelation]-() WITH currentUser, targetUser,n, l,c,inFavorites, COUNT(followers) as numFollowers RETURN id(n) as id, n.name? as name, n.title? as title, n._class as _class, n.avatar? as avatar, n.avatar_type? as avatar_type, l.name as location__name, c.name as category__name, true as isInContacts, inFavorites as isInFavorites, numFollowers it runs in ~1s-3s(for first run) and ~1s-70ms (for consecutive and it depends on query) and there is about 5-10 queries runs for each impression. Another interesting behavior is when i try run query from console(neo4j) on my local machine many consecutive times(just press ctrl+enter for few seconds) it has almost constant execution time but when i do it on server it goes slower exponentially and i guess it somehow related with my problem. Problem: So my problem is that neo4j is very CPU greedy(for 24 core machine its may be not an issue but its obviously overkill for small project). First time i used AWS EC2 m1.large instance but over all performance was bad, during testing, CPU always was over 100%. Some relevant parts of configuration: neostore.nodestore.db.mapped_memory=1280M wrapper.java.maxmemory=8192 note: I already tried configuration where all memory related parameters where HIGH and it didn't worked(no change at all). Question: Where to digg? configuration? scheme? queries? what i'm doing wrong? if need more info(logs, configs) just ask ;)

Read the article
Improving performance on data pasting 2000 rows with validations

- by Lohit

I have N rows (which could be nothing less than 1000) on an excel spreadsheet. And in this sheet our project has 150 columns like this: Now, our application needs data to be copied (using normal Ctrl+C) and pasted (using Ctrl+V) from the excel file sheet on our GUI sheet. Copy pasting 1000 records takes around 5-6 seconds which is okay for our requirement, but the problem is when we need to make sure the data entered is valid. So we have to validate data in each row generate appropriate error messages and format the data as per requirement. So we need to at runtime parse and evaluate data in each row. Now all the formatting of data and validations come from the back-end database and we have it in a data-table (dtValidateAndFormatConditions). The conditions would be around 50. So you can see how slow this whole process becomes since N X 150 X 50 operations are required to complete this whole process. Initially it took approximately 2-3 minutes but now i have reduced it to 20 - 30 seconds. However i have increased the speed by making an expression parser of my own - and not by any algorithm, is there any other way i can improve performance, by using Divide and Conquer or some other mechanism. Currently i am not really sure how to go about this. Here is what part of my code looks like: public virtual void ValidateAndFormatOnCopyPaste(DataTable DtCopied, int CurRow) { foreach (DataRow dRow in dtValidateAndFormatConditions.Rows) { string Condition = dRow["Condition"]; string FormatValue = Value = dRow["Value"]; GetValidatedFormattedData(DtCopied,ref Condition, ref FormatValue ,iRowIndex); Condition = Parse(Condition); dRow["Condition"] = Condition; FormatValue = Parse(FormatValue ); dRow["Value"] = FormatValue; } } The above code gets called row-wise like this: public override void ValidateAndFormat(DataTable dtChangedRecords, CellRange cr) { int iRowStart = cr.Row, iRowEnd = cr.Row + cr.RowCount; for (int iRow = iRowStart; iRow < iRowEnd; iRow++) { ValidateAndFormatOnCopyPaste(dtChangedRecords,iRow); } } Please know my question needs a more algorithmic solution than code optimization, however any answers containing code related optimizations will be appreciated as well. (Tagged Linq because although not seen i have been using linq in some parts of my code).

Read the article
Is Linq Faster, Slower or the same?

- by Vaccano

Is this: Box boxToFind = AllBoxes.Where(box => box.BoxNumber == boxToMatchTo.BagNumber); Faster or slower than this: Box boxToFind ; foreach (Box box in AllBoxes) { if (box.BoxNumber == boxToMatchTo.BoxNumber) { boxToFind = box; } } Both give me the result I am looking for (boxToFind). This is going to run on a mobile device that I need to be performance conscientious of.

Read the article
High accuracy cpu timers

- by John Robertson

An expert in highly optimized code once told me that an important part of his strategy was the availability of extremely high performance timers on the CPU. Does anyone know what those are and how one can access them to test various code optimizations? While I am interested regardless, I also wanted to ask whether it is possible to access them from something higher than assembly (or with only a little assembly) via visual studio C++?

Read the article
SQL SERVER – Faster SQL Server Databases and Applications – Power and Control with SafePeak Caching Options

- by Pinal Dave

Update: This blog post is written based on the SafePeak, which is available for free download. Today, I’d like to examine more closely one of my preferred technologies for accelerating SQL Server databases, SafePeak. Safepeak’s software provides a variety of advanced data caching options, techniques and tools to accelerate the performance and scalability of SQL Server databases and applications. I’d like to look more closely at some of these options, as some of these capabilities could help you address lagging database and performance on your systems. To better understand the available options, it is best to start by understanding the difference between the usual “Basic Caching” vs. SafePeak’s “Dynamic Caching”. Basic Caching Basic Caching (or the stale and static cache) is an ability to put the results from a query into cache for a certain period of time. It is based on TTL, or Time-to-live, and is designed to stay in cache no matter what happens to the data. For example, although the actual data can be modified due to DML commands (update/insert/delete), the cache will still hold the same obsolete query data. Meaning that with the Basic Caching is really static / stale cache. As you can tell, this approach has its limitations. Dynamic Caching Dynamic Caching (or the non-stale cache) is an ability to put the results from a query into cache while maintaining the cache transaction awareness looking for possible data modifications. The modifications can come as a result of: DML commands (update/insert/delete), indirect modifications due to triggers on other tables, executions of stored procedures with internal DML commands complex cases of stored procedures with multiple levels of internal stored procedures logic. When data modification commands arrive, the caching system identifies the related cache items and evicts them from cache immediately. In the dynamic caching option the TTL setting still exists, although its importance is reduced, since the main factor for cache invalidation (or cache eviction) become the actual data updates commands. Now that we have a basic understanding of the differences between “basic” and “dynamic” caching, let’s dive in deeper. SafePeak: A comprehensive and versatile caching platform SafePeak comes with a wide range of caching options. Some of SafePeak’s caching options are automated, while others require manual configuration. Together they provide a complete solution for IT and Data managers to reach excellent performance acceleration and application scalability for a wide range of business cases and applications. Automated caching of SQL Queries: Fully/semi-automated caching of all “read” SQL queries, containing any types of data, including Blobs, XMLs, Texts as well as all other standard data types. SafePeak automatically analyzes the incoming queries, categorizes them into SQL Patterns, identifying directly and indirectly accessed tables, views, functions and stored procedures; Automated caching of Stored Procedures: Fully or semi-automated caching of all read” stored procedures, including procedures with complex sub-procedure logic as well as procedures with complex dynamic SQL code. All procedures are analyzed in advance by SafePeak’s Metadata-Learning process, their SQL schemas are parsed – resulting with a full understanding of the underlying code, objects dependencies (tables, views, functions, sub-procedures) enabling automated or semi-automated (manually review and activate by a mouse-click) cache activation, with full understanding of the transaction logic for cache real-time invalidation; Transaction aware cache: Automated cache awareness for SQL transactions (SQL and in-procs); Dynamic SQL Caching: Procedures with dynamic SQL are pre-parsed, enabling easy cache configuration, eliminating SQL Server load for parsing time and delivering high response time value even in most complicated use-cases; Fully Automated Caching: SQL Patterns (including SQL queries and stored procedures) that are categorized by SafePeak as “read and deterministic” are automatically activated for caching; Semi-Automated Caching: SQL Patterns categorized as “Read and Non deterministic” are patterns of SQL queries and stored procedures that contain reference to non-deterministic functions, like getdate(). Such SQL Patterns are reviewed by the SafePeak administrator and in usually most of them are activated manually for caching (point and click activation); Fully Dynamic Caching: Automated detection of all dependent tables in each SQL Pattern, with automated real-time eviction of the relevant cache items in the event of “write” commands (a DML or a stored procedure) to one of relevant tables. A default setting; Semi Dynamic Caching: A manual cache configuration option enabling reducing the sensitivity of specific SQL Patterns to “write” commands to certain tables/views. An optimization technique relevant for cases when the query data is either known to be static (like archive order details), or when the application sensitivity to fresh data is not critical and can be stale for short period of time (gaining better performance and reduced load); Scheduled Cache Eviction: A manual cache configuration option enabling scheduling SQL Pattern cache eviction based on certain time(s) during a day. A very useful optimization technique when (for example) certain SQL Patterns can be cached but are time sensitive. Example: “select customers that today is their birthday”, an SQL with getdate() function, which can and should be cached, but the data stays relevant only until 00:00 (midnight); Parsing Exceptions Management: Stored procedures that were not fully parsed by SafePeak (due to too complex dynamic SQL or unfamiliar syntax), are signed as “Dynamic Objects” with highest transaction safety settings (such as: Full global cache eviction, DDL Check = lock cache and check for schema changes, and more). The SafePeak solution points the user to the Dynamic Objects that are important for cache effectiveness, provides easy configuration interface, allowing you to improve cache hits and reduce cache global evictions. Usually this is the first configuration in a deployment; Overriding Settings of Stored Procedures: Override the settings of stored procedures (or other object types) for cache optimization. For example, in case a stored procedure SP1 has an “insert” into table T1, it will not be allowed to be cached. However, it is possible that T1 is just a “logging or instrumentation” table left by developers. By overriding the settings a user can allow caching of the problematic stored procedure; Advanced Cache Warm-Up: Creating an XML-based list of queries and stored procedure (with lists of parameters) for periodically automated pre-fetching and caching. An advanced tool allowing you to handle more rare but very performance sensitive queries pre-fetch them into cache allowing high performance for users’ data access; Configuration Driven by Deep SQL Analytics: All SQL queries are continuously logged and analyzed, providing users with deep SQL Analytics and Performance Monitoring. Reduce troubleshooting from days to minutes with database objects and SQL Patterns heat-map. The performance driven configuration helps you to focus on the most important settings that bring you the highest performance gains. Use of SafePeak SQL Analytics allows continuous performance monitoring and analysis, easy identification of bottlenecks of both real-time and historical data; Cloud Ready: Available for instant deployment on Amazon Web Services (AWS). As you can see, there are many options to configure SafePeak’s SQL Server database and application acceleration caching technology to best fit a lot of situations. If you’re not familiar with their technology, they offer free-trial software you can download that comes with a free “help session” to help get you started. You can access the free trial here. Also, SafePeak is available to use on Amazon Cloud. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Loading PNGs into OpenGL performance issues - Java & JOGL much slower than C# & Tao.OpenGL

- by Edward Cresswell

I am noticing a large performance difference between Java & JOGL and C# & Tao.OpenGL when both loading PNGs from storage into memory, and when loading that BufferedImage (java) or Bitmap (C# - both are PNGs on hard drive) 'into' OpenGL. This difference is quite large, so I assumed I was doing something wrong, however after quite a lot of searching and trying different loading techniques I've been unable to reduce this difference. With Java I get an image loaded in 248ms and loaded into OpenGL in 728ms The same on C# takes 54ms to load the image, and 34ms to load/create texture. The image in question above is a PNG containing transparency, sized 7200x255, used for a 2D animated sprite. I realise the size is really quite ridiculous and am considering cutting up the sprite, however the large difference is still there (and confusing). On the Java side the code looks like this: BufferedImage image = ImageIO.read(new File(fileName)); texture = TextureIO.newTexture(image, false); texture.setTexParameteri(GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR); texture.setTexParameteri(GL.GL_TEXTURE_MAG_FILTER, GL.GL_LINEAR); The C# code uses: Bitmap t = new Bitmap(fileName); t.RotateFlip(RotateFlipType.RotateNoneFlipY); Rectangle r = new Rectangle(0, 0, t.Width, t.Height); BitmapData bd = t.LockBits(r, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb); Gl.glBindTexture(Gl.GL_TEXTURE_2D, tID); Gl.glTexImage2D(Gl.GL_TEXTURE_2D, 0, Gl.GL_RGBA, t.Width, t.Height, 0, Gl.GL_BGRA, Gl.GL_UNSIGNED_BYTE, bd.Scan0); Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MIN_FILTER, Gl.GL_LINEAR); Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MAG_FILTER, Gl.GL_LINEAR); t.UnlockBits(bd); t.Dispose(); After quite a lot of testing I can only come to the conclusion that Java/JOGL is just slower here - PNG reading might not be as quick, or that I'm still doing something wrong. Thanks. Edit2: I have found that creating a new BufferedImage with format TYPE_INT_ARGB_PRE decreases OpenGL texture load time by almost half - this includes having to create the new BufferedImage, getting the Graphics2D from it and then rendering the previously loaded image to it. Edit3: Benchmark results for 5 variations. I wrote a small benchmarking tool, the following results come from loading a set of 33 pngs, most are very wide, 5 times. testStart: ImageIO.read(file) -> TextureIO.newTexture(image) result: avg = 10250ms, total = 51251 testStart: ImageIO.read(bis) -> TextureIO.newTexture(image) result: avg = 10029ms, total = 50147 testStart: ImageIO.read(file) -> TextureIO.newTexture(argbImage) result: avg = 5343ms, total = 26717 testStart: ImageIO.read(bis) -> TextureIO.newTexture(argbImage) result: avg = 5534ms, total = 27673 testStart: TextureIO.newTexture(file) result: avg = 10395ms, total = 51979 ImageIO.read(bis) refers to the technique described in James Branigan's answer below. argbImage refers to the technique described in my previous edit: img = ImageIO.read(file); argbImg = new BufferedImage(img.getWidth(), img.getHeight(), TYPE_INT_ARGB_PRE); g = argbImg.createGraphics(); g.drawImage(img, 0, 0, null); texture = TextureIO.newTexture(argbImg, false); Any more methods of loading (either images from file, or images to OpenGL) would be appreciated, I will update these benchmarks.

Read the article
measure rendered html in javascript without affecting the measurement

- by drawnonward

I am doing pagination in javascript. This is typographic pagination, not chopping up database results. For the most part it works, but I have run into a heisenberg issue where I cannot quite measure text without affecting it. I am not trying to measure text before it is rendered. I want the actual position it shows up at on screen, so I can paginate to where it is naturally wrapped. I am measuring the vertical position of characters, not the horizontal width of strings. The way I do this is similar to this answer in that I am applying a style to a block of text, then measuring the position of the newly created span. If the span does not reach the end of the page, I clear it and make a new span in a linear search. The problem is that the anti-aliased sub-pixel text layout is different when the span is applied. In rare cases, this causes the text to wrap differently when I measure it. I have only seen this when wrapping at a hyphen, and I assume it would not happen when wrapping at white space. As a concrete example, "prepared-he" is the string I am having trouble with. When I measure up to "prepare" it appears, as expected, to be within the current page. When I measure "prepared" the whole phrase wraps down to the next line, moving it to the next page, so it looks like the "d" is the character to break at. I break the text between "prepare" and "d-he" and that is wrong. Trying to evaluate individual characters opens a whole can of worms I would rather avoid. The wrapping changes because, with the new span, the line is 1 pixel wider. A solution to my problem could either be a better way to measure text using javascript, or a way to wrap text in a new element without affecting layout. I have tried setting margin-right:-1px for the class of the span being created to wrap the text. This had no noticeable effect. I am doing this in a UIWebView on the iPhone. There are some measurement related calls that are available in normal WebKit that are not available here. For example, Range does not have getBoundingClientRect or support setting an offset other than 0 in setStart or setEnd. Thank you

Read the article
Performance issues with repeatable loops as control part

- by djerry

Hey guys, In my application, i need to show made calls to the user. The user can arrange some filters, according to what they want to see. The problem is that i find it quite hard to filter the calls without losing performance. This is what i am using now : private void ProcessFilterChoice() { _filteredCalls = ServiceConnector.ServiceConnector.SingletonServiceConnector.Proxy.GetAllCalls().ToList(); if (cboOutgoingIncoming.SelectedIndex > -1) GetFilterPartOutgoingIncoming(); if (cboInternExtern.SelectedIndex > -1) GetFilterPartInternExtern(); if (cboDateFilter.SelectedIndex > -1) GetFilteredCallsByDate(); wbPdf.Source = null; btnPrint.Content = "Pdf preview"; } private void GetFilterPartOutgoingIncoming() { if (cboOutgoingIncoming.SelectedItem.ToString().Equals("Outgoing")) for (int i = _filteredCalls.Count - 1; i > -1; i--) { if (_filteredCalls[i].Caller.E164.Length > 4 || _filteredCalls[i].Caller.E164.Equals("0")) _filteredCalls.RemoveAt(i); } else if (cboOutgoingIncoming.SelectedItem.ToString().Equals("Incoming")) for (int i = _filteredCalls.Count - 1; i > -1; i--) { if (_filteredCalls[i].Called.E164.Length > 4 || _filteredCalls[i].Called.E164.Equals("0")) _filteredCalls.RemoveAt(i); } } private void GetFilterPartInternExtern() { if (cboInternExtern.SelectedItem.ToString().Equals("Intern")) for (int i = _filteredCalls.Count - 1; i > -1; i--) { if (_filteredCalls[i].Called.E164.Length > 4 || _filteredCalls[i].Caller.E164.Length > 4 || _filteredCalls[i].Caller.E164.Equals("0")) _filteredCalls.RemoveAt(i); } else if (cboInternExtern.SelectedItem.ToString().Equals("Extern")) for (int i = _filteredCalls.Count - 1; i > -1; i--) { if ((_filteredCalls[i].Called.E164.Length < 5 && _filteredCalls[i].Caller.E164.Length < 5) || _filteredCalls[i].Called.E164.Equals("0")) _filteredCalls.RemoveAt(i); } } private void GetFilteredCallsByDate() { DateTime period = DateTime.Now; switch (cboDateFilter.SelectedItem.ToString()) { case "Today": period = DateTime.Today; break; case "Last week": period = DateTime.Today.Subtract(new TimeSpan(7, 0, 0, 0)); break; case "Last month": period = DateTime.Today.AddMonths(-1); break; case "Last year": period = DateTime.Today.AddYears(-1); break; default: return; } for (int i = _filteredCalls.Count - 1; i > -1; i--) { if (_filteredCalls[i].Start < period) _filteredCalls.RemoveAt(i); } } _filtered calls is a list of "calls". Calls is a class that looks like this : [DataContract] public class Call { private User caller, called; private DateTime start, end; private string conferenceId; private int id; private bool isNew = false; [DataMember] public bool IsNew { get { return isNew; } set { isNew = value; } } [DataMember] public int Id { get { return id; } set { id = value; } } [DataMember] public string ConferenceId { get { return conferenceId; } set { conferenceId = value; } } [DataMember] public DateTime End { get { return end; } set { end = value; } } [DataMember] public DateTime Start { get { return start; } set { start = value; } } [DataMember] public User Called { get { return called; } set { called = value; } } [DataMember] public User Caller { get { return caller; } set { caller = value; } } Can anyone direct me to a better solution or make some suggestions.

Read the article
SQL query performance optimization (TimesTen)

- by Sergey Mikhanov

Hi community, I need some help with TimesTen DB query optimization. I made some measures with Java profiler and found the code section that takes most of the time (this code section executes the SQL query). What is strange that this query becomes expensive only for some specific input data. Here’s the example. We have two tables that we are querying, one represents the objects we want to fetch (T_PROFILEGROUP), another represents the many-to-many link from some other table (T_PROFILECONTEXT_PROFILEGROUPS). We are not querying linked table. These are the queries that I executed with DB profiler running (they are the same except for the ID): Command> select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; < 1169655247309537280 > < 1169655249792565248 > < 1464837997699399681 > 3 rows found. Command> select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928; < 1169655247309537280 > 1 row found. This is what I have in the profiler: 12:14:31.147 1 SQL 2L 6C 10825P Preparing: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272 12:14:31.147 2 SQL 4L 6C 10825P sbSqlCmdCompile ()(E): (Found already compiled version: refCount:01, bucket:47) cmdType:100, cmdNum:1146695. 12:14:31.147 3 SQL 4L 6C 10825P Opening: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; 12:14:31.147 4 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; 12:14:31.148 5 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; 12:14:31.148 6 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; 12:14:31.228 7 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; 12:14:31.228 8 SQL 4L 6C 10825P Closing: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1464837998949302272; 12:14:35.243 9 SQL 2L 6C 10825P Preparing: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928 12:14:35.243 10 SQL 4L 6C 10825P sbSqlCmdCompile ()(E): (Found already compiled version: refCount:01, bucket:44) cmdType:100, cmdNum:1146697. 12:14:35.243 11 SQL 4L 6C 10825P Opening: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928; 12:14:35.243 12 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928; 12:14:35.243 13 SQL 4L 6C 10825P Fetching: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928; 12:14:35.243 14 SQL 4L 6C 10825P Closing: select G.M_ID from T_PROFILECONTEXT_PROFILEGROUPS CG, T_PROFILEGROUP G where CG.M_ID_EID = G.M_ID and CG.M_ID_OID = 1466585677823868928; It’s clear that the first query took almost 100ms, while the second was executed instantly. It’s not about queries precompilation (the first one is precompiled too, as same queries happened earlier). We have DB indices for all columns used here: T_PROFILEGROUP.M_ID, T_PROFILECONTEXT_PROFILEGROUPS.M_ID_OID and T_PROFILECONTEXT_PROFILEGROUPS.M_ID_EID. My questions are: Why querying the same set of tables yields such a different performance for different parameters? Which indices are involved here? Is there any way to improve this simple query and/or the DB to make it faster? UPDATE: to give the feeling of size: Command> select count(*) from T_PROFILEGROUP; < 183840 > 1 row found. Command> select count(*) from T_PROFILECONTEXT_PROFILEGROUPS; < 2279104 > 1 row found.

Read the article
Performance experiences for running Windows 7 on a Thin-Client?

- by Peter Bernier

Has anyone else tried installing Windows 7 on thin-client hardware? I'd be very interested to hear about other people's experiences and what sort of hardware tweaks they had to do to get it to work. (Yes, I realize this is completely unsupported.. half the fun of playing with machines and beta/RC versions is trying out unsupported scenarios. :) ) I managed to get Windows 7 installed on a modified Wyse 9450 Thin-Client and while the performance isn't great, it is usable, particularly as an RDP workstation. Before installing 7, I added another 256Mb of ram (512 total), a 60G laptop hard-drive and a PCI videocard to the 9450 (this was in order to increase the supported screen resolution). I basically did this in order to see whether or not it was possible to get 7 installed on such minimal hardware, and see what the performance would be. For a 550Mhz processor, I was reasonably impressed. I've been using the machine for RDP for the last couple of days and it actually seems slightly snappier than the default Windows XP embedded install (although this is more likely the result of the extra hardware). I'll be running some more tests later on as I'm curious to see particularl whether the streaming video performance will improve. I'd love to hear about anyone's experiences getting 7 to work on extremely low-powered hardware. Particularly any sort of tweaks that you've discovered in order to increase performance..

Read the article
Performance experiences for running Windows 7 on a Thin-Client?

- by Peter Bernier

Has anyone else tried installing Windows 7 on thin-client hardware? I'd be very interested to hear about other people's experiences and what sort of hardware tweaks they had to do to get it to work. (Yes, I realize this is completely unsupported.. half the fun of playing with machines and beta/RC versions is trying out unsupported scenarios. :) ) I managed to get Windows 7 installed on a modified Wyse 9450 Thin-Client and while the performance isn't great, it is usable, particularly as an RDP workstation. Before installing 7, I added another 256Mb of ram (512 total), a 60G laptop hard-drive and a PCI videocard to the 9450 (this was in order to increase the supported screen resolution). I basically did this in order to see whether or not it was possible to get 7 installed on such minimal hardware, and see what the performance would be. For a 550Mhz processor, I was reasonably impressed. I've been using the machine for RDP for the last couple of days and it actually seems slightly snappier than the default Windows XP embedded install (although this is more likely the result of the extra hardware). I'll be running some more tests later on as I'm curious to see particularl whether the streaming video performance will improve. I'd love to hear about anyone's experiences getting 7 to work on extremely low-powered hardware. Particularly any sort of tweaks that you've discovered in order to increase performance..

Read the article
To what extent is size a factor in SSD performance?

- by artif

To what extent is the size of an SSD a factor in its performance? In my mind, correct me if I'm wrong, a bigger SSD should be, everything else being equal, faster than a smaller one. A bigger SSD would have more erase blocks and thus more leeway for the FTL (flash translation layer) to do garbage collection optimization. Also there would be more time before TRIM became necessary. I see on Wikipedia that it remarks that "The performance of the SSD can scale with the number of parallel NAND flash chips used in the device" so it seems throughput also increases significantly. Also many SSDs contain internal caches of some sort and presumably those caches are larger for correspondingly large SSDs. But supposing this effect exists, I would like a quantitative analysis. Does throughput increase linearly? How much is garbage collection impacted, if at all? Does latency stay the same? And so on. Would the performance of a 8 GB SSD be significantly different from, for example, an 80 GB SSD assuming both used high quality chips, controllers, etc? Are there any resources (webpages, research papers, presentations, books, etc) that discuss correlations between SSD performance (4 KB random write speed, latency, maximum sequential throughput, etc) and size? I realize this does not really sound like a programming question but it is relevant for what I'm working on (using flash for caching hard drive data) which does involve programming. If there is a better place to ask this question, eg a more hardware oriented site, what would that be? Something like the equivalent of stack overflow (or perhaps a forum) for in-depth questions on hardware interfaces, internals, etc would be appreciated.

Read the article
Why does Joomla debug show 446 queries logged and 446 legacy queries logged?

- by Darye

I have been called in to fix the performance of a Joomla site that was already setup. I look at the debug output and it shows the same queries twice, once for queries logged and again for legacy queries logged. My guess is that it is actually running the same queries twice make for just under 900 queries per page (hope I am wrong) The Legacy plugin is disabled, so Legacy mode is not on at all. The site uses VirtueMart as well (which BTW isn't working properly if the cache in the Global Config is turned on) Besides the fact that I don't think it should be running 446 queries anyway (sometimes even up to 650 per page ), has anyone every experienced this issue, and where would I look to fix this. Thanks

Read the article
Very different I/O performance in C++ on Windows

- by Mr.Gate

Hi all, I'm a new user and my english is not so good so I hope to be clear. We're facing a performance problem using large files (1GB or more) expecially (as it seems) when you try to grow them in size. Anyway... to verify our sensations we tryed the following (on Win 7 64Bit, 4core, 8GB Ram, 32 bit code compiled with VC2008) a) Open an unexisting file. Write it from the beginning up to 1Gb in 1Mb slots. Now you have a 1Gb file. Now randomize 10000 positions within that file, seek to that position and write 50 bytes in each position, no matter what you write. Close the file and look at the results. Time to create the file is quite fast (about 0.3"), time to write 10000 times is fast all the same (about 0.03"). Very good, this is the beginnig. Now try something else... b) Open an unexisting file, seek to 1Gb-1byte and write just 1 byte. Now you have another 1Gb file. Follow the next steps exactly same way of case 'a', close the file and look at the results. Time to create the file is the faster you can imagine (about 0.00009") but write time is something you can't believe.... about 90"!!!!! b.1) Open an unexisting file, don't write any byte. Act as before, ramdomizing, seeking and writing, close the file and look at the result. Time to write is long all the same: about 90"!!!!! Ok... this is quite amazing. But there's more! c) Open again the file you crated in case 'a', don't truncate it... randomize again 10000 positions and act as before. You're fast as before, about 0,03" to write 10000 times. This sounds Ok... try another step. d) Now open the file you created in case 'b', don't truncate it... randomize again 10000 positions and act as before. You're slow again and again, but the time is reduced to... 45"!! Maybe, trying again, the time will reduce. I actually wonder why... Any Idea? The following is part of the code I used to test what I told in previuos cases (you'll have to change someting in order to have a clean compilation, I just cut & paste from some source code, sorry). The sample can read and write, in random, ordered or reverse ordered mode, but write only in random order is the clearest test. We tryed using std::fstream but also using directly CreateFile(), WriteFile() and so on the results are the same (even if std::fstream is actually a little slower). Parameters for case 'a' = -f_tempdir_\casea.dat -n10000 -t -p -w Parameters for case 'b' = -f_tempdir_\caseb.dat -n10000 -t -v -w Parameters for case 'b.1' = -f_tempdir_\caseb.dat -n10000 -t -w Parameters for case 'c' = -f_tempdir_\casea.dat -n10000 -w Parameters for case 'd' = -f_tempdir_\caseb.dat -n10000 -w Run the test (and even others) and see... // iotest.cpp : Defines the entry point for the console application. // #include <windows.h> #include <iostream> #include <set> #include <vector> #include "stdafx.h" double RealTime_Microsecs() { LARGE_INTEGER fr = {0, 0}; LARGE_INTEGER ti = {0, 0}; double time = 0.0; QueryPerformanceCounter(&ti); QueryPerformanceFrequency(&fr); time = (double) ti.QuadPart / (double) fr.QuadPart; return time; } int main(int argc, char* argv[]) { std::string sFileName ; size_t stSize, stTimes, stBytes ; int retval = 0 ; char *p = NULL ; char *pPattern = NULL ; char *pReadBuf = NULL ; try { // Default stSize = 1<<30 ; // 1Gb stTimes = 1000 ; stBytes = 50 ; bool bTruncate = false ; bool bPre = false ; bool bPreFast = false ; bool bOrdered = false ; bool bReverse = false ; bool bWriteOnly = false ; // Comsumo i parametri for(int index=1; index < argc; ++index) { if ( '-' != argv[index][0] ) throw ; switch(argv[index][1]) { case 'f': sFileName = argv[index]+2 ; break ; case 's': stSize = xw::str::strtol(argv[index]+2) ; break ; case 'n': stTimes = xw::str::strtol(argv[index]+2) ; break ; case 'b':stBytes = xw::str::strtol(argv[index]+2) ; break ; case 't': bTruncate = true ; break ; case 'p' : bPre = true, bPreFast = false ; break ; case 'v' : bPreFast = true, bPre = false ; break ; case 'o' : bOrdered = true, bReverse = false ; break ; case 'r' : bReverse = true, bOrdered = false ; break ; case 'w' : bWriteOnly = true ; break ; default: throw ; break ; } } if ( sFileName.empty() ) { std::cout << "Usage: -f<File Name> -s<File Size> -n<Number of Reads and Writes> -b<Bytes per Read and Write> -t -p -v -o -r -w" << std::endl ; std::cout << "-t truncates the file, -p pre load the file, -v pre load 'veloce', -o writes in order mode, -r write in reverse order mode, -w Write Only" << std::endl ; std::cout << "Default: 1Gb, 1000 times, 50 bytes" << std::endl ; throw ; } if ( !stSize || !stTimes || !stBytes ) { std::cout << "Invalid Parameters" << std::endl ; return -1 ; } size_t stBestSize = 0x00100000 ; std::fstream fFile ; fFile.open(sFileName.c_str(), std::ios_base::binary|std::ios_base::out|std::ios_base::in|(bTruncate?std::ios_base::trunc:0)) ; p = new char[stBestSize] ; pPattern = new char[stBytes] ; pReadBuf = new char[stBytes] ; memset(p, 0, stBestSize) ; memset(pPattern, (int)(stBytes&0x000000ff), stBytes) ; double dTime = RealTime_Microsecs() ; size_t stCopySize, stSizeToCopy = stSize ; if ( bPre ) { do { stCopySize = std::min(stSizeToCopy, stBestSize) ; fFile.write(p, stCopySize) ; stSizeToCopy -= stCopySize ; } while (stSizeToCopy) ; std::cout << "Creating time is: " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; } else if ( bPreFast ) { fFile.seekp(stSize-1) ; fFile.write(p, 1) ; std::cout << "Creating Fast time is: " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; } size_t stPos ; ::srand((unsigned int)dTime) ; double dReadTime, dWriteTime ; stCopySize = stTimes ; std::vector<size_t> inVect ; std::vector<size_t> outVect ; std::set<size_t> outSet ; std::set<size_t> inSet ; // Prepare vector and set do { stPos = (size_t)(::rand()<<16) % stSize ; outVect.push_back(stPos) ; outSet.insert(stPos) ; stPos = (size_t)(::rand()<<16) % stSize ; inVect.push_back(stPos) ; inSet.insert(stPos) ; } while (--stCopySize) ; // Write & read using vectors if ( !bReverse && !bOrdered ) { std::vector<size_t>::iterator outI, inI ; outI = outVect.begin() ; inI = inVect.begin() ; stCopySize = stTimes ; dReadTime = 0.0 ; dWriteTime = 0.0 ; do { dTime = RealTime_Microsecs() ; fFile.seekp(*outI) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++outI ; if ( !bWriteOnly ) { dTime = RealTime_Microsecs() ; fFile.seekg(*inI) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++inI ; } } while (--stCopySize) ; std::cout << "Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " (Ave: " << xw::str::itoa(dWriteTime/stTimes, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { std::cout << "Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " (Ave: " << xw::str::itoa(dReadTime/stTimes, 10, 'f') << ")" << std::endl ; } } // End // Write in order if ( bOrdered ) { std::set<size_t>::iterator i = outSet.begin() ; dWriteTime = 0.0 ; stCopySize = 0 ; for(; i != outSet.end(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekp(stPos) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Ordered Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dWriteTime/stCopySize, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { i = inSet.begin() ; dReadTime = 0.0 ; stCopySize = 0 ; for(; i != inSet.end(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekg(stPos) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Ordered Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dReadTime/stCopySize, 10, 'f') << ")" << std::endl ; } }// End // Write in reverse order if ( bReverse ) { std::set<size_t>::reverse_iterator i = outSet.rbegin() ; dWriteTime = 0.0 ; stCopySize = 0 ; for(; i != outSet.rend(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekp(stPos) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Reverse ordered Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dWriteTime/stCopySize, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { i = inSet.rbegin() ; dReadTime = 0.0 ; stCopySize = 0 ; for(; i != inSet.rend(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekg(stPos) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Reverse ordered Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dReadTime/stCopySize, 10, 'f') << ")" << std::endl ; } }// End dTime = RealTime_Microsecs() ; fFile.close() ; std::cout << "Flush/Close Time is " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; std::cout << "Program Terminated" << std::endl ; } catch(...) { std::cout << "Something wrong or wrong parameters" << std::endl ; retval = -1 ; } if ( p ) delete []p ; if ( pPattern ) delete []pPattern ; if ( pReadBuf ) delete []pReadBuf ; return retval ; }

Read the article
SQL server virtual memory usage and perofrmance

- by user365035

Hello, I have a very large DB used mostly for analytics. The performance overall is very sluggish. I just noticed that when running the query below, the amount of virtual memory used greatly exceed the amount of physical memory available. Currently, phsycial memory is 10GB (10238 bytes) where as the virtual memory returns significantly more 8388607 bytes. That seems really wrong, but I'm at a bit of a loss on how to proceed. USE [master]; GO select cpu_count , hyperthread_ratio , physical_memory_in_bytes / 1048576 as 'mem_MB' , virtual_memory_in_bytes / 1048576 as 'virtual_mem_MB' , max_workers_count , os_error_mode , os_priority_class from sys.dm_os_sys_info

Read the article
How well does Scala Perform Comapred to Java?

- by Teja Kantamneni

The Question actually says it all. The reason behind this question is I am about to start a small side project and want to do it in Scala. I am learning scala for the past one month and now I am comfortable working with it. The scala compiler itself is pretty slow (unless you use fsc). So how well does it perform on JVM? I previously worked on groovy and I had seen sometimes over performed than java. My Question is how well scala perform on JVM compared to Java. I know scala has some very good features(FP, dynamic lang, statically typed...) but end of the day we need the performance...

Read the article

< Previous Page | 27 28 29 30 31 32 33 34 35 36 37 38 | Next Page >