Search Results

Search found 19055 results on 763 pages for 'high performance'.

Page 20/763 | < Previous Page | 16 17 18 19 20 21 22 23 24 25 26 27  | Next Page >

  • MySQL performance over a (local) network much slower than I would expect

    - by user15241
    MySQL queries in my production environment are taking much longer than I would expect them to. The site in question is a fairly large Drupal site, with many modules installed. The webserver (Nginx) and database server (MySQL) are hosted on separate machines, connected by a 100mbps LAN connection (hosted by Rackspace). I have the exact same site running on my laptop for development. Obviously, on my laptop, the webserver and database server are on the same box. Here are the results of my database query times: Production: Executed 291 queries in 320.33 milliseconds. (homepage) Executed 517 queries in 999.81 milliseconds. (content page) Development: Executed 316 queries in 46.28 milliseconds. (homepage) Executed 586 queries in 79.09 milliseconds. (content page) As can clearly be seen from these results, the time spent querying the MySQL database is much shorter on my laptop, where the MySQL server is running on the same machine as the web server. Why is this?! One factor must be the network latency. On average, a round trip from the webserver to the database server takes 0.16ms (shown by ping). That must be added to every single MySQL query. So, taking the content page example above, where 517 queries are executed, network latency alone will add roughly 82ms to the total query time. However, that doesn't account for the difference I am seeing (79ms on my laptop vs 999ms on the production boxes). What other factors should I be looking at? I had thought about upgrading the NIC to a gigabit connection, but clearly there is something else involved. I have run the MySQL performance tuning script from http://www.day32.com/MySQL/ and it tells me that my database server is configured well (better than my laptop, apparently). The only problem reported is "Of 4394 temp tables, 48% were created on disk". This is true in both environments, and in the production environment I have even tried increasing max_heap_table_size and tmp_table_size to 1GB, with no change (I think this is because I have some BLOB and TEXT columns).
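    A quick back-of-the-envelope check, sketched below in Python using the figures quoted in the question, shows how little of the gap raw round-trip latency can explain, which supports the suspicion that something else is involved.

        # Rough estimate: how much of the production/development gap can
        # network round trips alone explain? Figures taken from the question.
        queries = 517              # queries on the content page
        rtt_ms = 0.16              # average ping round trip, webserver -> DB
        production_ms = 999.81     # observed total query time in production
        development_ms = 79.09     # observed total query time on the laptop

        latency_overhead_ms = queries * rtt_ms
        unexplained_ms = (production_ms - development_ms) - latency_overhead_ms

        print(f"latency overhead:  {latency_overhead_ms:.1f} ms")   # ~82.7 ms
        print(f"still unexplained: {unexplained_ms:.1f} ms")         # ~838 ms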

    Read the article

  • Tactics for using PHP in a high-load site

    - by Ross
    Before you answer this, I have never developed anything popular enough to attain high server loads. Treat me as (sigh) an alien that has just landed on the planet, albeit one that knows PHP and a few optimisation techniques. I'm developing a tool in PHP that could attain quite a lot of users, if it works out right. However, while I'm fully capable of developing the program, I'm pretty much clueless when it comes to making something that can deal with huge traffic. So here's a few questions on it (feel free to turn this question into a resource thread as well). Databases: At the moment I plan to use the MySQLi features in PHP5. However, how should I set up the databases in relation to users and content? Do I actually need multiple databases? At the moment everything's jumbled into one database - although I've been considering spreading user data to one, actual content to another and finally core site content (template masters etc.) to another. My reasoning behind this is that sending queries to different databases will ease the load on each of them, as one database becomes three load sources. Also, would this still be effective if they were all on the same server? Caching: I have a template system that is used to build the pages and swap out variables. Master templates are stored in the database and each time a template is called its cached copy (an HTML document) is used. At the moment I have two types of variable in these templates - a static var and a dynamic var. Static vars are usually things like page names, the name of the site - things that don't change often; dynamic vars are things that change on each page load. My question on this: say I have comments on different articles. Which is the better solution: store the simple comment template and render comments (from a DB call) each time the page is loaded, or store a cached copy of the comments page as an HTML page and recache it each time a comment is added/edited/deleted? Finally, does anyone have any tips/pointers for running a high-load site on PHP? I'm pretty sure it's a workable language to use - Facebook and Yahoo! make heavy use of it - but are there any experiences I should watch out for? Thanks, Ross
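    On the comments question, the trade-off is between rendering from the database on every request and caching the rendered fragment and invalidating it on writes. A minimal sketch of the second approach (written in Python rather than PHP for brevity; the render and fetch helpers are hypothetical, and a real deployment would typically put the cache in memcached or Redis rather than a local dict):

        # Fragment caching sketch: cache the rendered comments HTML per article
        # and invalidate it whenever a comment changes.
        _fragment_cache = {}

        def render_comments(article_id, fetch_comments):
            """fetch_comments(article_id) -> list of comment dicts (assumed helper)."""
            key = f"comments:{article_id}"
            html = _fragment_cache.get(key)
            if html is None:                      # cache miss: hit the DB once
                comments = fetch_comments(article_id)
                html = "\n".join(f"<p>{c['author']}: {c['body']}</p>" for c in comments)
                _fragment_cache[key] = html
            return html

        def on_comment_changed(article_id):
            """Call after a comment is added/edited/deleted to force a re-render."""
            _fragment_cache.pop(f"comments:{article_id}", None)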

    Read the article

  • Matlab - binary vector with high concentration of 1s (or 0s)

    - by JohnIdol
    What's the best way to generate a number X of random binary vectors of size N with a concentration of 1s (or, symmetrically, of 0s) that spans from very low to very high? Using randint or unidrnd (as in this question) will generate binary vectors with uniform distributions, which is not what I need in this case. Any help appreciated!
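    One common approach is to draw uniform random numbers and threshold them at the desired density p, so each bit is 1 with probability p. A sketch in Python/NumPy (the same idea in MATLAB would be something like rand(X, N) < p):

        import numpy as np

        def random_binary_vectors(X, N, p, seed=None):
            """Generate X binary vectors of length N where each bit is 1 with probability p."""
            rng = np.random.default_rng(seed)
            return (rng.random((X, N)) < p).astype(np.uint8)

        # Densities spanning very low to very high concentrations of 1s
        for p in [0.01, 0.1, 0.5, 0.9, 0.99]:
            v = random_binary_vectors(X=5, N=1000, p=p)
            print(p, v.mean())   # empirical density should be close to p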

    Read the article

  • MySQL high IO usage queries

    - by jack
    MySQL has a built-in slow query logger. Are there any options or third-party tools that can detect the queries causing high IO usage, in the same way the slow query logger does?

    Read the article

  • Programming Language Choices for High Integrity Systems

    - by Finbarr
    What programming languages are a good choice for High Integrity Systems? An example of a bad choice is Java, as there is a considerable amount of code that is inaccessible to the programmer. I am looking for examples of strongly typed, block-structured languages where the programmer is responsible for 100% of the code and there is as little interference from things like a JVM as possible. Compilers will obviously be an issue. The language must have a complete and unambiguous definition.

    Read the article

  • What is the recommended minimum object size for gzip performance benefits?

    - by utt73
    I'm working on improving page display times, and one of the methods is to gzip content from the webserver. Google recommends: Note that gzipping is only beneficial for larger resources. Due to the overhead and latency of compression and decompression, you should only gzip files above a certain size threshold; we recommend a minimum range between 150 and 1000 bytes. Gzipping files below 150 bytes can actually make them larger. We serve our content through Akamai, using their network as a proxy and CDN. What they've told me: Following up on your question regarding what is the minimum size Akamai will compress the requested object when sending it to the end user: The minimum size is 860 bytes. My reply: What is the reason that Akamai's minimum size is 860 bytes? And why, for example, is this not the case for files Akamai serves for facebook? (see below) Google recommends gzipping more aggressively, and that seems appropriate on our site, where the most frequent hits, by far, are AJAX calls that are <860 bytes. Akamai's response: The reason 860 bytes is the minimum size for compression is twofold: (1) The overhead of compressing an object under 860 bytes outweighs the performance gain. (2) Objects under 860 bytes can be transmitted via a single packet anyway, so there isn't a compelling reason to compress them. So I'm here for some fact checking. Is the 860-byte limit, based on packet size, the end of this reasoning? Why would high-traffic sites push this down to the 150-byte limit... just to save on bandwidth costs (since CDNs base their charges on bandwidth offloaded from origin), or is there a performance gain in doing so?
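    The "compressing very small objects can make them larger" claim is easy to check directly; a quick sketch with Python's gzip module (the sample payloads are made up for illustration):

        import gzip

        samples = {
            "tiny JSON (~40 B)": b'{"ok": true, "count": 3, "id": 123456}',
            "~150 B of text": b"The quick brown fox jumps over the lazy dog. " * 3 + b"Lorem ipsum dolor.",
            "~2 KB of HTML-ish text": b"<div class='row'><span>item</span></div>\n" * 50,
        }

        for label, data in samples.items():
            compressed = gzip.compress(data)
            print(f"{label:24s} raw={len(data):5d} B  gzip={len(compressed):5d} B")
        # Below a few hundred bytes, the ~18-byte gzip header/trailer plus the
        # deflate overhead often outweighs any savings; larger, repetitive
        # payloads compress dramatically.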

    Read the article

  • How do I measure performance of a virtual server?

    - by Sergey
    I've got a VPS running Ubuntu. Being a virtual server, I understand that it shares resources with an unknown number of other servers, and I'm noticing that it's considerably slower than my desktop machine. Is there some tool to measure the performance of the virtual machine? I'd be curious to see some approximate measure similar to bogomips, possibly for CPU (operations/sec), memory and disk read/write speed. I'd like to be able to compare those numbers to my desktop machine. I'm not interested in the specs of the actual physical machine my VPS is running on - by doing cat /proc/cpuinfo I can see that it's a nice quad-core Xeon machine, but it doesn't matter to me. I'm basically interested in how fast a program would run in my VPS - how many CPU operations it can make in a second, how many bytes it can write to RAM or to disk. I only have ssh access to the machine, so the tool needs to be command-line. I could write a script which, say, does some calculations in a loop for a second and counts how many loops it was able to do, or something similar to measure disk and RAM performance. But I'm sure something like this already exists.
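    A very crude version of the script described above (loop-counting for CPU plus a small fsync'd disk-write test) might look like the following; established tools such as sysbench give far more meaningful numbers, but this illustrates the idea and can be run over ssh:

        import os, time, tempfile

        def cpu_loops_per_second(duration=1.0):
            """Count how many trivial arithmetic loops complete in `duration` seconds."""
            end = time.time() + duration
            loops = 0
            while time.time() < end:
                _ = sum(i * i for i in range(1000))
                loops += 1
            return loops / duration

        def disk_write_mb_per_second(size_mb=64):
            """Write `size_mb` MB to a temp file with fsync and report throughput."""
            block = os.urandom(1024 * 1024)
            with tempfile.NamedTemporaryFile() as f:
                start = time.time()
                for _ in range(size_mb):
                    f.write(block)
                f.flush()
                os.fsync(f.fileno())
                return size_mb / (time.time() - start)

        print("CPU:  %.0f loops/sec" % cpu_loops_per_second())
        print("Disk: %.1f MB/sec written" % disk_write_mb_per_second())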

    Read the article

  • Performing client-side OAuth authorized Twitter API calls versus server side, how much of a difference is there in terms of performance?

    - by Terence Ponce
    I'm working on a Twitter application in Ruby on Rails. One of the biggest arguments that I have with other people on the project is the method of calling the Twitter API. Before, everything was done on the server: OAuth login, updating the user's Twitter data, and retrieving tweets. Retrieving tweets was the heaviest thing to do since we don't store the tweets in our database, so viewing the tweets means that we have to call the API every time. One of the people in the project suggested that we call the tweets through Javascript instead to lessen the load on the server. We used GET search, which, correct me if I'm wrong, will be removed when v1.0 becomes completely deprecated, but that really isn't a concern now. When the Twitter API has migrated completely to v1.1 (again, correct me if I'm wrong), every call to the API must be authenticated, so we have to authenticate our Javascript requests to the API. As said here: We don't support or recommend performing OAuth directly through Javascript -- it's insecure and puts your application at risk. The only acceptable way to perform it is if you kept all keys and secrets server-side, computed the OAuth signatures and parameters server side, then issued the request client-side from the server-generated OAuth values. If we do exactly what Twitter suggests, the only difference between this and doing everything server-side is that our server won't have to contact the Twitter API anymore every time the user wants to view tweets. Here's how I would picture what's happening every time the user makes a request: If we do it through Javascript, it would be harder on my part because I would have to create the signatures manually for every request, but I will gladly do it if the boost in performance is worth all the trouble. Doing it through Ruby on Rails would be very easy since the Twitter gem does most of the grunt work already, so I'm really encouraging the other people in the project to agree with me. Is the difference in performance trivial or is it significant enough to switch to Javascript?
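    For reference, the server-side signing step Twitter describes (compute the OAuth 1.0a parameters and HMAC-SHA1 signature on the server and hand only the resulting values to the client) amounts to roughly the following sketch in Python; the keys, token and URL below are placeholders, not real credentials:

        import base64, hashlib, hmac, time, uuid
        from urllib.parse import quote

        def oauth1_sign(method, url, params, consumer_key, consumer_secret,
                        token, token_secret):
            """Build OAuth 1.0a authorization parameters, signed with HMAC-SHA1."""
            oauth = {
                "oauth_consumer_key": consumer_key,
                "oauth_nonce": uuid.uuid4().hex,
                "oauth_signature_method": "HMAC-SHA1",
                "oauth_timestamp": str(int(time.time())),
                "oauth_token": token,
                "oauth_version": "1.0",
            }
            enc = lambda s: quote(str(s), safe="")
            all_params = {**params, **oauth}
            param_string = "&".join(
                f"{enc(k)}={enc(v)}"
                for k, v in sorted(all_params.items(), key=lambda kv: enc(kv[0]))
            )
            base_string = "&".join([method.upper(), enc(url), enc(param_string)])
            signing_key = f"{enc(consumer_secret)}&{enc(token_secret)}"
            digest = hmac.new(signing_key.encode(), base_string.encode(), hashlib.sha1).digest()
            oauth["oauth_signature"] = base64.b64encode(digest).decode()
            return oauth   # ship these values to the client; keep the secrets server-side

        # Placeholder values purely for illustration:
        print(oauth1_sign("GET", "https://api.twitter.com/1.1/search/tweets.json",
                          {"q": "ruby"}, "CONSUMER_KEY", "CONSUMER_SECRET",
                          "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET"))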

    Read the article

  • How to squeeze the maximum performance out of Unity and GNOME 3?

    - by melvincv
    I see that I do not get good performance with the new Unity desktop, but I should say that Unity has improved a lot since the last release, Ubuntu 11.10. How can I squeeze the maximum performance out of 1. Unity and 2. GNOME 3? My system specs:
    -Processors- Intel(R) Pentium(R) Dual CPU E2180 @ 2.00GHz
    -Memory- Total Memory : 2049996 kB
    -PCI Devices-
    Host bridge : Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 10)
    PCI bridge : Intel Corporation 82G33/G31/P35/P31 Express PCI Express Root Port (rev 10) (prog-if 00 [Normal decode])
    VGA compatible controller : Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 10) (prog-if 00 [VGA controller])
    USB controller : Intel Corporation N10/ICH 7 Family USB UHCI Controller #1 (rev 01) (prog-if 00 [UHCI])
    USB controller : Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 01) (prog-if 00 [UHCI])
    USB controller : Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 01) (prog-if 00 [UHCI])
    USB controller : Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 01) (prog-if 00 [UHCI])
    USB controller : Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 01) (prog-if 20 [EHCI])
    PCI bridge : Intel Corporation 82801 PCI Bridge (rev e1) (prog-if 01 [Subtractive decode])
    ISA bridge : Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
    IDE interface : Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) (prog-if 8a [Master SecP PriP])
    IDE interface : Intel Corporation N10/ICH7 Family SATA Controller [IDE mode] (rev 01) (prog-if 8f [Master SecP SecO PriP PriO])
    SMBus : Intel Corporation N10/ICH 7 Family SMBus Controller (rev 01)
    Ethernet controller : Intel Corporation PRO/100 VE Network Connection (rev 01)

    Read the article

  • Premature-Optimization and Performance Anxiety

    - by James Michael Hare
    While writing my post analyzing the new .NET 4 ConcurrentDictionary class (here), I fell into one of the classic blunders that I myself always love to warn about. After analyzing the differences in time between a Dictionary with locking and the new ConcurrentDictionary class, I noted that the ConcurrentDictionary was faster with read-heavy multi-threaded operations. Then, I made the classic blunder of thinking that because the original Dictionary with locking was faster for those write-heavy uses, it was the best choice for those types of tasks. In short, I fell into the premature-optimization anti-pattern. Basically, the premature-optimization anti-pattern is when a developer codes very early for a perceived (whether rightly or wrongly) performance gain and sacrifices good design and maintainability in the process. At best, the performance gains are usually negligible, and at worst they can either negatively impact performance or degrade maintainability so much that time to market suffers or the code becomes very fragile due to the complexity. Keep in mind the distinction above. I'm not talking about valid performance decisions. There are decisions one should make when designing and writing an application that are valid performance decisions. Examples of this are knowing the best data structures for a given situation (Dictionary versus List, for example) and choosing performant algorithms (linear search vs. binary search). But these in my mind are macro optimizations. The error is not in deciding to use a better data structure or algorithm; the anti-pattern, as stated above, is when you attempt to over-optimize early on in such a way that it sacrifices maintainability. In my case, I was actually considering trading the safety and maintainability gains of the ConcurrentDictionary (no locking required) for a slight performance gain by using the Dictionary with locking. This would have been a mistake, as I would be trading maintainability (ConcurrentDictionary requires no locking, which helps readability) and safety (ConcurrentDictionary is safe for iteration even while being modified, and you don't risk the developer locking incorrectly) -- and I fell for it even when I knew to watch out for it. I think in my case, and it may be true for others as well, a large part of it was due to the time I was trained as a developer. I began college in the 90s when C and C++ were king and hardware speed and memory were still relatively priceless commodities not to be squandered. In those days, using a long instead of a short could waste precious resources, and as such, we were taught to try to minimize space and favor performance. This is why in many cases such early code-bases were very hard to maintain. I don't know how many times I heard back then to avoid too many function calls because of the overhead -- and in fact just last year I heard a new hire in the company where I work declare that she didn't want to refactor a long method because of function call overhead. Now back then, that may have been a valid concern, but with today's modern hardware, even if you're calling a trivial method in an extremely tight loop (which chances are the JIT compiler would optimize anyway), the results of removing method calls to speed up performance are negligible for the great majority of applications. Now, obviously, there are applications where speed is absolutely king (for example drivers, computer games, operating systems) where such sacrifices may be made.
    But I would strongly advise against such optimization because of its cost. Many folks performing an optimization think it's always a win-win: they're simply adding speed to the application, so what could possibly be wrong with that? What they don't realize is the cost of their choice. For every piece of straightforward code that you obfuscate with performance enhancements, you risk introducing bugs and adding to the long-term technical debt of the application. It will become so fragile over time that maintenance becomes a nightmare. I've seen such applications in places I have worked. There are times I've seen applications where the designer was so obsessed with performance that they even designed their own memory management system for the application to try to squeeze out every ounce of performance. Unfortunately, the application's stability often suffers as a result, and it is very difficult for anyone other than the original designer to maintain. I've even seen this recently, where I heard a C++ developer bemoaning that in VS2010 the iterators are about twice as slow as they used to be because Microsoft added range checking (probably as part of the 0x standard implementation). To me this was almost a joke. Twice as slow sounds bad, but it's almost never as bad as you think -- especially if you're gaining safety. The only time twice is really that much slower is when once was too slow to begin with. Think about it: 2 minutes is slow as a response time because 1 minute is slow. But if an iterator takes 1 microsecond to move one position and a new, safer iterator takes 2 microseconds, this is trivial! The only way you'd ever really notice this would be in iterating a collection just for the sake of iterating (i.e. no other operations). To my mind, the added safety makes the extra time worth it. Always favor safety and maintainability when you can. I know it can be a hard habit to break, especially if you started your career early on or in a language such as C where performance is a constant concern. But in reality, these types of micro-optimizations only end up hurting you in the long run. Remember the two laws of optimization. I'm not sure where I first heard these, but they are so true: For beginners: Do not optimize. For experts: Do not optimize yet. This is so true. If you're a beginner, resist the urge to optimize at all costs. And if you are an expert, delay that decision. As long as you have chosen the right data structures and algorithms for your task, your performance will probably be more than sufficient. Chances are it will be network, database, or disk hits that will be your slow-down, not your code. As they say, 98% of your code's bottleneck is in 2% of your code, so premature optimization may add maintenance and safety debt that won't have any measurable impact. Instead, code for maintainability and safety, and then, and only then, when you find a true bottleneck, go back and optimize further.
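    The method-call-overhead point is easy to sanity-check on a modern runtime; a quick timeit sketch (Python here, but the same experiment carries over to C# or Java, where a JIT would likely inline the call entirely):

        import timeit

        def add_inline(n):
            total = 0
            for i in range(n):
                total += i * i          # work done inline
            return total

        def square(i):
            return i * i

        def add_with_call(n):
            total = 0
            for i in range(n):
                total += square(i)      # same work behind a function call
            return total

        n = 100_000
        t_inline = timeit.timeit(lambda: add_inline(n), number=100)
        t_call = timeit.timeit(lambda: add_with_call(n), number=100)
        print(f"inline: {t_inline:.3f}s  with call: {t_call:.3f}s")
        # The per-call difference is tens of nanoseconds; unless the loop body
        # is this trivial, the call overhead disappears into the real work.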

    Read the article

  • SQL Server 2005 high memory usage and performance problems

    - by emzero
    Hi there guys. I have this ASP.NET/SQL Server 2005 website running on a production server (Win2003, quad core, 4GB). The site normally runs smoothly, but after 2-3 weeks I notice slow performance on the site (specifically on one particular page). I also notice that the SQL Server process is using about 2GB of RAM. So I restart the service, the site runs fast again and the process drops to 300-400MB. I'm looking for an explanation of why this is happening. What is SQL Server storing in RAM that takes so much space and degrades performance? What can I do to avoid this? I'm trying to avoid restarting SQL Server every time this happens. Thank you!

    Read the article

  • Higher than high-level web frameworks or CMS's?

    - by Ben
    I'm looking for options that allow very high-level web site development with these special characteristics: not requiring the user to program in a complex programming language; not requiring the user to use GUI-like administration areas; allowing the user to "program" in a lightweight markup language. The last point is not only about the look and structure of the output but also about creating simple dynamic output. For example: listing pages, fetching the content of other pages, doing a site search and displaying its output, dynamically showing or hiding parts of the page depending on login status. Of course, I am not expecting a solution that provides the same possibilities as a Ruby, Python or PHP web framework. Rather, I am looking for support for the "basics" that are common to web sites. Until now, I have found only one piece of software that fulfills these requirements, but while it's free, it's not open source: BoltWire at http://www.boltwire.com/. Being open source is required.

    Read the article

  • How to avoid Memory "Hard Fault/sec"

    - by Flavio Oliveira
    I've a problem on my Windows 2008 Server x64 machine, and I cannot understand how to solve it. Looking at Resource Monitor I see about 100 to 200 hard faults/sec, and generally the machine is slow. From what I've read, a hard fault is caused by a memory page that is no longer available in physical memory, which causes disk I/O operations, and that is a problem. The current hardware is an Intel Core 2 Duo E8400 (3.0GHz) with 6GB RAM on Windows Server Web 64-bit. The machine currently has about 2GB of RAM in use, leaving 4GB available. Why does the machine require that many disk operations? What can I do to increase the performance? Am I experiencing a memory issue? What should be my starting point?

    Read the article

  • Process taking a long time on Itanium server

    - by Vishwa
    Hi, we have an application which was recently migrated from a PA-RISC server to an Itanium server. After migration we noticed a significant increase in the time taken to complete a process. When we tracked the time taken by each part, we found that the system time is 7.590000 and the user time is 3.990000, but the elapsed time is 70.434882!! Due to this huge elapsed time, the overall performance of the application has gone down considerably. I wrote some small programs to perform I/O operations on the Itanium; these scripts run faster on the Itanium than on PA-RISC. What could be the reason for the high elapsed time of this process?

    Read the article

  • SQL Server performance, virtual memory usage

    - by user45641
    Hello, I have a very large DB used mostly for analytics. The performance overall is very sluggish. I just noticed that when running the query below, the amount of virtual memory used greatly exceeds the amount of physical memory available. Currently, physical memory is 10GB (10238 MB) whereas the virtual memory returns significantly more - 8388607 MB. That seems really wrong, but I'm at a bit of a loss on how to proceed.
      USE [master];
      GO
      select cpu_count
           , hyperthread_ratio
           , physical_memory_in_bytes / 1048576 as 'mem_MB'
           , virtual_memory_in_bytes / 1048576 as 'virtual_mem_MB'
           , max_workers_count
           , os_error_mode
           , os_priority_class
      from sys.dm_os_sys_info

    Read the article

  • Android ImageView scrolling to display a full high-quality image

    - by Javadid
    Hi friends... I am working on displaying an image and placing an icon on top of it; clicking the icon will show an enlarged version of the image. Putting the ImageView holding the image in a LinearLayout scales the image to the width of the dialog. The problem is that I need to display the image in a dialog, but the image is very high resolution and hence far larger than the width of the dialog. I need to show the actual image with scrolling in both directions so the whole image can be seen. But whenever I try putting the ImageView in a ScrollView, the top of my ImageView is blank, and although the image scrolls vertically, the width is still scaled to the width of the dialog. Help, guys!

    Read the article

  • Benchmarking Java programs

    - by stefan-ock
    For university, I perform bytecode modifications and analyze their influence on the performance of Java programs. Therefore, I need Java programs - ideally ones used in production - and appropriate benchmarks. For instance, I already have HyperSQL and measure its performance with the PolePosition benchmark program. The Java programs run on a JVM without a JIT compiler. Thanks for your help! P.S.: I cannot use programs that benchmark the performance of the JVM or of the Java language itself (such as Wide Finder).

    Read the article

  • High CPU - What to do.

    - by Udi Kantzuker
    I have a high CPU problem with MYSQL using "top" ( linux ) shows cpu peaks of 90%. I was trying to find the source of the problem, turned on general log and slow query log, The slow query log did not find anything. The Db contains a few small tables and one large table that contains almost 100k rows, Database Engine is MyIsam. strange thing i have noticed that on the large table, select, insert are very fast but update takes 0.2 - 0.5 secs. already used optimize and repair and no improvement. the table is being updated frequently, could this be the source of the high CPU% ? What can i do to improve this?

    Read the article

  • Drupal CMS most Stable for High Traffic

    - by Aditi
    Drupal users report higher satisfaction with their CMS than Joomla users, for a number of reasons. If you are choosing a high-performance platform to run a high-traffic website, a Drupal installation is a strong contender. Overload scenario: Drupal is a scalable, high-performance CMS and is stable under heavy load. If your server is pushed beyond its capacity, Drupal shuts off gracefully and doesn't crash. As soon as the server is back within its traffic capability, Drupal handles all requests smoothly again. For example, if your dedicated server can handle a maximum of 50,000 visits a day, and on a lucky day your news creates a buzz in social media and traffic rises to 70,000, the server will be overloaded; with many systems a crash at that point can even cause permanent damage to your database. With Drupal, the site closes down gracefully, and as soon as traffic drops back within the server's capacity it accepts all requests again. Extensibility: Drupal's add-ons integrate better with the core, and its framework makes it easier to extend the CMS's capabilities, which keeps a heavily extended installation stable - unlike Joomla, which loses its strength when many plugins and heavy customizations are running. Any CMS with a large number of plugins adds complexity and reduces its ability to handle high-traffic requests. Accessibility management (ACL): chances are that if you run a high-traffic website, you have various users and content contributors. ACL means group roles: assigning people to the various registered user levels and allocating different kinds of privileges. The most common example is the ability to see or edit a section or selected pages. This efficient feature sets Drupal apart from other CMSs out there.

    Read the article

  • How can Agile methodologies be adapted to High Volume processing system development?

    - by luckyluke
    I am developing high-volume processing systems: mathematical models that calculate various parameters based on millions of records, calculated derived fields over millions of records, processing huge files of transactions, etc. I am well aware of unit testing methodologies, and if my code is in C# I have no problem unit testing it. The problem is that I often have code in T-SQL, C# code that is a SQL stored assembly, SSIS workflows with a good amount of logic (and outcomes, etc.), or some SAS process. What is the approach you use when developing such systems? I usually develop several tests as stored procedures in a dedicated schema (TEST) and then automatically run them overnight and check the results (sketched below). But this is only for T-SQL, and continuous integration is hard. The bigger problem is testing SSIS packages. How do you test them? What is your preferred approach for stubbing data into tables (especially if you need a lot of data initialization)? I have an approach worked out over the years, but maybe I am just not reading enough articles. So, banking, telecom and risk developers out there: how do you test your mission-critical apps that process millions of records at day end, month end, etc.? What frameworks do you use? How do you validate that your SSIS package is correct (as you develop it)? How do you achieve continuous integration in such an environment (personally, I never got there)? How do you test your map-reduce jobs, for example (I do not use Hadoop, but this is quite similar)? I hope this is not too open-ended a question. Luke
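    A minimal version of the "run every TEST-schema procedure overnight" harness mentioned above might look like the following; the connection string, schema name and pass/fail convention (a test procedure raises an error on failure) are all assumptions for illustration:

        import pyodbc   # assumes a SQL Server ODBC driver is installed

        CONN_STR = "DRIVER={SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"  # placeholder

        def run_test_procs(schema="TEST"):
            """Execute every stored procedure in `schema`; a raised error counts as a failure."""
            conn = pyodbc.connect(CONN_STR, autocommit=True)
            failures = []
            try:
                cur = conn.cursor()
                cur.execute("select name from sys.procedures where schema_id = SCHEMA_ID(?)", schema)
                procs = [row.name for row in cur.fetchall()]
                for proc in procs:
                    try:
                        cur.execute(f"exec [{schema}].[{proc}]")
                    except pyodbc.Error as exc:
                        failures.append((proc, str(exc)))
            finally:
                conn.close()
            print(f"{len(procs) - len(failures)}/{len(procs)} test procedures passed")
            for proc, err in failures:
                print(f"FAILED {proc}: {err}")
            return not failures

        if __name__ == "__main__":
            run_test_procs()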

    Read the article

  • Can compressing Program Files save space *and* give a significant boost to SSD performance?

    - by Christopher Galpin
    Considering solid-state disk space is still an expensive resource, compressing large folders has appeal. Thanks to VirtualStore, could Program Files be a case where it might even improve performance? Discovery: In particular I have been reading: SSD and NTFS Compression Speed Increase? Does NTFS compression slow SSD/flash performance? Will somebody benchmark whole disk compression (HD,SSD) please? (may have to scroll up) The first link is particularly dreamy, but maybe has its head a little too far in the clouds. The third link has this sexy semi-log graph (logarithmic scale!). Quote (with notes): Using highly compressable data (IOmeter), you get at most a 30x performance increase [for reads], and at least a 49x performance DECREASE [for writes]. Assuming I interpreted and clarified that sentence correctly, this single user's benchmark has me incredibly interested. Although write performance tanks wretchedly, read performance still soars. It gave me an idea. Idea: VirtualStore. It so happens that thanks to sanity-saving security features introduced in Windows Vista, write access to certain folders such as Program Files is virtualized for non-administrator processes. Which means, in normal (non-elevated) usage, a program or game's attempt to write data to its install location in Program Files (which is perhaps a poor location) is redirected to %UserProfile%\AppData\Local\VirtualStore, somewhere entirely different. Thus, to my understanding, writes to Program Files should primarily only occur when installing an application. This makes compressing it not only a huge source of space gain, but also a potential candidate for performance gain. Testing: The beginning of this post has me a bit timid: it suggests benchmarking NTFS compression on a whole drive is difficult because turning it off "doesn't decompress the objects". However, it seems to me the compact command is perfectly capable of doing so for both drives and individual folders. Could it be only marking them for decompression the next time the OS reads from them? I need to find the answer before I begin my own testing.
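    One way to start that testing is a before/after read benchmark wrapped around the compact command; a rough sketch follows (Windows-only, the target folder path is a placeholder, and a fair comparison also needs the OS file cache cleared between the two timed passes, e.g. by rebooting or using a cold copy of the files):

        import pathlib, subprocess, time

        TARGET = pathlib.Path(r"C:\Temp\compress-test")   # placeholder test folder

        def timed_read(folder):
            """Read every file under `folder` and return (seconds, total MB read)."""
            start, total = time.time(), 0
            for f in folder.rglob("*"):
                if f.is_file():
                    total += len(f.read_bytes())
            return time.time() - start, total / (1024 * 1024)

        before = timed_read(TARGET)
        # NTFS-compress the folder and its subdirectories in place
        subprocess.run(["compact", "/C", "/S:" + str(TARGET)], check=True)
        after = timed_read(TARGET)

        print("before: %.2fs for %.0f MB" % before)
        print("after : %.2fs for %.0f MB" % after)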

    Read the article

  • Performance question: Inverting an array of pointers in-place vs array of values

    - by Anders
    The background for asking this question is that I am solving a linearized equation system (Ax=b), where A is a matrix (typically of dimension less than 100x100) and x and b are vectors. I am using a direct method, meaning that I first invert A, then find the solution by x=A^(-1)b. This step is repeated in an iterative process until convergence. The way I'm doing it now, using a matrix library (MTL4): for every iteration I copy all the coefficients of A (values) into the matrix object, then invert. This is the easiest and safest option. Using an array of pointers instead: for my particular case, the coefficients of A happen to be updated between each iteration. These coefficients are stored in different variables (some are arrays, some are not). Would there be a potential performance gain if I set up A as an array containing pointers to these coefficient variables, then inverted A in place? The nice thing about the last option is that once I have set up the pointers in A before the first iteration, I would not need to copy any values between successive iterations. The values pointed to in A would automatically be updated between iterations. So the performance question boils down to this, as I see it:
    - The matrix inversion process takes roughly the same amount of time, assuming de-referencing of pointers is not expensive.
    - The array of pointers does not need the extra memory for a matrix A containing values.
    - The array-of-pointers option does not have to copy all NxN values of A between each iteration.
    - The values pointed to in the array-of-pointers option are generally NOT contiguous in memory. Hopefully, all values lie relatively close in memory, but *A[0][1] is generally not next to *A[0][0], etc.
    Any comments on this? Will the last point affect performance negatively, thus offsetting the positive performance effects above?
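    For a sense of scale: copying the NxN values is O(N^2) work while the inversion itself is O(N^3), so for N around 100 the copy is usually a rounding error. A quick timing sketch in Python/NumPy (standing in for MTL4 here purely to illustrate the relative costs):

        import time
        import numpy as np

        N = 100
        rng = np.random.default_rng(0)
        A = rng.random((N, N)) + N * np.eye(N)    # diagonally dominant, so safely invertible
        b = rng.random(N)

        iters = 1000
        t0 = time.perf_counter()
        for _ in range(iters):
            B = A.copy()                          # the per-iteration "copy all NxN values" step
        t_copy = time.perf_counter() - t0

        t0 = time.perf_counter()
        for _ in range(iters):
            x = np.linalg.inv(A) @ b              # the direct-method step: x = A^(-1) b
        t_inv = time.perf_counter() - t0

        print(f"copy per iteration:      {1e6 * t_copy / iters:.1f} microseconds")
        print(f"inversion per iteration: {1e6 * t_inv / iters:.1f} microseconds")
        # The O(N^2) copy is typically a small fraction of the O(N^3) inversion,
        # so avoiding the copy alone may not buy much.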

    Read the article

  • Performance impact: What is the optimal payload for SqlBulkCopy.WriteToServer()?

    - by Linchi Shea
    For many years, I have been using a C# program to generate the TPC-C compliant data for testing. The program relies on the SqlBulkCopy class to load the data generated by the program into the SQL Server tables. In general, the performance of this C# data loader is satisfactory. Lately however, I found myself in a situation where I needed to generate a much larger amount of data than I typically do and the data needed to be loaded within a confined time frame. So I was driven to look into the code...(read more)

    Read the article

  • Performance triage

    - by Dave
    Folks often ask me how to approach a suspected performance issue. My personal strategy is informed by the fact that I work on concurrency issues. (When you have a hammer everything looks like a nail, but I'll try to keep this general.) A good starting point is to ask yourself if the observed performance matches your expectations. Expectations might be derived from known system performance limits, prototypes, and other software or environments that are comparable to your particular system-under-test. Some simple comparisons and microbenchmarks can be useful at this stage. It's also useful to write some very simple programs to validate some of the reported or expected system limits. Can that disk controller really tolerate and sustain 500 reads per second? To reduce the number of confounding factors it's better to try to answer that question with a very simple targeted program. And finally, nothing beats having familiarity with the technologies that underlie your particular layer. On the topic of confounding factors: as our technology stacks become deeper and less transparent, we often find our own technology working against us in some unexpected way to choke performance rather than simply running into some fundamental system limit. A good example is the warm-up time needed by just-in-time compilers in Java Virtual Machines. I won't delve too far into that particular hole except to say that it's rare to find good benchmarks and methodology for Java code. Another example is power management on x86. Power management is great, but it can take a while for the CPUs to throttle up from low(er) frequencies to full throttle. And while I love "turbo" mode, it makes benchmarking applications with multiple threads a chore, as you have to remember to turn it off and then back on, otherwise short single-threaded runs may look abnormally fast compared to runs with higher thread counts. In general, for performance characterization I disable turbo mode and fix the power governor at the "performance" state. Another source of complexity is the scheduler, which I've discussed in prior blog entries. Let's say I have a running application and I want to better understand its behavior and performance. We'll presume it's warmed up, is under load, and is in an execution mode representative of what we think the norm would be. It should be in steady-state, if a steady-state mode even exists. On Solaris the very first thing I'll do is take a set of "pstack" samples. Pstack briefly stops the process and walks each of the stacks, reporting symbolic information (if available) for each frame. For Java, pstack has been augmented to understand Java frames, and even report inlining. A few pstack samples can provide powerful insight into what's actually going on inside the program. You'll be able to see calling patterns, which threads are blocked on what system calls or synchronization constructs, memory allocation, etc. If your code is CPU-bound then you'll get a good sense of where the cycles are being spent. (I should caution that normal C/C++ inlining can diffuse an otherwise "hot" method into other methods. This is a rare instance where pstack sampling might not immediately point to the key problem.) At this point you'll need to reconcile what you're seeing with pstack and your mental model of what you think the program should be doing. They're often rather different. And generally, if there's a key performance issue, you'll spot it with a moderate number of samples.
    I'll also use OS-level observability tools to look for bottlenecks where threads contend for locks, other situations where threads are blocked, and the distribution of threads over the system. On Solaris some good tools are mpstat and, to a lesser degree, vmstat. Try running "mpstat -a 5" in one window while the application program runs concurrently. One key measure is the voluntary context switch rate, "vctx" or "csw", which reflects threads descheduling themselves. It's also good to look at the user, system, and idle CPU percentages. This can give a broad but useful understanding of whether your threads are mostly parked or mostly running. For instance, if your program makes heavy use of malloc/free, then it might be the case that you're contending on the central malloc lock in the default allocator. In that case you'd see malloc calling lock in the stack traces, observe a high csw/vctx rate as threads block for the malloc lock, and your "usr" time would be less than expected. Solaris dtrace is a wonderful and invaluable performance tool as well, but in a sense you have to frame and articulate a meaningful and specific question to get a useful answer, so I tend not to use it for first-order screening of problems. It's also most effective for OS and software-level performance issues as opposed to HW-level issues. For that reason I recommend mpstat & pstack as the first step in performance triage. If some other OS-level issue is evident then it's good to switch to dtrace to drill more deeply into the problem. Only after I've ruled out OS-level issues do I switch to using hardware performance counters to look for architectural impediments.
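    The pstack-sampling step described above is easy to automate; a rough "poor man's profiler" sketch follows (it assumes a pstack binary is on the PATH, as on Solaris and many Linux distributions, the PID is supplied on the command line, and the frame filtering is deliberately crude):

        import collections, subprocess, sys, time

        def sample_pstack(pid, samples=10, interval=1.0):
            """Take repeated pstack samples of `pid` and tally the most common frame lines."""
            frame_counts = collections.Counter()
            for _ in range(samples):
                out = subprocess.run(["pstack", str(pid)],
                                     capture_output=True, text=True).stdout
                for line in out.splitlines():
                    line = line.strip()
                    # Skip obvious header/separator lines; count everything else as a frame
                    if line and not line.startswith(("----", "Thread", "pstack:")):
                        frame_counts[line] += 1
                time.sleep(interval)
            return frame_counts

        if __name__ == "__main__":
            pid = int(sys.argv[1])                # e.g. python pstack_sample.py 12345
            for frame, count in sample_pstack(pid).most_common(20):
                print(f"{count:4d}  {frame}")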

    Read the article
