Search Results

Search found 13341 results on 534 pages for 'obiee performance tuning'.

Page 22/534 | < Previous Page | 18 19 20 21 22 23 24 25 26 27 28 29 | Next Page >

Premature-Optimization and Performance Anxiety

- by James Michael Hare

While writing my post analyzing the new .NET 4 ConcurrentDictionary class (here), I fell into one of the classic blunders that I myself always love to warn about. After analyzing the differences of time between a Dictionary with locking versus the new ConcurrentDictionary class, I noted that the ConcurrentDictionary was faster with read-heavy multi-threaded operations. Then, I made the classic blunder of thinking that because the original Dictionary with locking was faster for those write-heavy uses, it was the best choice for those types of tasks. In short, I fell into the premature-optimization anti-pattern. Basically, the premature-optimization anti-pattern is when a developer is coding very early for a perceived (whether rightly-or-wrongly) performance gain and sacrificing good design and maintainability in the process. At best, the performance gains are usually negligible and at worst, can either negatively impact performance, or can degrade maintainability so much that time to market suffers or the code becomes very fragile due to the complexity. Keep in mind the distinction above. I'm not talking about valid performance decisions. There are decisions one should make when designing and writing an application that are valid performance decisions. Examples of this are knowing the best data structures for a given situation (Dictionary versus List, for example) and choosing performance algorithms (linear search vs. binary search). But these in my mind are macro optimizations. The error is not in deciding to use a better data structure or algorithm, the anti-pattern as stated above is when you attempt to over-optimize early on in such a way that it sacrifices maintainability. In my case, I was actually considering trading the safety and maintainability gains of the ConcurrentDictionary (no locking required) for a slight performance gain by using the Dictionary with locking. This would have been a mistake as I would be trading maintainability (ConcurrentDictionary requires no locking which helps readability) and safety (ConcurrentDictionary is safe for iteration even while being modified and you don't risk the developer locking incorrectly) -- and I fell for it even when I knew to watch out for it. I think in my case, and it may be true for others as well, a large part of it was due to the time I was trained as a developer. I began college in in the 90s when C and C++ was king and hardware speed and memory were still relatively priceless commodities and not to be squandered. In those days, using a long instead of a short could waste precious resources, and as such, we were taught to try to minimize space and favor performance. This is why in many cases such early code-bases were very hard to maintain. I don't know how many times I heard back then to avoid too many function calls because of the overhead -- and in fact just last year I heard a new hire in the company where I work declare that she didn't want to refactor a long method because of function call overhead. Now back then, that may have been a valid concern, but with today's modern hardware even if you're calling a trivial method in an extremely tight loop (which chances are the JIT compiler would optimize anyway) the results of removing method calls to speed up performance are negligible for the great majority of applications. Now, obviously, there are those coding applications where speed is absolutely king (for example drivers, computer games, operating systems) where such sacrifices may be made. But I would strongly advice against such optimization because of it's cost. Many folks that are performing an optimization think it's always a win-win. That they're simply adding speed to the application, what could possibly be wrong with that? What they don't realize is the cost of their choice. For every piece of straight-forward code that you obfuscate with performance enhancements, you risk the introduction of bugs in the long term technical debt of the application. It will become so fragile over time that maintenance will become a nightmare. I've seen such applications in places I have worked. There are times I've seen applications where the designer was so obsessed with performance that they even designed their own memory management system for their application to try to squeeze out every ounce of performance. Unfortunately, the application stability often suffers as a result and it is very difficult for anyone other than the original designer to maintain. I've even seen this recently where I heard a C++ developer bemoaning that in VS2010 the iterators are about twice as slow as they used to be because Microsoft added range checking (probably as part of the 0x standard implementation). To me this was almost a joke. Twice as slow sounds bad, but it almost never as bad as you think -- especially if you're gaining safety. The only time twice is really that much slower is when once was too slow to begin with. Think about it. 2 minutes is slow as a response time because 1 minute is slow. But if an iterator takes 1 microsecond to move one position and a new, safer iterator takes 2 microseconds, this is trivial! The only way you'd ever really notice this would be in iterating a collection just for the sake of iterating (i.e. no other operations). To my mind, the added safety makes the extra time worth it. Always favor safety and maintainability when you can. I know it can be a hard habit to break, especially if you started out your career early or in a language such as C where they are very performance conscious. But in reality, these type of micro-optimizations only end up hurting you in the long run. Remember the two laws of optimization. I'm not sure where I first heard these, but they are so true: For beginners: Do not optimize. For experts: Do not optimize yet. This is so true. If you're a beginner, resist the urge to optimize at all costs. And if you are an expert, delay that decision. As long as you have chosen the right data structures and algorithms for your task, your performance will probably be more than sufficient. Chances are it will be network, database, or disk hits that will be your slow-down, not your code. As they say, 98% of your code's bottleneck is in 2% of your code so premature-optimization may add maintenance and safety debt that won't have any measurable impact. Instead, code for maintainability and safety, and then, and only then, when you find a true bottleneck, then you should go back and optimize further.

Read the article
How do I objectively measure an application's load on a server

- by Joe

All, I'm not even sure where to begin looking for resources to answer my question, and I realize that speculation about this kind of thing is highly subjective. I need help determining what class of server I should purchase to host a MS Silverlight application with a MSSQL server back-end on a Windows Server 2008 platform. It's an interactive program, so I can't simply generate a list of URLs to test against, and run it with 1000 simultaneous users. What tools are out there to help me determine what kind of load the application will put on a server at varying levels of concurrent users? Would you all suggest separating the SQL server form the web server, to better differentiate the generated load on the different parts of the stack?

Read the article
Server configurations for hosting MySQL database

- by shyam

I have a web application which uses a MySQL database hosted on a virtual server. I've been using this server when I started the application and when the database was really small. Now it has grown and the server is not able to handle the db, causing frequent db errors. I'm planning to get a server and I need suggestions for that. Like I said, the db is now 9 GB, and is growing considerably fast. There are a number of tables with millions of rows, which are frequently updated and queried. The most frequent error the db shows is Lock wait timeout exceeded. Previously there used to be "The total number of locks exceeds the lock table size" errors too, but I could avoid it by increasing Innodb buffer pool size. Please suggest what configurations should I look for in the server I should buy. I read somewhere that the db should ideally have a buffer pool size greater than the size of its data, so in my case I guess I'd need memory gt 9 GB. What other things should I look for in the server? Just tell me if I should give you more info about the

Read the article
Find out which task is generating a lot of context switches on linux

- by Gaks

According to vmstat, my Linux server (2xCore2 Duo 2.5 GHz) is constantly doing around 20k context switches per second. # vmstat 3 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 2 0 7292 249472 82340 2291972 0 0 0 0 0 0 7 13 79 0 0 0 7292 251808 82344 2291968 0 0 0 184 24 20090 1 1 99 0 0 0 7292 251876 82344 2291968 0 0 0 83 17 20157 1 0 99 0 0 0 7292 251876 82344 2291968 0 0 0 73 12 20116 1 0 99 0 ... but uptime shows small load: load average: 0.01, 0.02, 0.01 and top doesn't show any process with high %CPU usage. How do I find out what exactly is generating those context switches? Which process/thread? I tried to analyze pidstat output: # pidstat -w 10 1 12:39:13 PID cswch/s nvcswch/s Command 12:39:23 1 0.20 0.00 init 12:39:23 4 0.20 0.00 ksoftirqd/0 12:39:23 7 1.60 0.00 events/0 12:39:23 8 1.50 0.00 events/1 12:39:23 89 0.50 0.00 kblockd/0 12:39:23 90 0.30 0.00 kblockd/1 12:39:23 995 0.40 0.00 kirqd 12:39:23 997 0.60 0.00 kjournald 12:39:23 1146 0.20 0.00 svscan 12:39:23 2162 5.00 0.00 kjournald 12:39:23 2526 0.20 2.00 postgres 12:39:23 2530 1.00 0.30 postgres 12:39:23 2534 5.00 3.20 postgres 12:39:23 2536 1.40 1.70 postgres 12:39:23 12061 10.59 0.90 postgres 12:39:23 14442 1.50 2.20 postgres 12:39:23 15416 0.20 0.00 monitor 12:39:23 17289 0.10 0.00 syslogd 12:39:23 21776 0.40 0.30 postgres 12:39:23 23638 0.10 0.00 screen 12:39:23 25153 1.00 0.00 sshd 12:39:23 25185 86.61 0.00 daemon1 12:39:23 25190 12.19 35.86 postgres 12:39:23 25295 2.00 0.00 screen 12:39:23 25743 9.99 0.00 daemon2 12:39:23 25747 1.10 3.00 postgres 12:39:23 26968 5.09 0.80 postgres 12:39:23 26969 5.00 0.00 postgres 12:39:23 26970 1.10 0.20 postgres 12:39:23 26971 17.98 1.80 postgres 12:39:23 27607 0.90 0.40 postgres 12:39:23 29338 4.30 0.00 screen 12:39:23 31247 4.10 23.58 postgres 12:39:23 31249 82.92 34.77 postgres 12:39:23 31484 0.20 0.00 pdflush 12:39:23 32097 0.10 0.00 pidstat Looks like some postgresql tasks are doing 10 context swiches per second, but it doesn't all sum up to 20k anyway. Any idea how to dig a little deeper for an answer?

Read the article
Changing memory allocator to Jemalloc Centos 6

- by Brian Lovett

After reading this blog post about the impact of memory allocators like jemalloc on highly threaded applications, I wanted to test things on a larger scale on some of our cluster of servers. We run sphinx, and apache using threads, and on 24 core machines. Installing jemalloc was simple enough. We are running Centos 6, so yum install jemalloc jemalloc-devel did the trick. My question is, how do we change everything on the system over to using jemalloc instead of the default malloc built into Centos. Research pointed me at this as a potential option: LD_PRELOAD=$LD_PRELOAD:/usr/lib64/libjemalloc.so.1 Would this be sufficient to get everything using jemalloc?

Read the article
How to best tune a Linux development machine?

- by cletus

How to best tune a Linux PC for development purposes?

Read the article
Why MySQL sat for 2 minutes doing nothing?

- by Alex R

This was a one-time thing, not reproducible... But I saved the show innodb status output. Can anybody tell what's going on here? The simple insert took almost 3 minutes to complete. | InnoDB | | ===================================== 110201 15:58:10 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 34 seconds ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 11963, signal count 11766 --Thread 1824 has waited at .\btr\btr0cur.c line 443 for 118.00 seconds the sema phore: S-lock on RW-latch at 09D6453C created in file .\buf\buf0buf.c line 550 a writer (thread id 1824) has reserved it in mode wait exclusive number of readers 1, waiters flag 1 Last time read locked in file .\buf\buf0flu.c line 599 Last time write locked in file .\btr\btr0cur.c line 443 Mutex spin waits 0, rounds 527817, OS waits 7133 RW-shared spins 2532, OS waits 1226; RW-excl spins 1652, OS waits 1118 ------------ TRANSACTIONS ------------ Trx id counter 0 95830 Purge done for trx's n:o < 0 95814 undo n:o < 0 0 History list length 11 LIST OF TRANSACTIONS FOR EACH SESSION: ---TRANSACTION 0 0, not started, OS thread id 3704 MySQL thread id 551, query id 2702112 localhost 127.0.0.1 root show innodb status ---TRANSACTION 0 95829, not started, OS thread id 3132 MySQL thread id 534, query id 2702020 localhost 127.0.0.1 root ---TRANSACTION 0 95828, not started, OS thread id 3152 MySQL thread id 527, query id 2701973 localhost 127.0.0.1 root ---TRANSACTION 0 95827, ACTIVE 118 sec, OS thread id 1824 inserting, thread decl ared inside InnoDB 500 mysql tables in use 1, locked 1 1 lock struct(s), heap size 320, 0 row lock(s) MySQL thread id 526, query id 2701972 localhost 127.0.0.1 root update INSERT INTO log_searchcriteria (userid,search_criteria,date,search_type) VALUES ( NAME_CONST('userid',NULL), NAME_CONST('search_criteria',_latin1' SELECT SQL_C ALC_FOUND_ROWS idx_search.CTCX_LATITUDE, idx_search.CTCX_LONGITUDE, idx_search.b uilding_id, idx_search.LN_LIST_NUMBER, idx_search.LP_LIST_PRICE, idx_search.HSN_ ADRESS_HOUSE_NUMBER, idx_search.STR_ADDRESS_STREET, idx_search.CP_ADDRESS_COMPAS S_POINT, idx_search.UN_UNIT, idx_search.CIT_CITY, idx_search.ZP_ZIP_CODE, idx_se arch.AR_AREA_NAME, idx_search.BR_BEDROOMS, idx_search.BTH_BATHS, idx_search.ST_S TATUS, idx_search.CTCX_STYLE_TYPE, idx_s -------- FILE I/O -------- I/O thread 0 state: wait Windows aio (insert buffer thread) I/O thread 1 state: wait Windows aio (log thread) I/O thread 2 state: wait Windows aio (read thread) I/O thread 3 state: wait Windows aio (write thread) Pending normal aio reads: 0, aio writes: 1, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 151006 OS file reads, 120758 OS file writes, 6844 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf: size 1, free list len 5, seg size 7, 24664 inserts, 24664 merged recs, 4612 merges Hash table size 553253, node heap has 629 buffer(s) 0.00 hash searches/s, 0.00 non-hash searches/s --- LOG --- Log sequence number 5 2318193115 Log flushed up to 5 2318193115 Last checkpoint at 5 2318129891 0 pending log writes, 0 pending chkp writes 3036 log i/o's done, 0.00 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ---------------------- Total memory allocated 213459462; in additional pool allocated 1720192 Dictionary memory allocated 240416 Buffer pool size 8192 Free buffers 0 Database pages 7563 Modified db pages 18 Pending reads 0 Pending writes: LRU 0, flush list 18, single page 0 Pages read 150973, created 28788, written 115137 0.00 reads/s, 0.00 creates/s, 0.00 writes/s No buffer pool page gets since the last printout -------------- ROW OPERATIONS -------------- 1 queries inside InnoDB, 0 queries in queue 1 read views open inside InnoDB Main thread id 2992, state: flushing buffer pool pages Number of rows inserted 794294, updated 89203, deleted 13698, read 1453084305 0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s ---------------------------- END OF INNODB MONITOR OUTPUT ============================ Thanks

Read the article
How to check for bottlenecks in Windows Server 2008 R2

- by Phil Koury

I recently switched out a 10 year old server for a brand new server in a small office and upgraded from Windows Server 2000 to Windows Server 2008 R2. After the switch was complete and some configurations were changed around we are running into what appear to be some bottlenecks in the network speed. Accessing programs on the server is slower (resulting in long loading times, slower report generation, etc.) than it was on the old server hardware. I am wondering what options or tools I have, if there are any at all, to find out exactly where these hang ups might be coming from.

Read the article
Benchmarking Java programs

- by stefan-ock

For university, I perform bytecode modifications and analyze their influence on performance of Java programs. Therefore, I need Java programs---in best case used in production---and appropriate benchmarks. For instance, I already got HyperSQL and measure its performance by the benchmark program PolePosition. The Java programs running on a JVM without JIT compiler. Thanks for your help! P.S.: I cannot use programs to benchmark the performance of the JVM or of the Java language itself (such as Wide Finder).

Read the article
How to know if my nginx is in good health?

- by Howard

I am running a nginx on EC2 (m1.small) for SSL termination. I am using 2 workers on Ubuntu, with latest nginx (stable), the network throughput is around 2Mbps and system load average is around 2 to 3. I am wondering if this system is in good health for now, e.g. what is the queue length (I know nginx can handle a lot of concurrent request, but I mean before the request is being served, how many of them need to wait before being served) what is the average queue time for a given request to be served. I want to know because if my nginx is cpu bounded (e.g. due to SSL), I will need to upgrade to a faster instance. My current nginx status Active connections: 4076 server accepts handled requests 90664283 90664283 104117012 Reading: 525 Writing: 81 Waiting: 3470

Read the article
What perfmon counters should I track for improvements to the query plan cache?

- by Bill Paetzke

I am parameterizing my web app's ad hoc sql. As a result, I expect the query plan cache to reduce in size and have a higher hit ratio. What should I track in perfmon regarding this? How could I report on this change? (I'd also accept an answer that queries the DMV tables or whatever else to report on the change).

Read the article
Can compressing Program Files save space *and* give a significant boost to SSD performance?

- by Christopher Galpin

Considering solid-state disk space is still an expensive resource, compressing large folders has appeal. Thanks to VirtualStore, could Program Files be a case where it might even improve performance? Discovery In particular I have been reading: SSD and NTFS Compression Speed Increase? Does NTFS compression slow SSD/flash performance? Will somebody benchmark whole disk compression (HD,SSD) please? (may have to scroll up) The first link is particularly dreamy, but maybe head a little too far in the clouds. The third link has this sexy semi-log graph (logarithmic scale!). Quote (with notes): Using highly compressable data (IOmeter), you get at most a 30x performance increase [for reads], and at least a 49x performance DECREASE [for writes]. Assuming I interpreted and clarified that sentence correctly, this single user's benchmark has me incredibly interested. Although write performance tanks wretchedly, read performance still soars. It gave me an idea. Idea: VirtualStore It so happens that thanks to sanity saving security features introduced in Windows Vista, write access to certain folders such as Program Files is virtualized for non-administrator processes. Which means, in normal (non-elevated) usage, a program or game's attempt to write data to its install location in Program Files (which is perhaps a poor location) is redirected to %UserProfile%\AppData\Local\VirtualStore, somewhere entirely different. Thus, to my understanding, writes to Program Files should primarily only occur when installing an application. This makes compressing it not only a huge source of space gain, but also a potential candidate for performance gain. Testing The beginning of this post has me a bit timid, it suggests benchmarking NTFS compression on a whole drive is difficult because turning it off "doesn't decompress the objects". However it seems to me the compact command is perfectly capable of doing so for both drives and individual folders. Could it be only marking them for decompression the next time the OS reads from them? I need to find the answer before I begin my own testing.

Read the article
SSIS Lookup component tuning tips

- by jamiet

Yesterday evening I attended a London meeting of the UK SQL Server User Group at Microsoft’s offices in London Victoria. As usual it was both a fun and informative evening and in particular there seemed to be a few questions arising about tuning the SSIS Lookup component; I rattled off some comments and figured it would be prudent to drop some of them into a dedicated blog post, hence the one you are reading right now. Scene setting A popular pattern in SSIS is to use a Lookup component to determine whether a record in the pipeline already exists in the intended destination table or not and I cover this pattern in my 2006 blog post Checking if a row exists and if it does, has it changed? (note to self: must rewrite that blog post for SSIS2008). Fundamentally the SSIS lookup component (when using FullCache option) sucks some data out of a database and holds it in memory so that it can be compared to data in the pipeline. One of the big benefits of using SSIS dataflows is that they process data one buffer at a time; that means that not all of the data from your source exists in the dataflow at the same time and is why a SSIS dataflow can process data volumes that far exceed the available memory. However, that only applies to data in the pipeline; for reasons that are hopefully obvious ALL of the data in the lookup set must exist in the memory cache for the duration of the dataflow’s execution which means that any memory used by the lookup cache will not be available to be used as a pipeline buffer. Moreover, there’s an obvious correlation between the amount of data in the lookup cache and the time it takes to charge that cache; the more data you have then the longer it will take to charge and the longer you have to wait until the dataflow actually starts to do anything. For these reasons your goal is simple: ensure that the lookup cache contains as little data as possible. General tips Here is a simple tick list you can follow in order to tune your lookups: Use a SQL statement to charge your cache, don’t just pick a table from the dropdown list made available to you. (Read why in SELECT *... or select from a dropdown in an OLE DB Source component?) Only pick the columns that you need, ignore everything else Make the database columns that your cache is populated from as narrow as possible. If a column is defined as VARCHAR(20) then SSIS will allocate 20 bytes for every value in that column – that is a big waste if the actual values are significantly less than 20 characters in length. Do you need DT_WSTR typed columns or will DT_STR suffice? DT_WSTR uses twice the amount of space to hold values that can be stored using a DT_STR so if you can use DT_STR, consider doing so. Same principle goes for the numerical datatypes DT_I2/DT_I4/DT_I8. Only populate the cache with data that you KNOW you will need. In other words, think about your WHERE clause! Thinking outside the box It is tempting to build a large monolithic dataflow that does many things, one of which is a Lookup. Often though you can make better use of your available resources by, well, mixing things up a little and here are a few ideas to get your creative juices flowing: There is no rule that says everything has to happen in a single dataflow. If you have some particularly resource intensive lookups then consider putting that lookup into a dataflow all of its own and using raw files to pass the pipeline data in and out of that dataflow. Know your data. If you think, for example, that the majority of your incoming rows will match with only a small subset of your lookup data then consider chaining multiple lookup components together; the first would use a FullCache containing that data subset and the remaining data that doesn’t find a match could be passed to a second lookup that perhaps uses a NoCache lookup thus negating the need to pull all of that least-used lookup data into memory. Do you need to process all of your incoming data all at once? If you can process different partitions of your data separately then you can partition your lookup cache as well. For example, if you are using a lookup to convert a location into a [LocationId] then why not process your data one region at a time? This will mean your lookup cache only has to contain data for the location that you are currently processing and with the ability of the Lookup in SSIS2008 and beyond to charge the cache using a dynamically built SQL statement you’ll be able to achieve it using the same dataflow and simply loop over it using a ForEach loop. Taking the previous data partitioning idea further … a dataflow can contain more than one data path so why not split your data using a conditional split component and, again, charge your lookup caches with only the data that they need for that partition. Lookups have two uses: to (1) find a matching row from the lookup set and (2) put attributes from that matching row into the pipeline. Ask yourself, do you need to do these two things at the same time? After all once you have the key column(s) from your lookup set then you can use that key to get the rest of attributes further downstream, perhaps even in another dataflow. Are you using the same lookup data set multiple times? If so, consider the file caching option in SSIS 2008 and beyond. Above all, experiment and be creative with different combinations. You may be surprised at what works. Final thoughts If you want to know more about how the Lookup component differs in SSIS2008 from SSIS2005 then I have a dedicated blog post about that at Lookup component gets a makeover. I am on a mini-crusade at the moment to get a BULK MERGE feature into the database engine, the thinking being that if the database engine can quickly merge massive amounts of data in a similar manner to how it can insert massive amounts using BULK INSERT then that’s a lot of work that wouldn’t have to be done in the SSIS pipeline. If you think that is a good idea then go and vote for BULK MERGE on Connect. If you have any other tips to share then please stick them in the comments. Hope this helps! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Performance question: Inverting an array of pointers in-place vs array of values

- by Anders

The background for asking this question is that I am solving a linearized equation system (Ax=b), where A is a matrix (typically of dimension less than 100x100) and x and b are vectors. I am using a direct method, meaning that I first invert A, then find the solution by x=A^(-1)b. This step is repated in an iterative process until convergence. The way I'm doing it now, using a matrix library (MTL4): For every iteration I copy all coeffiecients of A (values) in to the matrix object, then invert. This the easiest and safest option. Using an array of pointers instead: For my particular case, the coefficients of A happen to be updated between each iteration. These coefficients are stored in different variables (some are arrays, some are not). Would there be a potential for performance gain if I set up A as an array containing pointers to these coefficient variables, then inverting A in-place? The nice thing about the last option is that once I have set up the pointers in A before the first iteration, I would not need to copy any values between successive iterations. The values which are pointed to in A would automatically be updated between iterations. So the performance question boils down to this, as I see it: - The matrix inversion process takes roughly the same amount of time, assuming de-referencing of pointers is non-expensive. - The array of pointers does not need the extra memory for matrix A containing values. - The array of pointers option does not have to copy all NxN values of A between each iteration. - The values that are pointed to the array of pointers option are generally NOT ordered in memory. Hopefully, all values lie relatively close in memory, but *A[0][1] is generally not next to *A[0][0] etc. Any comments to this? Will the last remark affect performance negatively, thus weighing up for the positive performance effects?

Read the article
Performance impact: What is the optimal payload for SqlBulkCopy.WriteToServer()?

- by Linchi Shea

For many years, I have been using a C# program to generate the TPC-C compliant data for testing. The program relies on the SqlBulkCopy class to load the data generated by the program into the SQL Server tables. In general, the performance of this C# data loader is satisfactory. Lately however, I found myself in a situation where I needed to generate a much larger amount of data than I typically do and the data needed to be loaded within a confined time frame. So I was driven to look into the code...(read more)

Read the article
Performance triage

- by Dave

Folks often ask me how to approach a suspected performance issue. My personal strategy is informed by the fact that I work on concurrency issues. (When you have a hammer everything looks like a nail, but I'll try to keep this general). A good starting point is to ask yourself if the observed performance matches your expectations. Expectations might be derived from known system performance limits, prototypes, and other software or environments that are comparable to your particular system-under-test. Some simple comparisons and microbenchmarks can be useful at this stage. It's also useful to write some very simple programs to validate some of the reported or expected system limits. Can that disk controller really tolerate and sustain 500 reads per second? To reduce the number of confounding factors it's better to try to answer that question with a very simple targeted program. And finally, nothing beats having familiarity with the technologies that underlying your particular layer. On the topic of confounding factors, as our technology stacks become deeper and less transparent, we often find our own technology working against us in some unexpected way to choke performance rather than simply running into some fundamental system limit. A good example is the warm-up time needed by just-in-time compilers in Java Virtual Machines. I won't delve too far into that particular hole except to say that it's rare to find good benchmarks and methodology for java code. Another example is power management on x86. Power management is great, but it can take a while for the CPUs to throttle up from low(er) frequencies to full throttle. And while I love "turbo" mode, it makes benchmarking applications with multiple threads a chore as you have to remember to turn it off and then back on otherwise short single-threaded runs may look abnormally fast compared to runs with higher thread counts. In general for performance characterization I disable turbo mode and fix the power governor at "performance" state. Another source of complexity is the scheduler, which I've discussed in prior blog entries. Lets say I have a running application and I want to better understand its behavior and performance. We'll presume it's warmed up, is under load, and is an execution mode representative of what we think the norm would be. It should be in steady-state, if a steady-state mode even exists. On Solaris the very first thing I'll do is take a set of "pstack" samples. Pstack briefly stops the process and walks each of the stacks, reporting symbolic information (if available) for each frame. For Java, pstack has been augmented to understand java frames, and even report inlining. A few pstack samples can provide powerful insight into what's actually going on inside the program. You'll be able to see calling patterns, which threads are blocked on what system calls or synchronization constructs, memory allocation, etc. If your code is CPU-bound then you'll get a good sense where the cycles are being spent. (I should caution that normal C/C++ inlining can diffuse an otherwise "hot" method into other methods. This is a rare instance where pstack sampling might not immediately point to the key problem). At this point you'll need to reconcile what you're seeing with pstack and your mental model of what you think the program should be doing. They're often rather different. And generally if there's a key performance issue, you'll spot it with a moderate number of samples. I'll also use OS-level observability tools to lock for the existence of bottlenecks where threads contend for locks; other situations where threads are blocked; and the distribution of threads over the system. On Solaris some good tools are mpstat and too a lesser degree, vmstat. Try running "mpstat -a 5" in one window while the application program runs concurrently. One key measure is the voluntary context switch rate "vctx" or "csw" which reflects threads descheduling themselves. It's also good to look at the user; system; and idle CPU percentages. This can give a broad but useful understanding if your threads are mostly parked or mostly running. For instance if your program makes heavy use of malloc/free, then it might be the case you're contending on the central malloc lock in the default allocator. In that case you'd see malloc calling lock in the stack traces, observe a high csw/vctx rate as threads block for the malloc lock, and your "usr" time would be less than expected. Solaris dtrace is a wonderful and invaluable performance tool as well, but in a sense you have to frame and articulate a meaningful and specific question to get a useful answer, so I tend not to use it for first-order screening of problems. It's also most effective for OS and software-level performance issues as opposed to HW-level issues. For that reason I recommend mpstat & pstack as my the 1st step in performance triage. If some other OS-level issue is evident then it's good to switch to dtrace to drill more deeply into the problem. Only after I've ruled out OS-level issues do I switch to using hardware performance counters to look for architectural impediments.

Read the article
F# performance in scientific computing

- by aaa

hello. I am curious as to how F# performance compares to C++ performance? I asked a similar question with regards to Java, and the impression I got was that Java is not suitable for heavy numbercrunching. I have read that F# is supposed to be more scalable and more performant, but how is this real-world performance compares to C++? specific questions about current implementation are: How well does it do floating-point? Does it allow vector instructions how friendly is it towards optimizing compilers? How big a memory foot print does it have? Does it allow fine-grained control over memory locality? does it have capacity for distributed memory processors, for example Cray? what features does it have that may be of interest to computational science where heavy number processing is involved? Are there actual scientific computing implementations that use it? Thanks

Read the article
VS2010 + Resharper 5 performance issues

- by Jeremy Roberts

I have been using VS2010 with Resharper 5 for several weeks and am having a performance issue. Sometimes when typing, the cursor will lag and the keystrokes won't show instantaneously. Also, scrolling will lag at times. There is a forum thread started and JetBrains has been responding. Several people (including myself) have added their voice and uploaded some performance profiles. If anyone here has has this issue, I would encourage you to visit the thread and let JetBrains know about it. Has anyone had this problem and have a suggestion to restore performance?

Read the article
Measuring Web Page Performance on Client vs. Server

- by Yaakov Ellis

I am working with a web page (ASP.net 3.5) that is very complicated and in certain circumstances has major performance issues. It uses Ajax (through the Telerik AjaxManager) for most of its functionality. I would like to be able to measure in some way the amounts of time for the following, for each request: On client submitting request to server Client-to-Server On server initializing request On server processing request Server-to-Client Client rendering, JavaScript processing I have monitored the database traffic and cannot find any obvious culprit. On the other hand, I have a suspicion that some of the Ajax interactions are causing performance issues. However, until I have a way to track the times involved, make a baseline measurement, and measure performance as I tweak, it will be hard to work on the issue. So what is the best way to measure all of these? Is there one tool that can do it? Combination of FireBug and logging inserted into different places in the page life-cycle?

Read the article
Looking for SQL Server Performance Monitor Tools

- by the-locster

I may be approaching this problem from the wrong angle but what I'm thinking of is some kind of performance monitor tool for SQl server that works in a similar way to code performance tools, e.g. I;d like to see an output of how many times each stored procedure was called, average executuion time and possibly various resource usage stats such as cache/index utilisation, resultign disk access and table scans, etc. As far as I can tell the performance monitor that comes with SQL Server just logs the various calls but doesn't report he variosu stats I'm looking for. Potentially I just need a tool to analyze the log output?

Read the article
Performance Counters Registry validation

- by anchandra

I have a C# application that adds some performance counters when it starts up. But if the registry HKEY_LOCAL_MACHINE-SOFTWARE-Microsoft-Windows NT-CurrentVersion-Perflib is corrupted (missing or invalid data), the operation of checking the existence of the performance counters (PerformanceCounterCategory.Exists(category) takes a really long time (around 30 secs) before finally throwing exception (InvalidOperation: Category does not exist). My question is how can i verify the validity of the registry before trying to add the performance counters (and what validity means) or if there is a way i can timeout the perf counter operations, so that it doesn't take 30 seconds to get an exception.

Read the article
Entity Framework associations killing performance

- by Chris

Here is the performance test i am looking at. I have 8 different entities that are table per type. Some of the entities contain over 100 thousand rows. This particular application does several recursive calculations on the client so I think it may be best to preload the data instead of lazy loading. If there are no associations I can load the entire database in about 3 seconds. As I add associations in any way the performance starts to drastically decline. I am loading all the data the same way (just calling toList() on the entity attached to the context). I ran the test with edmx generated classes and self tracking entities and had similar results. I am sure if I were to try and deal with the associations myself, similar to how I would in a dataset, the performance problem would go away. On the other hand I am pretty sure this is not how the entity framework was intended to being used. Any thoughts or ideas?

Read the article
azure performance

- by Dave K

I've moved my app from a dedicated server to azure (and sql azure), and have noticed substantial performance degradation. obviously not having the database and web server on the same piece of hardware is much of it, but I'm curious what other people have found in migrating to azure, and if there is anything any of you would suggest I do to improve it. Right now I'm considering moving back to my dedicated server... So in summary, are there any rules of thumb for this, existing research (wasn't able to find much) or other pieces of advice on improving the performance of the app? has anyone else found the same to be true, and improved their site's performance in some way? it's built in C# on asp.net mvc 2. Thanks!

Read the article
Is code clearness killing application performance?

- by Jorge Córdoba

As today's code is getting more complex by the minute, code needs to be designed to be maintainable - meaning easy to read, and easy to understand. That being said, I can't help but remember the programs that ran a couple of years ago such as Winamp or some games in which you needed a high performance program because your 486 100 Mhz wouldn't play mp3s with that beautiful mp3 player which consumed all of your CPU cycles. Now I run Media Player (or whatever), start playing an mp3 and it eats up a 25-30% of one of my four cores. Come on!! If a 486 can do it, how can the playback take up so much processor to do the same? I'm a developer myself, and I always used to advise: keep your code simple, don't prematurely optimize for performance. It seems that we've gone from "trying to get it to use the least amount of CPU as possible" to "if it doesn't take too much CPU is all right". So, do you think we are killing performance by ignoring optimizations?

Read the article
What are some good code optimization methods?

- by esac

I would like to understand good code optimization methods and methodology. How do I keep from doing premature optimization if I am thinking about performance already. How do I find the bottlenecks in my code? How do I make sure that over time my program does not become any slower? What are some common performance errors to avoid (e.g.; I know it is bad in some languages to return while inside the catch portion of a try{} catch{} block

Read the article

Search Results

Search found 13341 results on 534 pages for 'obiee performance tuning'.

Page 22/534 | < Previous Page | 18 19 20 21 22 23 24 25 26 27 28 29 | Next Page >

- by James Michael Hare

- by Joe

- by shyam

- by Gaks

- by Brian Lovett

- by cletus

- by Alex R

- by Phil Koury

- by stefan-ock

- by Howard

- by Bill Paetzke

- by Christopher Galpin

- by jamiet

- by Anders

- by Linchi Shea

- by Dave

- by aaa

- by Jeremy Roberts

- by Yaakov Ellis

- by the-locster

- by anchandra

- by Chris

- by Dave K

- by Jorge Córdoba

- by esac

< Previous Page | 18 19 20 21 22 23 24 25 26 27 28 29 | Next Page >