Search Results

Search found 20275 results on 811 pages for 'general performance'.

Page 2/811 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Performance triage

- by Dave

Folks often ask me how to approach a suspected performance issue. My personal strategy is informed by the fact that I work on concurrency issues. (When you have a hammer everything looks like a nail, but I'll try to keep this general). A good starting point is to ask yourself if the observed performance matches your expectations. Expectations might be derived from known system performance limits, prototypes, and other software or environments that are comparable to your particular system-under-test. Some simple comparisons and microbenchmarks can be useful at this stage. It's also useful to write some very simple programs to validate some of the reported or expected system limits. Can that disk controller really tolerate and sustain 500 reads per second? To reduce the number of confounding factors it's better to try to answer that question with a very simple targeted program. And finally, nothing beats having familiarity with the technologies that underlying your particular layer. On the topic of confounding factors, as our technology stacks become deeper and less transparent, we often find our own technology working against us in some unexpected way to choke performance rather than simply running into some fundamental system limit. A good example is the warm-up time needed by just-in-time compilers in Java Virtual Machines. I won't delve too far into that particular hole except to say that it's rare to find good benchmarks and methodology for java code. Another example is power management on x86. Power management is great, but it can take a while for the CPUs to throttle up from low(er) frequencies to full throttle. And while I love "turbo" mode, it makes benchmarking applications with multiple threads a chore as you have to remember to turn it off and then back on otherwise short single-threaded runs may look abnormally fast compared to runs with higher thread counts. In general for performance characterization I disable turbo mode and fix the power governor at "performance" state. Another source of complexity is the scheduler, which I've discussed in prior blog entries. Lets say I have a running application and I want to better understand its behavior and performance. We'll presume it's warmed up, is under load, and is an execution mode representative of what we think the norm would be. It should be in steady-state, if a steady-state mode even exists. On Solaris the very first thing I'll do is take a set of "pstack" samples. Pstack briefly stops the process and walks each of the stacks, reporting symbolic information (if available) for each frame. For Java, pstack has been augmented to understand java frames, and even report inlining. A few pstack samples can provide powerful insight into what's actually going on inside the program. You'll be able to see calling patterns, which threads are blocked on what system calls or synchronization constructs, memory allocation, etc. If your code is CPU-bound then you'll get a good sense where the cycles are being spent. (I should caution that normal C/C++ inlining can diffuse an otherwise "hot" method into other methods. This is a rare instance where pstack sampling might not immediately point to the key problem). At this point you'll need to reconcile what you're seeing with pstack and your mental model of what you think the program should be doing. They're often rather different. And generally if there's a key performance issue, you'll spot it with a moderate number of samples. I'll also use OS-level observability tools to lock for the existence of bottlenecks where threads contend for locks; other situations where threads are blocked; and the distribution of threads over the system. On Solaris some good tools are mpstat and too a lesser degree, vmstat. Try running "mpstat -a 5" in one window while the application program runs concurrently. One key measure is the voluntary context switch rate "vctx" or "csw" which reflects threads descheduling themselves. It's also good to look at the user; system; and idle CPU percentages. This can give a broad but useful understanding if your threads are mostly parked or mostly running. For instance if your program makes heavy use of malloc/free, then it might be the case you're contending on the central malloc lock in the default allocator. In that case you'd see malloc calling lock in the stack traces, observe a high csw/vctx rate as threads block for the malloc lock, and your "usr" time would be less than expected. Solaris dtrace is a wonderful and invaluable performance tool as well, but in a sense you have to frame and articulate a meaningful and specific question to get a useful answer, so I tend not to use it for first-order screening of problems. It's also most effective for OS and software-level performance issues as opposed to HW-level issues. For that reason I recommend mpstat & pstack as my the 1st step in performance triage. If some other OS-level issue is evident then it's good to switch to dtrace to drill more deeply into the problem. Only after I've ruled out OS-level issues do I switch to using hardware performance counters to look for architectural impediments.

Read the article
Performance analytics via DBMS "plugins", or other solution

- by Polynomial

I'm working on a systems monitoring product that currently focuses on performance at the system level. We're expanding out to monitoring database systems. Right now we can fetch simple performance information from a selection of DBMS, like connection count, disk IO rates, lock wait times, etc. However, we'd really like a way to measure the execution time of every query going into a DBMS, without requiring the client to implement monitoring in their application code. Some potential solutions might be: Some sort of proxy that sits between client and server. SSL might be an issue here, plus it requires us to reverse engineer and implement the network protocol for each DBMS. Plugin for each DBMS system that automatically records performance information when a query comes in. Other problems include "anonymising" the SQL, i.e. taking something like SELECT * FROM products WHERE price > 20 AND name LIKE "%disk%" and producing SELECT * FROM products WHERE price > ? AND name LIKE "%?%", though this shouldn't be too difficult with some clever parsing and regex. We're mainly focusing on: MySQL MSSQL Oracle Redis mongodb memcached Are there any plugin-style mechanisms we can utilise for any of these? Or is there a simpler solution?

Read the article
FreeBSD performance tuning. Sysctls, loader.conf, kernel

- by SaveTheRbtz

I wanted to share knowledge of tuning FreeBSD via sysctl.conf/loader.conf/KENCONF. It was initially based on Igor Sysoev's (author of nginx) presentation about FreeBSD tuning up to 100,000-200,000 active connections. Tunings are for FreeBSD-CURRENT. Since 7.2 amd64 some of them are tuned well by default. Prior 7.0 some of them are boot only (set via /boot/loader.conf) or does not exist at all. sysctl.conf: # No zero mapping feature # May break wine # (There are also reports about broken samba3) #security.bsd.map_at_zero=0 # If you have really busy webserver with apache13 you may run out of processes #kern.maxproc=10000 # Same for servers with apache2 / Pound #kern.threads.max_threads_per_proc=4096 # Max. backlog size kern.ipc.somaxconn=4096 # Shared memory // 7.2+ can use shared memory > 2Gb kern.ipc.shmmax=2147483648 # Sockets kern.ipc.maxsockets=204800 # Can cause this on older kernels: # http://old.nabble.com/Significant-performance-regression-for-increased-maxsockbuf-on-8.0-RELEASE-tt26745981.html#a26745981 ) kern.ipc.maxsockbuf=10485760 # Mbuf 2k clusters (on amd64 7.2+ 25600 is default) # For such high value vm.kmem_size must be increased to 3G kern.ipc.nmbclusters=262144 # Jumbo pagesize(_SC_PAGESIZE) clusters # Used as general packet storage for jumbo frames # can be monitored via `netstat -m` #kern.ipc.nmbjumbop=262144 # Jumbo 9k/16k clusters # If you are using them #kern.ipc.nmbjumbo9=65536 #kern.ipc.nmbjumbo16=32768 # For lower latency you can decrease scheduler's maximum time slice # default: stathz/10 (~ 13) #kern.sched.slice=1 # Increase max command-line length showed in `ps` (e.g for Tomcat/Java) # Default is PAGE_SIZE / 16 or 256 on x86 # This avoids commands to be presented as [executable] in `ps` # For more info see: http://www.freebsd.org/cgi/query-pr.cgi?pr=120749 kern.ps_arg_cache_limit=4096 # Every socket is a file, so increase them kern.maxfiles=204800 kern.maxfilesperproc=200000 kern.maxvnodes=200000 # On some systems HPET is almost 2 times faster than default ACPI-fast # Useful on systems with lots of clock_gettime / gettimeofday calls # See http://old.nabble.com/ACPI-fast-default-timecounter,-but-HPET-83--faster-td23248172.html # After revision 222222 HPET became default: http://svnweb.freebsd.org/base?view=revision&revision=222222 kern.timecounter.hardware=HPET # Small receive space, only usable on http-server, on file server this # should be increased to 65535 or even more #net.inet.tcp.recvspace=8192 # This is useful on Fat-Long-Pipes #net.inet.tcp.recvbuf_max=10485760 #net.inet.tcp.recvbuf_inc=65535 # Small send space is useful for http servers that serve small files # Autotuned since 7.x net.inet.tcp.sendspace=16384 # This is useful on Fat-Long-Pipes #net.inet.tcp.sendbuf_max=10485760 #net.inet.tcp.sendbuf_inc=65535 # Turn off receive autotuning # You can play with it. #net.inet.tcp.recvbuf_auto=0 #net.inet.tcp.sendbuf_auto=0 # This should be enabled if you going to use big spaces (>64k) # Also timestamp field is useful when using syncookies net.inet.tcp.rfc1323=1 # Turn this off on high-speed, lossless connections (LAN 1Gbit+) # If you set it there is no need in TCP_NODELAY sockopt (see man tcp) net.inet.tcp.delayed_ack=0 # This feature is useful if you are serving data over modems, Gigabit Ethernet, # or even high speed WAN links (or any other link with a high bandwidth delay product), # especially if you are also using window scaling or have configured a large send window. # Automatically disables on small RTT ( http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_subr.c?#rev1.237 ) # This sysctl was removed in 10-CURRENT: # See: http://www.mail-archive.com/[email protected]/msg06178.html #net.inet.tcp.inflight.enable=0 # TCP slowstart algorithm tunings # We assuming we have very fast clients #net.inet.tcp.slowstart_flightsize=100 #net.inet.tcp.local_slowstart_flightsize=100 # Disable randomizing of ports to avoid false RST # Before usage check SA here www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf # (it's also says that port randomization auto-disables at some conn.rates, but I didn't checked it thou) #net.inet.ip.portrange.randomized=0 # Increase portrange # For outgoing connections only. Good for seed-boxes and ftp servers. net.inet.ip.portrange.first=1024 net.inet.ip.portrange.last=65535 # # stops route cache degregation during a high-bandwidth flood # http://www.freebsd.org/doc/en/books/handbook/securing-freebsd.html #net.inet.ip.rtexpire=2 net.inet.ip.rtminexpire=2 net.inet.ip.rtmaxcache=1024 # Security net.inet.ip.redirect=0 net.inet.ip.sourceroute=0 net.inet.ip.accept_sourceroute=0 net.inet.icmp.maskrepl=0 net.inet.icmp.log_redirect=0 net.inet.icmp.drop_redirect=1 net.inet.tcp.drop_synfin=1 # # There is also good example of sysctl.conf with comments: # http://www.thern.org/projects/sysctl.conf # # icmp may NOT rst, helpful for those pesky spoofed # icmp/udp floods that end up taking up your outgoing # bandwidth/ifqueue due to all that outgoing RST traffic. # #net.inet.tcp.icmp_may_rst=0 # Security net.inet.udp.blackhole=1 net.inet.tcp.blackhole=2 # IPv6 Security # For more info see http://www.fosslc.org/drupal/content/security-implications-ipv6 # Disable Node info replies # To see this vulnerability in action run `ping6 -a sglAac ::1` or `ping6 -w ::1` on unprotected node net.inet6.icmp6.nodeinfo=0 # Turn on IPv6 privacy extensions # For more info see proposal http://unix.derkeiler.com/Mailing-Lists/FreeBSD/net/2008-06/msg00103.html net.inet6.ip6.use_tempaddr=1 net.inet6.ip6.prefer_tempaddr=1 # Disable ICMP redirect net.inet6.icmp6.rediraccept=0 # Disable acceptation of RA and auto linklocal generation if you don't use them #net.inet6.ip6.accept_rtadv=0 #net.inet6.ip6.auto_linklocal=0 # Increases default TTL, sometimes useful # Default is 64 net.inet.ip.ttl=128 # Lessen max segment life to conserve resources # ACK waiting time in miliseconds # (default: 30000. RFC from 1979 recommends 120000) net.inet.tcp.msl=5000 # Max bumber of timewait sockets net.inet.tcp.maxtcptw=200000 # Don't use tw on local connections # As of 15 Apr 2009. Igor Sysoev says that nolocaltimewait has some buggy realization. # So disable it or now till get fixed #net.inet.tcp.nolocaltimewait=1 # FIN_WAIT_2 state fast recycle net.inet.tcp.fast_finwait2_recycle=1 # Time before tcp keepalive probe is sent # default is 2 hours (7200000) #net.inet.tcp.keepidle=60000 # Should be increased until net.inet.ip.intr_queue_drops is zero net.inet.ip.intr_queue_maxlen=4096 # Interrupt handling via multiple CPU, but with context switch. # You can play with it. Default is 1; #net.isr.direct=0 # This is for routers only #net.inet.ip.forwarding=1 #net.inet.ip.fastforwarding=1 # This speed ups dummynet when channel isn't saturated net.inet.ip.dummynet.io_fast=1 # Increase dummynet(4) hash #net.inet.ip.dummynet.hash_size=2048 #net.inet.ip.dummynet.max_chain_len # Should be increased when you have A LOT of files on server # (Increase until vfs.ufs.dirhash_mem becomes lower) vfs.ufs.dirhash_maxmem=67108864 # Note from commit http://svn.freebsd.org/base/head@211031 : # For systems with RAID volumes and/or virtualization envirnments, where # read performance is very important, increasing this sysctl tunable to 32 # or even more will demonstratively yield additional performance benefits. vfs.read_max=32 # Explicit Congestion Notification (see http://en.wikipedia.org/wiki/Explicit_Congestion_Notification) net.inet.tcp.ecn.enable=1 # Flowtable - flow caching mechanism # Useful for routers #net.inet.flowtable.enable=1 #net.inet.flowtable.nmbflows=65535 # Extreme polling tuning #kern.polling.burst_max=1000 #kern.polling.each_burst=1000 #kern.polling.reg_frac=100 #kern.polling.user_frac=1 #kern.polling.idle_poll=0 # IPFW dynamic rules and timeouts tuning # Increase dyn_buckets till net.inet.ip.fw.curr_dyn_buckets is lower net.inet.ip.fw.dyn_buckets=65536 net.inet.ip.fw.dyn_max=65536 net.inet.ip.fw.dyn_ack_lifetime=120 net.inet.ip.fw.dyn_syn_lifetime=10 net.inet.ip.fw.dyn_fin_lifetime=2 net.inet.ip.fw.dyn_short_lifetime=10 # Make packets pass firewall only once when using dummynet # i.e. packets going thru pipe are passing out from firewall with accept #net.inet.ip.fw.one_pass=1 # shm_use_phys Wires all shared pages, making them unswappable # Use this to lessen Virtual Memory Manager's work when using Shared Mem. # Useful for databases #kern.ipc.shm_use_phys=1 # ZFS # Enable prefetch. Useful for sequential load type i.e fileserver. # FreeBSD sets vfs.zfs.prefetch_disable to 1 on any i386 systems and # on any amd64 systems with less than 4GB of avaiable memory # For additional info check this nabble thread http://old.nabble.com/Samba-read-speed-performance-tuning-td27964534.html #vfs.zfs.prefetch_disable=0 # On highload servers you may notice following message in dmesg: # "Approaching the limit on PV entries, consider increasing either the # vm.pmap.shpgperproc or the vm.pmap.pv_entry_max tunable" vm.pmap.shpgperproc=2048 loader.conf: # Accept filters for data, http and DNS requests # Useful when your software uses select() instead of kevent/kqueue or when you under DDoS # DNS accf available on 8.0+ accf_data_load="YES" accf_http_load="YES" accf_dns_load="YES" # Async IO system calls aio_load="YES" # Linux specific devices in /dev # As for 8.1 it only /dev/full #lindev_load="YES" # Adds NCQ support in FreeBSD # WARNING! all ad[0-9]+ devices will be renamed to ada[0-9]+ # 8.0+ only #ahci_load="YES" #siis_load="YES" # FreeBSD 8.2+ # New Congestion Control for FreeBSD # http://caia.swin.edu.au/urp/newtcp/tools/cc_chd-readme-0.1.txt # http://www.ietf.org/proceedings/78/slides/iccrg-5.pdf # Initial merge commit message http://www.mail-archive.com/[email protected]/msg31410.html #cc_chd_load="YES" # Increase kernel memory size to 3G. # # Use ONLY if you have KVA_PAGES in kernel configuration, and you have more than 3G RAM # Otherwise panic will happen on next reboot! # # It's required for high buffer sizes: kern.ipc.nmbjumbop, kern.ipc.nmbclusters, etc # Useful on highload stateful firewalls, proxies or ZFS fileservers # (FreeBSD 7.2+ amd64 users: Check that current value is lower!) #vm.kmem_size="3G" # If your server has lots of swap (>4Gb) you should increase following value # according to http://lists.freebsd.org/pipermail/freebsd-hackers/2009-October/029616.html # Otherwise you'll be getting errors # "kernel: swap zone exhausted, increase kern.maxswzone" # kern.maxswzone="256M" # Older versions of FreeBSD can't tune maxfiles on the fly #kern.maxfiles="200000" # Useful for databases # Sets maximum data size to 1G # (FreeBSD 7.2+ amd64 users: Check that current value is lower!) #kern.maxdsiz="1G" # Maximum buffer size(vfs.maxbufspace) # You can check current one via vfs.bufspace # Should be lowered/upped depending on server's load-type # Usually decreased to preserve kmem # (default is 10% of mem) #kern.maxbcache="512M" # Sendfile buffers # For i386 only #kern.ipc.nsfbufs=10240 # FreeBSD 9+ # HPET "legacy route" support. It should allow HPET to work per-CPU # See http://www.mail-archive.com/[email protected]/msg03603.html #hint.atrtc.0.clock=0 #hint.attimer.0.clock=0 #hint.hpet.0.legacy_route=1 # syncache Hash table tuning net.inet.tcp.syncache.hashsize=1024 net.inet.tcp.syncache.bucketlimit=512 net.inet.tcp.syncache.cachelimit=65536 # Increased hostcache # Later host cache can be viewed via net.inet.tcp.hostcache.list hidden sysctl # Very useful for it's RTT RTTVAR # Must be power of two net.inet.tcp.hostcache.hashsize=65536 # hashsize * bucketlimit (which is 30 by default) # It allocates 255Mb (1966080*136) of RAM net.inet.tcp.hostcache.cachelimit=1966080 # TCP control-block Hash table tuning net.inet.tcp.tcbhashsize=4096 # Disable ipfw deny all # Should be uncommented when there is a chance that # kernel and ipfw binary may be out-of sync on next reboot #net.inet.ip.fw.default_to_accept=1 # # SIFTR (Statistical Information For TCP Research) is a kernel module that # logs a range of statistics on active TCP connections to a log file. # See prerelease notes http://groups.google.com/group/mailing.freebsd.current/browse_thread/thread/b4c18be6cdce76e4 # and man 4 sitfr #siftr_load="YES" # Enable superpages, for 7.2+ only # Also read http://lists.freebsd.org/pipermail/freebsd-hackers/2009-November/030094.html vm.pmap.pg_ps_enabled=1 # Usefull if you are using Intel-Gigabit NIC #hw.em.rxd=4096 #hw.em.txd=4096 #hw.em.rx_process_limit="-1" # Also if you have ALOT interrupts on NIC - play with following parameters # NOTE: You should set them for every NIC #dev.em.0.rx_int_delay: 250 #dev.em.0.tx_int_delay: 250 #dev.em.0.rx_abs_int_delay: 250 #dev.em.0.tx_abs_int_delay: 250 # There is also multithreaded version of em/igb drivers can be found here: # http://people.yandex-team.ru/~wawa/ # # for additional em monitoring and statistics use # sysctl dev.em.0.stats=1 ; dmesg # sysctl dev.em.0.debug=1 ; dmesg # Also after r209242 (-CURRENT) there is a separate sysctl for each stat variable; # Same tunings for igb #hw.igb.rxd=4096 #hw.igb.txd=4096 #hw.igb.rx_process_limit=100 # Some useful netisr tunables. See sysctl net.isr #net.isr.maxthreads=4 #net.isr.defaultqlimit=4096 #net.isr.maxqlimit: 10240 # Bind netisr threads to CPUs #net.isr.bindthreads=1 # # FreeBSD 9.x+ # Increase interface send queue length # See commit message http://svn.freebsd.org/viewvc/base?view=revision&revision=207554 #net.link.ifqmaxlen=1024 # Nicer boot logo =) loader_logo="beastie" And finally here is KERNCONF: # Just some of them, see also # cat /sys/{i386,amd64,}/conf/NOTES # This one useful only on i386 #options KVA_PAGES=512 # You can play with HZ in environments with high interrupt rate (default is 1000) # 100 is for my notebook to prolong it's battery life #options HZ=100 # Polling is goot on network loads with high packet rates and low-end NICs # NB! Do not enable it if you want more than one netisr thread #options DEVICE_POLLING # Eliminate datacopy on socket read-write # To take advantage with zero copy sockets you should have an MTU >= 4k # This req. is only for receiving data. # Read more in man zero_copy_sockets # Also this epic thread on kernel trap: # http://kerneltrap.org/node/6506 # Here Linus says that "anybody that does it that way (FreeBSD) is totally incompetent" #options ZERO_COPY_SOCKETS # Support TCP sign. Used for IPSec options TCP_SIGNATURE # There was stackoverflow found in KAME IPSec stack: # See http://secunia.com/advisories/43995/ # For quick workaround you can use `ipfw add deny proto ipcomp` options IPSEC # This ones can be loaded as modules. They described in loader.conf section #options ACCEPT_FILTER_DATA #options ACCEPT_FILTER_HTTP # Adding ipfw, also can be loaded as modules options IPFIREWALL # On 8.1+ you can disable verbose to see blocked packets on ipfw0 interface. # Also there is no point in compiling verbose into the kernel, because # now there is net.inet.ip.fw.verbose tunable. #options IPFIREWALL_VERBOSE #options IPFIREWALL_VERBOSE_LIMIT=10 options IPFIREWALL_FORWARD # Adding kernel NAT options IPFIREWALL_NAT options LIBALIAS # Traffic shaping options DUMMYNET # Divert, i.e. for userspace NAT options IPDIVERT # This is for OpenBSD's pf firewall device pf device pflog # pf's QoS - ALTQ options ALTQ options ALTQ_CBQ # Class Bases Queuing (CBQ) options ALTQ_RED # Random Early Detection (RED) options ALTQ_RIO # RED In/Out options ALTQ_HFSC # Hierarchical Packet Scheduler (HFSC) options ALTQ_PRIQ # Priority Queuing (PRIQ) options ALTQ_NOPCC # Required for SMP build # Pretty console # Manual can be found here http://forums.freebsd.org/showthread.php?t=6134 #options VESA #options SC_PIXEL_MODE # Disable reboot on Ctrl Alt Del #options SC_DISABLE_REBOOT # Change normal|kernel messages color options SC_NORM_ATTR=(FG_GREEN|BG_BLACK) options SC_KERNEL_CONS_ATTR=(FG_YELLOW|BG_BLACK) # More scroll space options SC_HISTORY_SIZE=8192 # Adding hardware crypto device device crypto device cryptodev # Useful network interfaces device vlan device tap #Virtual Ethernet driver device gre #IP over IP tunneling device if_bridge #Bridge interface device pfsync #synchronization interface for PF device carp #Common Address Redundancy Protocol device enc #IPsec interface device lagg #Link aggregation interface device stf #IPv4-IPv6 port # Also for my notebook, but may be used with Opteron device amdtemp # Same for Intel processors device coretemp # man 4 cpuctl device cpuctl # CPU control pseudo-device # Support for ECMP. More than one route for destination # Works even with default route so one can use it as LB for two ISP # For now code is unstable and panics (panic: rtfree 2) on route deletions. #options RADIX_MPATH # Multicast routing #options MROUTING #options PIM # Debug & DTrace options KDB # Kernel debugger related code options KDB_TRACE # Print a stack trace for a panic options KDTRACE_FRAME # amd64-only(?) options KDTRACE_HOOKS # all architectures - enable general DTrace hooks #options DDB #options DDB_CTF # all architectures - kernel ELF linker loads CTF data # Adaptive spining in lockmgr (8.x+) # See http://www.mail-archive.com/[email protected]/msg10782.html options ADAPTIVE_LOCKMGRS # UTF-8 in console (8.x+) #options TEKEN_UTF8 # FreeBSD 8.1+ # Deadlock resolver thread # For additional information see http://www.mail-archive.com/[email protected]/msg18124.html # (FYI: "resolution" is panic so use with caution) #options DEADLKRES # Increase maximum size of Raw I/O and sendfile(2) readahead #options MAXPHYS=(1024*1024) #options MAXBSIZE=(1024*1024) # For scheduler debug enable following option. # Debug will be available via `kern.sched.stats` sysctl # For more information see http://svnweb.freebsd.org/base/head/sys/conf/NOTES?view=markup #options SCHED_STATS If you are tuning network for maximum performance you may wish to play with ifconfig options like: # You can list all capabilities via `ifconfig -m` ifconfig [-]rxcsum [-]txcsum [-]tso [-]lro mtu In case you've enabled DDB in kernel config, you should edit your /etc/ddb.conf and add something like this to enable automatic reboot (and textdump as bonus): script kdb.enter.panic=textdump set; capture on; show pcpu; bt; ps; alltrace; capture off; call doadump; reset script kdb.enter.default=textdump set; capture on; bt; ps; capture off; call doadump; reset And do not forget to add ddb_enable="YES" to /etc/rc.conf Since FreeBSD 9 you can select to enable/disable flowcontrol on your NIC: # See http://en.wikipedia.org/wiki/Ethernet_flow_control and # http://www.mail-archive.com/[email protected]/msg07927.html for additional info ifconfig bge0 media auto mediaopt flowcontrol PS. Also most of FreeBSD's limits can be monitored by # vmstat -z and # limits PPS. variety of network counters can be monitored via # netstat -s In FreeBSD-9 netstat's -Q option appeared, try following command to display netisr stats # netstat -Q PPPS. also see # man 7 tuning PPPPS. I wanted to thank FreeBSD community, especially author of nginx - Igor Sysoev, nginx-ru@ and FreeBSD-performance@ mailing lists for providing useful information about FreeBSD tuning. FreeBSD WIP * Whats cooking for FreeBSD 7? * Whats cooking for FreeBSD 8? * Whats cooking for FreeBSD 9? So here is the question: What tunings are you using on yours FreeBSD servers? You can also post your /etc/sysctl.conf, /boot/loader.conf, kernel options, etc with description of its' meaning (do not copy-paste from sysctl -d). Don't forget to specify server type (web, smb, gateway, etc) Let's share experience!

Read the article
SQL SERVER – Database Dynamic Caching by Automatic SQL Server Performance Acceleration

- by pinaldave

My second look at SafePeak’s new version (2.1) revealed to me few additional interesting features. For those of you who hadn’t read my previous reviews SafePeak and not familiar with it, here is a quick brief: SafePeak is in business of accelerating performance of SQL Server applications, as well as their scalability, without making code changes to the applications or to the databases. SafePeak performs database dynamic caching, by caching in memory result sets of queries and stored procedures while keeping all those cache correct and up to date. Cached queries are retrieved from the SafePeak RAM in microsecond speed and not send to the SQL Server. The application gets much faster results (100-500 micro seconds), the load on the SQL Server is reduced (less CPU and IO) and the application or the infrastructure gets better scalability. SafePeak solution is hosted either within your cloud servers, hosted servers or your enterprise servers, as part of the application architecture. Connection of the application is done via change of connection strings or adding reroute line in the c:\windows\system32\drivers\etc\hosts file on all application servers. For those who would like to learn more on SafePeak architecture and how it works, I suggest to read this vendor’s webpage: SafePeak Architecture. More interesting new features in SafePeak 2.1 In my previous review of SafePeak new I covered the first 4 things I noticed in the new SafePeak (check out my article “SQLAuthority News – SafePeak Releases a Major Update: SafePeak version 2.1 for SQL Server Performance Acceleration”): Cache setup and fine-tuning – a critical part for getting good caching results Database templates Choosing which database to cache Monitoring and analysis options by SafePeak Since then I had a chance to play with SafePeak some more and here is what I found. 5. Analysis of SQL Performance (present and history): In SafePeak v.2.1 the tools for understanding of performance became more comprehensive. Every 15 minutes SafePeak creates and updates various performance statistics. Each query (or a procedure execute) that arrives to SafePeak gets a SQL pattern, and after it is used again there are statistics for such pattern. An important part of this product is that it understands the dependencies of every pattern (list of tables, views, user defined functions and procs). From this understanding SafePeak creates important analysis information on performance of every object: response time from the database, response time from SafePeak cache, average response time, percent of traffic and break down of behavior. One of the interesting things this behavior column shows is how often the object is actually pdated. The break down analysis allows knowing the above information for: queries and procedures, tables, views, databases and even instances level. The data is show now on all arriving queries, both read queries (that can be cached), but also any types of updates like DMLs, DDLs, DCLs, and even session settings queries. The stats are being updated every 15 minutes and SafePeak dashboard allows going back in time and investigating what happened within any time frame. 6. Logon trigger, for making sure nothing corrupts SafePeak cache data If you have an application with many parts, many servers many possible locations that can actually update the database, or the SQL Server is accessible to many DBAs or software engineers, each can access some database directly and do some changes without going thru SafePeak – this can create a potential corruption of the data stored in SafePeak cache. To make sure SafePeak cache is correct it needs to get all updates to arrive to SafePeak, and if a DBA will access the database directly and do some changes, for example, then SafePeak will simply not know about it and will not clean SafePeak cache. In the new version, SafePeak brought a new feature called “Logon Trigger” to solve the above challenge. By special click of a button SafePeak can deploy a special server logon trigger (with a CLR object) on your SQL Server that actually monitors all connections and informs SafePeak on any connection that is coming not from SafePeak. In SafePeak dashboard there is an interface that allows to control which logins can be ignored based on login names and IPs, while the rest will invoke cache cleanup of SafePeak and actually locks SafePeak cache until this connection will not be closed. Important to note, that this does not interrupt any logins, only informs SafePeak on such connection. On the Dashboard screen in SafePeak you will be able to see those connections and then decide what to do with them. Configuration of this feature in SafePeak dashboard can be done here: Settings -> SQL instances management -> click on instance -> Logon Trigger tab. Other features: 7. User management ability to grant permissions to someone without changing its configuration and only use SafePeak as performance analysis tool. 8. Better reports for analysis of performance using 15 minute resolution charts. 9. Caching of client cursors 10. Support for IPv6 Summary SafePeak is a great SQL Server performance acceleration solution for users who want immediate results for sites with performance, scalability and peak spikes challenges. Especially if your apps are packaged or 3rd party, since no code changes are done. SafePeak can significantly increase response times, by reducing network roundtrip to the database, decreasing CPU resource usage, eliminating I/O and storage access. SafePeak team provides a free fully functional trial www.safepeak.com/download and actually provides a one-on-one assistance during such trial. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: About Me, Pinal Dave, PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology

Read the article
Windows performance monitor new instances

- by fborozan

Hi all, I am trying to configure performance monitor on 2003/2008R1&R2 to capture new instances of the counters without any luck. For example if I select counter Process\%Processor time (to monitor processor time per any instances of the process) everything works fine until I open or close any application. If in the meanwhile new application is open it will not be included in the monitoring processor, and old application instance will display zero for % processor time. The problem is performance monitor is not refreshing instances of the new applications/users/new terminal session/ or any other metrics that changes instances in the meanwhile. The solution is to stop/start log file, but I don't want to do that every sec and the logging will be split into two files. Anybody knows how do I accomplish to add all new instances? Any help greatly appreciated

Read the article
Recommendation using Client side performance monitoring (boomerang/jiffy/episodes)

- by Yasei No Umi

There are a few Client-side JavaScript libraries that check web-site performance on the client side: Jiffy (http://code.google.com/p/jiffy-web/) Episodes (http://stevesouders.com/episodes/) by Steve Sounders Boomerang (http://yahoo.github.com/boomerang/doc/) by Yahoo! Have you used any of them or a similar too? What did you use for the server-side? for reporting? Is this a recommended approach? If not, how should I monitor my web-site performance from the end-user's view?

Read the article
Very poor read performance compared to write performance on md(raid1) / crypt(luks) / lvm

- by Android5360

I'm experiencing very poor read performance over raid1/crypt/lvm. In the same time, write speeds are about 2x+ faster on the same setup. On another raid1 setup on the same machine I get normal read speeds (maybe because I'm not using cryptsetup). OS related disks: sda + sdb. I have raid1 configuration with two disks, both are in place. I'm using LVM over the RAID. No encryption. Both disks are WD Green, 5400 rpm. IO test results on this raid1: dd if=/dev/zero of=/tmp/output.img3 bs=8k count=256k conv=fsync - 2147483648 bytes (2.1 GB) copied, 22.3392 s, 96.1 MB/s sync echo 3 > /proc/sys/vm/drop_caches dd if=/tmp/output.img3 of=/dev/null bs=8k - 2147483648 bytes (2.1 GB) copied, 15.9 s, 135 MB/s And here is the problematic setup (on the same machine). Currently I have only one sdc (WD Green, 5400rpm) configured in software raid1 + crypt (luks, serpent-xts-plain) + lvm. Tomorrow I will attach another disk (sdd) to complete this two-disk raid1 setup. IO tests results on this raid1: dd if=/dev/zero of=output.img3 bs=8k count=256k conv=fsync 2147483648 bytes (2.1 GB) copied, 17.7235 s, 121 MB/s sync echo 3 > /proc/sys/vm/drop_caches dd if=output.img3 of=/dev/null bs=8k 2147483648 bytes (2.1 GB) copied, 36.2454 s, 59.2 MB/s We can see that the read performance is very very bad (59MB/s compared to 135MB/s when using no encryption). Nothing is using the disks during benchmark. I can confirm this because I checked with iostat and dstat. Details on the hardware: disks: all are WD green, 5400rpm, 64mb cache. cpu: FX-8350 at stock speed ram: 4x4GB at 1066Mhz. Details on the software: OS: Debian Wheezy 7, amd64 mdadm: v3.2.5 - 18th May 2012 LVM version: 2.02.95(2) (2012-03-06) LVM Library version: 1.02.74 (2012-03-06) LVM Driver version: 4.22.0 cryptsetup: 1.4.3 Here is how I configured the slow raid1+crypt+lvm setup: parted /dev/sdc mklabel gpt type: ext4 start: 2048s end: -1 Now the raid, crypt and the lvm configuration: mdadm --create /dev/md1 --level=1 --raid-disks=2 missing /dev/sdc cryptsetup --cipher serpent-xts-plain luksFormat /dev/md1 cryptsetup luksOpen /dev/md1 md1_crypt vgcreate vg_sql /dev/mapper/md1_crypt lvcreate -l 100%VG vg_sql -n lv_sql mkfs.ext4 /dev/mapper/vg_sql-lv-sql mount /dev/mapper/vg_sql-lv_sql /sql So guys, can you help me identify the reason and fix it? It has to be something with the cryptsetup as there is no such read slowdown on the other setup (sda+sdb) where no encryption is present. But I have no idea what to do. Thanks!

Read the article
performance monitor in iis 7 to monitor which website is using most resources (asp.net)

- by Karl Cassar

I am using Windows Server 2008 R2 and IIS 7.5, and am hosting multiple websites on the same webserver. Is it possible to use Performance Monitor to know on average which website is using the most resources? I've added a user-defined Data Collector Set in Performance Monitor collecting data for 1 day. However, I could not find any details which hint which website is using the most resources. Which counters are crucial to monitor websites? The generated report tells me that the top process is w3wp##1 - how can I know which website it corresponds to? I've also tried to add counters for ASP.Net Applications for all object instances, however % Managed Processor Time (estimated) is 0 at all times.

Read the article
SQL SERVER – Iridium I/O – SQL Server Deduplication that Shrinks Databases and Improves Performance

- by Pinal Dave

Database performance is a common problem for SQL Server DBA’s. It seems like we spend more time on performance than just about anything else. In many cases, we use scripts or tools that point out performance bottlenecks but we don’t have any way to fix them. For example, what do you do when you need to speed up a query that is already tuned as well as possible? Or what do you do when you aren’t allowed to make changes for a database supporting a purchased application? Iridium I/O for SQL Server was originally built at Confio software (makers of Ignite) because DBA’s kept asking for a way to actually fix performance instead of just pointing out performance problems. The technology is certified by Microsoft and was so promising that it was spun out into a separate company that is now run by the Confio Founder/CEO and technology management team. Iridium uses deduplication technology to both shrink the databases as well as boost IO performance. It is intriguing to see it work. It will deduplicate a live database as it is running transactions. You can watch the database get smaller while user queries are running. Iridium is a simple tool to use. After installing the software, you click an “Analyze” button which will spend a minute or two on each database and estimate both your storage and performance savings. Next, you click an “Activate” button to turn on Iridium I/O for your selected databases. You don’t need to reboot the operating system or restart the database during any part of the process. As part of my test, I also wanted to see if there would be an impact on my databases when Iridium was removed. The ‘revert’ process (bringing the files back to their SQL Server native format) was executed by a simple click of a button, and completed while the databases were available for normal processing. I was impressed and enjoyed playing with the software and encourage all of you to try it out. Here is the link to the website to download Iridium for free. . Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
EPM 11.1.2 - In WebLogic Server, Enable Native IO Performance Pack

- by Ahmed Awan

Performance can be improved by enabling native IO in production mode. WebLogic Server benchmarks show major performance improvements when native performance packs are used on machines that host Oracle WebLogic Server instances. Important Note: Always enable native I/O, if available, and check for errors at startup to make sure it is being initialed properly. Tip: The use of NATIVE performance packs are enabled by default in the configuration shipped with your distribution. You can use the Administration Console to verify that performance packs are enabled by clicking on each managed server and click on Tuning tab.

Read the article
How to collect the performance data of a server during an unreachable/down period using Nagios?

- by gsc-frank

Some time services and host stop responding due to a poor server performance. I mean, if for some reason (could be lot of concurrency services access, a expensive backup execution on the server or whatever that consume tons of server resources) a server performance is very degraded, that could lead that the server isn't capable to establish any "normal network communication" (without trigger whatever standards timeouts defined for such communication). Knowing host's performance data (cpu, memory, ...) in case of available during that period (host is not down and despite of its performance degradation still allow plugins collect performance data) could be very useful for sysadmin to try to determine what cause the problem, or at least, if the host performance was good and don't interfered at all in the host/service down. This problem could be solved using remote active (NRPE) or remote passive (NSCA) if such remote solutions could store (buffered) perf data to be send to central Nagios server when host performance or network outage allow it. I read the doc of both solutions and can't find any reference to such buffer mechanism neither what happened in case that NSCA can't reach Nagios server. Any idea of how solve this lack of info? so useful for forensic analysis. EDIT: My questions isn about which tools I can use to debug perf problems or gather perf data to analysis, but is about how collect (using Nagios) host perf data even during a network outage for its posterior analysis (kind of forensic analysis). The idea is integrate such data to Nagios graphers like pnp4nagios and NagiosGrapther. I know that I could install tools like Cacti in each of my host, and have a kind of performance data collection redundancy, but I really want avoid that and try to solve all perf analysis requirements with one tools: Nagios

Read the article
Postfix performance

- by Brian G

Running postfix on ubuntu, sending alot of mail ( ~ 1 million messages ) per day. loads are extremly high but not much in terms of cpu and memory load. Anyone in a similiar situation and know how to remove the bottleneck? All mail on this server is outbound. I would have to assume the bottleneck is disk. Just an update, here is what iostat looks like: avg-cpu: %user %nice %system %iowait %steal %idle 0.00 0.00 0.12 99.88 0.00 0.00 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 12.38 0.00 2.48 0.00 118.81 48.00 0.00 0.00 0.00 0.00 sdb 1.49 22.28 72.28 42.57 629.70 1041.58 14.55 135.56 834.31 8.71 100.00 Are these numbers in line with the performance you would expect from a single disk? sdb is dedicated to postfix. I think it is queue shuffling, from incoming-active-deferred More details from questions: Server: Quad core Xeon(R) CPU E5405 @ 2.00GH with 4 GB ram Load average: 464.88, 489.11, 483.91, 4 cores. but the memory utilization and cpu is minimal Postfix instances between 16 - 32

Read the article
Increase application performance on Amazon AWS

- by Honus Wagner

I've got a client with an MVC v1 (.NET) application running on a micro instance. On this instance, I've got .NET, IIS 7.5, and MS SQL Server 2008 running to handle the application. The client has reported that it is taking nearly 10 seconds to process each request. Even loading the initial login page takes about that long, then logging in takes that long, etc etc. The currently running instance specs are as follows: 615 MB RAM Intel Xenon CPU E5430 @ 2.66GHz 2.78 GHz 64-Bit Is the memory availability the issue? or is it the processing power? I forsee two options: Change to a larget instance Set up a 2-tier architecture with two micro instances Which of these will give the application better performance? Thanks in advance.

Read the article
Application Performance: The Best of the Web

- by Michaela Murray

Wisdom A deep understanding and realization […] resulting in the ability to apply perceptions, judgements and actions. It is also the comprehension of what is true coupled with optimum judgment as to action. - Wikipedia We’re writing a book for ASP.NET developers, and we want you to be a part of it. We know that there’s a huge amount of web developer wisdom that never gets shared, and we want to find those golden nuggets of knowledge and experience, and make sure everyone can learn from them. Right now, we want to find out about your top tips, hard-won lessons, and sage advice for avoiding, finding, and fixing application performance problems. If you work with .NET and SQL, even better – a lot of application performance relies on the interaction with the database, so we want to hear from you! “How Do You Want Me To Be Involved?” Right! Details! We want you, our most excellent readers, to email us with the Best Advice you would give to other developers for getting the best performance out of their applications. It doesn’t matter if your advice is for newbies or veterans, .NET or SQL – so long as it’s about application performance, we want to hear from you. (And if you think that there’s developer wisdom out there that “everyone knows”, a) I’m willing to bet you could find someone who doesn’t know about it, and b) it probably bears repeating anyway!) “I’m Interested. What Can You Do For Me?” Excellent question. For starters, there’s a chance to win a Microsoft Surface (the tablet, not the table-top). Once all the ASP.NET Wisdom has been collected, tallied, and labelled, it will then be weighed and measured by a team of expert judges (whose identities are still a closely-guarded secret). The top tip in both SQL & .NET categories will each win their author their very own MS Surface. But that’s not all! We can also give you… immortality! More details? Ok. We’ll be collecting all of the tips sent in by our readers (and we can’t wait to learn from you all,) and with the help of our Simple-Talk editors, we will publish and distribute your combined and documented knowledge as a free, community-created, professionally typeset eBook. You will naturally be credited by name / pseudonym / twitter handle / GitHub username / StackOverflow profile / Whatever, as the clearly ingenious author of hot performance tips. The Not-Very-Fine Print Here’s the breakdown: We want to bring together the best application performance knowledge from ASP.NET developers. Closing date for submissions will be 9am GMT, December 4th. Submissions should be made by email – [email protected] Submissions will be judged by a panel of expert judges (who will be revealed soon). The top submission in both the SQL & .NET categories will each win a Microsoft Surface. ALL the tips which make it through the judging process will be polished by Simple-Talk editors, and turned into a professionally typeset eBook, which will be freely available, and promoted alongside the ANTS Performance Profiler tool. Anyone whose entry makes it into the book will be clearly and profusely credited in the method of their choice (or can remain anonymous.) The really REALLY short version Share what you know about ASP.NET application performance for a chance to win a Microsoft Surface, and then get your name credited in a slick eBook with top-notch production values. For more details, see above. We can’t wait to learn from you!

Read the article
Set of Tools to optimize the performance in general of SQL Server

- by Dave

Hi, I know there are things out there to help to optimize queries, ect... but is there anything else, something like a full package that can scan your database and highlight all the performance issues, naming conventions, tables not properly normalized, etc? I know this is the job of a DBA and if the DBA is good, he shouldn't need a tool like that, but sometimes you start a new job, you get in charge of an existing database and the DB is a mess, so you don't know where to start... Thanks to everyone Dave

Read the article
Server Performance

- by sb12

I know very little about performance tuning of servers etc... so i thought i'd put this up here as i start some research on it, just to get some direction. I am in the process of migrating from my old server to a new one - both are 64 bit machines. One is a few years old, the other brand new (PowerEdge R410). The old server spec is: 2 cpus, 3.4GHz Pentiums, 8G of RAM, Fedora 11 currently installed The new server spec is: 16 cpus, 3.2 GHz Xeon, 16G of RAM, CentOS 6.2 installed. Also RAID10 is on the new server - no RAID on the old one. Both servers currently have the same database (MySQL) with the same data migrated. I wrote a Perl script that simply steps through each row of a table in the database (about 18000 rows) and updates a value in that row. Every row in the table is updated. Out of curiosity i ran this perl script on both machines, just to see how the new server would perform vs. the old one, and it produced interesting results: The old server was twice as fast as the new one to complete. Looking at the database, both are configured exactly the same (the new one being a dump of the old one...)... Anyone any ideas why this would be given the hardware gap between both? As i said i'm about to start some digging, but thought i'd put this up here to maybe get some good direction.... Many thanks in advance..

Read the article
Analysing and measuring the performance of a .NET application (survey results)

- by Laila

Back in December last year, I asked myself: could it be that .NET developers think that you need three days and a PhD to do performance profiling on their code? What if developers are shunning profilers because they perceive them as too complex to use? If so, then what method do they use to measure and analyse the performance of their .NET applications? Do they even care about performance? So, a few weeks ago, I decided to get a 1-minute survey up and running in the hopes that some good, hard data would clear the matter up once and for all. I posted the survey on Simple Talk and got help from a few people to promote it. The survey consisted of 3 simple questions: Amazingly, 533 developers took the time to respond - which means I had enough data to get representative results! So before I go any further, I would like to thank all of you who contributed, because I now have some pretty good answers to the troubling questions I was asking myself. To thank you properly, I thought I would share some of the results with you. First of all, application performance is indeed important to most of you. In fact, performance is an intrinsic part of the development cycle for a good 40% of you, which is much higher than I had anticipated, I have to admit. (I know, "Have a little faith Laila!") When asked what tool you use to measure and analyse application performance, I found that nearly half of the respondents use logging statements, a third use performance counters, and 70% of respondents use a profiler of some sort (a 3rd party performance profilers, the CLR profiler or the Visual Studio profiler). The importance attributed to logging statements did surprise me a little. I am still not sure why somebody would go to the trouble of manually instrumenting code in order to measure its performance, instead of just using a profiler. I personally find the process of annotating code, calculating times from log files, and relating it all back to your source terrifyingly laborious. Not to mention that you then need to remember to turn it all off later! Even when you have logging in place throughout all your code anyway, you still have a fair amount of potentially error-prone calculation to sift through the results; in addition, you'll only get method-level rather than line-level timings, and you won't get timings from any framework or library methods you don't have source for. To top it all, we all know that bottlenecks are rarely where you would expect them to be, so you could be wasting time looking for a performance problem in the wrong place. On the other hand, profilers do all the work for you: they automatically collect the CPU and wall-clock timings, and present the results from method timing all the way down to individual lines of code. Maybe I'm missing a trick. I would love to know about the types of scenarios where you actively prefer to use logging statements. Finally, while a third of the respondents didn't have a strong opinion about code performance profilers, those who had an opinion thought that they were mainly complex to use and time consuming. Three respondents in particular summarised this perfectly: "sometimes, they are rather complex to use, adding an additional time-sink to the process of trying to resolve the existing problem". "they are simple to use, but the results are hard to understand" "Complex to find the more advanced things, easy to find some low hanging fruit". These results confirmed my suspicions: Profilers are seen to be designed for more advanced users who can use them effectively and make sense of the results. I found yet more interesting information when I started comparing samples of "developers for whom performance is an important part of the dev cycle", with those "to whom performance is only looked at in times of crisis", and "developers to whom performance is not important, as long as the app works". See the three graphs below. Sample of developers to whom performance is an important part of the dev cycle: Sample of developers to whom performance is important only in times of crisis: Sample of developers to whom performance is not important, as long as the app works: As you can see, there is a strong correlation between the usage of a profiler and the importance attributed to performance: indeed, the more important performance is to a development team, the more likely they are to use a profiler. In addition, developers to whom performance is an important part of the dev cycle have a higher tendency to use a much wider range of methods for performance measurement and analysis. And, unsurprisingly, the less important performance is, the less varied the methods of measurement are. So all in all, to come back to my random questions: .NET developers do care about performance. Those who care the most use a wider range of performance measurement methods than those who care less. But overall, logging statements, performance counters and third party performance profilers are the performance measurement methods of choice for most developers. Finally, although most of you find code profilers complex to use, those of you who care the most about performance tend to use profilers more than those of you to whom performance is not so important.

Read the article
C# Performance Pitfall – Interop Scenarios Change the Rules

- by Reed

C# and .NET, overall, really do have fantastic performance in my opinion. That being said, the performance characteristics dramatically differ from native programming, and take some relearning if you’re used to doing performance optimization in most other languages, especially C, C++, and similar. However, there are times when revisiting tricks learned in native code play a critical role in performance optimization in C#. I recently ran across a nasty scenario that illustrated to me how dangerous following any fixed rules for optimization can be… The rules in C# when optimizing code are very different than C or C++. Often, they’re exactly backwards. For example, in C and C++, lifting a variable out of loops in order to avoid memory allocations often can have huge advantages. If some function within a call graph is allocating memory dynamically, and that gets called in a loop, it can dramatically slow down a routine. This can be a tricky bottleneck to track down, even with a profiler. Looking at the memory allocation graph is usually the key for spotting this routine, as it’s often “hidden” deep in call graph. For example, while optimizing some of my scientific routines, I ran into a situation where I had a loop similar to: for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i]); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This loop was at a fairly high level in the call graph, and often could take many hours to complete, depending on the input data. As such, any performance optimization we could achieve would be greatly appreciated by our users. After a fair bit of profiling, I noticed that a couple of function calls down the call graph (inside of ProcessElement), there was some code that effectively was doing: // Allocate some data required DataStructure* data = new DataStructure(num); // Call into a subroutine that passed around and manipulated this data highly CallSubroutine(data); // Read and use some values from here double values = data->Foo; // Cleanup delete data; // ... return bar; Normally, if “DataStructure” was a simple data type, I could just allocate it on the stack. However, it’s constructor, internally, allocated it’s own memory using new, so this wouldn’t eliminate the problem. In this case, however, I could change the call signatures to allow the pointer to the data structure to be passed into ProcessElement and through the call graph, allowing the inner routine to reuse the same “data” memory instead of allocating. At the highest level, my code effectively changed to something like: DataStructure* data = new DataStructure(numberToProcess); for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i], data); } delete data; Granted, this dramatically reduced the maintainability of the code, so it wasn’t something I wanted to do unless there was a significant benefit. In this case, after profiling the new version, I found that it increased the overall performance dramatically – my main test case went from 35 minutes runtime down to 21 minutes. This was such a significant improvement, I felt it was worth the reduction in maintainability. In C and C++, it’s generally a good idea (for performance) to: Reduce the number of memory allocations as much as possible, Use fewer, larger memory allocations instead of many smaller ones, and Allocate as high up the call stack as possible, and reuse memory I’ve seen many people try to make similar optimizations in C# code. For good or bad, this is typically not a good idea. The garbage collector in .NET completely changes the rules here. In C#, reallocating memory in a loop is not always a bad idea. In this scenario, for example, I may have been much better off leaving the original code alone. The reason for this is the garbage collector. The GC in .NET is incredibly effective, and leaving the allocation deep inside the call stack has some huge advantages. First and foremost, it tends to make the code more maintainable – passing around object references tends to couple the methods together more than necessary, and overall increase the complexity of the code. This is something that should be avoided unless there is a significant reason. Second, (unlike C and C++) memory allocation of a single object in C# is normally cheap and fast. Finally, and most critically, there is a large advantage to having short lived objects. If you lift a variable out of the loop and reuse the memory, its much more likely that object will get promoted to Gen1 (or worse, Gen2). This can cause expensive compaction operations to be required, and also lead to (at least temporary) memory fragmentation as well as more costly collections later. As such, I’ve found that it’s often (though not always) faster to leave memory allocations where you’d naturally place them – deep inside of the call graph, inside of the loops. This causes the objects to stay very short lived, which in turn increases the efficiency of the garbage collector, and can dramatically improve the overall performance of the routine as a whole. In C#, I tend to: Keep variable declarations in the tightest scope possible Declare and allocate objects at usage While this tends to cause some of the same goals (reducing unnecessary allocations, etc), the goal here is a bit different – it’s about keeping the objects rooted for as little time as possible in order to (attempt) to keep them completely in Gen0, or worst case, Gen1. It also has the huge advantage of keeping the code very maintainable – objects are used and “released” as soon as possible, which keeps the code very clean. It does, however, often have the side effect of causing more allocations to occur, but keeping the objects rooted for a much shorter time. Now – nowhere here am I suggesting that these rules are hard, fast rules that are always true. That being said, my time spent optimizing over the years encourages me to naturally write code that follows the above guidelines, then profile and adjust as necessary. In my current project, however, I ran across one of those nasty little pitfalls that’s something to keep in mind – interop changes the rules. In this case, I was dealing with an API that, internally, used some COM objects. In this case, these COM objects were leading to native allocations (most likely C++) occurring in a loop deep in my call graph. Even though I was writing nice, clean managed code, the normal managed code rules for performance no longer apply. After profiling to find the bottleneck in my code, I realized that my inner loop, a innocuous looking block of C# code, was effectively causing a set of native memory allocations in every iteration. This required going back to a “native programming” mindset for optimization. Lifting these variables and reusing them took a 1:10 routine down to 0:20 – again, a very worthwhile improvement. Overall, the lessons here are: Always profile if you suspect a performance problem – don’t assume any rule is correct, or any code is efficient just because it looks like it should be Remember to check memory allocations when profiling, not just CPU cycles Interop scenarios often cause managed code to act very differently than “normal” managed code. Native code can be hidden very cleverly inside of managed wrappers

Read the article
How to save a perfmon Performance Counter as a textfile (Reliability and Performance Monitor Version

- by Lieven Cardoen

Now the file gets saved as blg, but I would like a txt versin to import in Excel.

Read the article
How often is software speed evident in the eyes of customers?

- by rwong

In theory, customers should be able to feel the software performance improvements from first-hand experience. In practice, sometimes the improvements are not noticible enough, such that in order to monetize from the improvements, it is necessary to use quotable performance figures in marketing in order to attract customers. We already know the difference between perceived performance (GUI latency, etc) and server-side performance (machines, networks, infrastructure, etc). How often is it that programmers need to go the extra length to "write up" performance analyses for which the audience is not fellow programmers, but managers and customers?

Read the article
SQL SERVER – Server Side Paging in SQL Server 2011 Performance Comparison

- by pinaldave

Earlier, I have written about SQL SERVER – Server Side Paging in SQL Server 2011 – A Better Alternative. I got many emails asking for performance analysis of paging. Here is the quick analysis of it. The real challenge of paging is all the unnecessary IO reads from the database. Network traffic was one of the reasons why paging has become a very expensive operation. I have seen many legacy applications where a complete resultset is brought back to the application and paging has been done. As what you have read earlier, SQL Server 2011 offers a better alternative to an age-old solution. This article has been divided into two parts: Test 1: Performance Comparison of the Two Different Pages on SQL Server 2011 Method In this test, we will analyze the performance of the two different pages where one is at the beginning of the table and the other one is at its end. Test 2: Performance Comparison of the Two Different Pages Using CTE (Earlier Solution from SQL Server 2005/2008) and the New Method of SQL Server 2011 We will explore this in the next article. This article will tackle test 1 first. Test 1: Retrieving Page from two different locations of the table. Run the following T-SQL Script and compare the performance. SET STATISTICS IO ON; USE AdventureWorks2008R2 GO DECLARE @RowsPerPage INT = 10, @PageNumber INT = 5 SELECT * FROM Sales.SalesOrderDetail ORDER BY SalesOrderDetailID OFFSET @PageNumber*@RowsPerPage ROWS FETCH NEXT 10 ROWS ONLY GO USE AdventureWorks2008R2 GO DECLARE @RowsPerPage INT = 10, @PageNumber INT = 12100 SELECT * FROM Sales.SalesOrderDetail ORDER BY SalesOrderDetailID OFFSET @PageNumber*@RowsPerPage ROWS FETCH NEXT 10 ROWS ONLY GO You will notice that when we are reading the page from the beginning of the table, the database pages read are much lower than when the page is read from the end of the table. This is very interesting as when the the OFFSET changes, PAGE IO is increased or decreased. In the normal case of the search engine, people usually read it from the first few pages, which means that IO will be increased as we go further in the higher parts of navigation. I am really impressed because using the new method of SQL Server 2011, PAGE IO will be much lower when the first few pages are searched in the navigation. Test 2: Retrieving Page from two different locations of the table and comparing to earlier versions. In this test, we will compare the queries of the Test 1 with the earlier solution via Common Table Expression (CTE) which we utilized in SQL Server 2005 and SQL Server 2008. Test 2 A : Page early in the table -- Test with pages early in table USE AdventureWorks2008R2 GO DECLARE @RowsPerPage INT = 10, @PageNumber INT = 5 ;WITH CTE_SalesOrderDetail AS ( SELECT *, ROW_NUMBER() OVER( ORDER BY SalesOrderDetailID) AS RowNumber FROM Sales.SalesOrderDetail PC) SELECT * FROM CTE_SalesOrderDetail WHERE RowNumber >= @PageNumber*@RowsPerPage+1 AND RowNumber <= (@PageNumber+1)*@RowsPerPage ORDER BY SalesOrderDetailID GO SET STATISTICS IO ON; USE AdventureWorks2008R2 GO DECLARE @RowsPerPage INT = 10, @PageNumber INT = 5 SELECT * FROM Sales.SalesOrderDetail ORDER BY SalesOrderDetailID OFFSET @PageNumber*@RowsPerPage ROWS FETCH NEXT 10 ROWS ONLY GO Test 2 B : Page later in the table -- Test with pages later in table USE AdventureWorks2008R2 GO DECLARE @RowsPerPage INT = 10, @PageNumber INT = 12100 ;WITH CTE_SalesOrderDetail AS ( SELECT *, ROW_NUMBER() OVER( ORDER BY SalesOrderDetailID) AS RowNumber FROM Sales.SalesOrderDetail PC) SELECT * FROM CTE_SalesOrderDetail WHERE RowNumber >= @PageNumber*@RowsPerPage+1 AND RowNumber <= (@PageNumber+1)*@RowsPerPage ORDER BY SalesOrderDetailID GO SET STATISTICS IO ON; USE AdventureWorks2008R2 GO DECLARE @RowsPerPage INT = 10, @PageNumber INT = 12100 SELECT * FROM Sales.SalesOrderDetail ORDER BY SalesOrderDetailID OFFSET @PageNumber*@RowsPerPage ROWS FETCH NEXT 10 ROWS ONLY GO From the resultset, it is very clear that in the earlier case, the pages read in the solution are always much higher than the new technique introduced in SQL Server 2011 even if we don’t retrieve all the data to the screen. If you carefully look at both the comparisons, the PAGE IO is much lesser in the case of the new technique introduced in SQL Server 2011 when we read the page from the beginning of the table and when we read it from the end. I consider this as a big improvement as paging is one of the most used features for the most part of the application. The solution introduced in SQL Server 2011 is very elegant because it also improves the performance of the query and, at large, the database. Reference : Pinal Dave (http://blog.SQLAuthority.com) Filed under: SQL, SQL Authority, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – Performance Comparison – INSERT TOP (N) INTO Table – Using Top with INSERT

- by pinaldave

Recently I wrote about SQL SERVER – INSERT TOP (N) INTO Table – Using Top with INSERT I mentioned about how TOP works with INSERT. I have mentioned that I will write about the performance in next article. Here is the performance comparison of the two options. It is clear that performance of the Insert TOP (N) [...]

Read the article
Free Online Performance Tuning Event

- by Andrew Kelly

On June 9th 2010 I will be showing several sessions related to performance tuning for SQL Server and they are the best kind because they are free :). So mark your calendars. Here is the event info and URL: June 29, 2010 - 10:00 am - 3:00 pm Eastern SQL Server is the platform for business. In this day-long free virtual event, well-known SQL Server performance expert Andrew Kelly will provide you with the tools and knowledge you need to stay on top of three key areas related to peak performance...(read more)

Read the article
SQL Server – Learning SQL Server Performance: Indexing Basics – Interview of Vinod Kumar by Pinal Dave

- by pinaldave

Recently I just wrote a blog post on about Learning SQL Server Performance: Indexing Basics and I received lots of request that if we can share some insight into the course. Every single time when Performance is discussed, Indexes are mentioned along with it. In recent times, data and application complexity is continuously growing. The demand for faster query response, performance, and scalability by organizations is increasing and developers and DBAs need to now write efficient code to achieve this. When we developed the course – we made sure that this course remains practical and demo heavy instead of just theories on this subject. Vinod Kumar and myself we often thought about this and realized that practical understanding of the indexes is very important. One can not master every single aspects of the index. However there are some minimum expertise one should gain if performance is one of the concern. Here is 200 seconds interview of Vinod Kumar I took right after completing the course. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Index, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology, Video

Read the article
Understanding Performance Profiling Targets

In this sample chapter from his upcoming book, Paul Glavich explains performance metrics and walks us through the steps needed to establish meaningful performance targets. He covers many metrics such as "time to first byte" and explains why you should add some contingency into your estimated performance requirements.

Read the article

Search Results

Search found 20275 results on 811 pages for 'general performance'.

Page 2/811 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Dave

- by Polynomial

- by SaveTheRbtz

- by pinaldave

- by fborozan

- by Yasei No Umi

- by Android5360

- by Karl Cassar

- by Pinal Dave

- by Ahmed Awan

- by gsc-frank

- by Brian G

- by Honus Wagner

- by Michaela Murray

- by Dave

- by sb12

- by Laila

- by Reed

- by Lieven Cardoen

- by rwong

- by pinaldave

- by pinaldave

- by Andrew Kelly

- by pinaldave

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >