Search Results

Search found 13461 results on 539 pages for 'optimizing performance'.

Page 12/539 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19  | Next Page >

  • RAID10 without write-back cache = horrible write performance?

    - by Harry Mexican
    I have just provisioned a dedicated server on singlehop. I'm running it through some tests to know what to expect performance-wise. On the I/O side (with 4 1TB disks in RAID 10) I get: write-cache disabled 200 MB/s read throughput 30 MB/s write throughput I thought that was really low compared to my desktop HD which gets 150-150 or so. So I had a chat with them and they suggested enabling the write cache. New results: write-cache enabled 280 MB/s read 260 MB/s write which is great and all but means I'd have to add a BBU for an additional monthly cost. Is it normal for the write throughput to be 1/4 of a regular drive on RAID10, if you don't have write cache? It almost feels like its intentionally bad to force you to pony up for the BBU. I'd be happy with normal non-raid performance of 150/150.

    Read the article

  • Analyzing Linux NFS server performance

    - by Kamil Kisiel
    I'd like to do some analysis of our NFS server to help track down potential bottlenecks in our applications. The server is running SUSE Enterprise Linux 10. The kind of things I'm looking to know are: Which files are being accessed by which clients Read/write throughput on a per-client basis Overhead imposed by other RPC calls Time spent waiting on other NFS requests, or disk I/O, to service a client I already know about the statistics available in /proc/net/rpc/nfsd and in fact I wrote a blog post describing them in depth. What I'm looking for is a way to dig deeper and help understand what factors are contributing to the performance seen by a particular client. I want to analyze the role the NFS server plays in the performance of an application on our cluster so that I can think of ways to best optimize it.

    Read the article

  • performance block countries using iptables /netfilter

    - by markus
    It's easy to block IPs from country using iptables (e.g. like http://www.cyberciti.biz/faq/block-entier-country-using-iptables/). However I read that the performance can go down if the deny list get too large. An alternative is installing the iptables geoip patch or using ipset ( http://www.jsimmons.co.uk/2010/06/08/using-ipset-with-iptables-in-ubuntu-lts-1004-to-block-large-ip-ranges/) instead of iptables. Does anyone have experience with the various approaches and can say something about the performance differences ? Are there are other ways to block country IPs in linux which I did't mentioned above?

    Read the article

  • Determining Performance Limits

    - by JeffV
    I have a number of windows processes that pass messages between them hat a high rate using tcp to local host. Aside from testing on actual hardware how can I assess what my hardware limit will be. These applications are not doing CPU intensive work, mostly decomposing and combining messages, scanning over them for special flag in the data etc.. The message size is typically 3k and the rate is typically ~10k messages per second. ~30MB per second between processing stages. There may be 10 or more stages depending. For this type of application, what should I look to for assessing performance? What do I look for in a server performance wise? I am currently running an XEON L5408 with 32 GB ram. But I am assuming cache is more important than actual ram size as I am barely touching the ram.

    Read the article

  • Does shutdown idle VMs improve the performance?

    - by Samselvaprabu
    Often our team members are coming to me with a compliant that their VMs are slow. Our team members suggested to shutdown some of the VMs temporarily and try to access the VM. But most cases that would not help. Assume that i have assigned 4 GB for and 2 CPUs for my VM. So ideally it should not face performance issue. As our ESXi 4.1 server has multiple VM in the same server (we have overcommited memory and CPU). Does shut down other VM really helps to improve performance or not? [Note : We are using ESXi 4.1 and our hardware is R710 server. We have more number of VMs in single server so we have overcommited memory.]

    Read the article

  • Strange performance from RAID5 using WD RE4 disks

    - by Howard
    I've noticed a bit of a performance issue with some WD RE4 drives I'm using under AMD's hardware RAID solution. First a bit of background: Environment: Windows 7 home premium x64 HDD's: 3x 1TB WD Raid Edition 4 in a RAID 5 setup with 128 kbyte stripe (2TB usable space) Testing Tool: HD Tune, process set to "High Priority" Processor: AMD Phenom II x6 1100T Ram: 16GB DDR3/1600mhz Motherboard: MSI 970A-G45 The image below pretty much depicts the issue I'm having. Every test has the same thing, a period of similar length where the performance drops to a few megabytes a second. This can't be a TLER issue as the purpose of RE4's is to work around that. Any help would be greatly appreciated.

    Read the article

  • OBIEE 11.1.1 - (Updated) Best Practices Guide for Tuning Oracle® Business Intelligence Enterprise Edition (Whitepaper)

    - by Ahmed Awan
    Applies To: This whitepaper applies to OBIEE release 11.1.1.3, 11.1.1.5 and 11.1.1.6 Introduction: One of the most challenging aspects of performance tuning is knowing where to begin. To maximize Oracle® Business Intelligence Enterprise Edition performance, you need to monitor, analyze, and tune all the Fusion Middleware / BI components. This guide describes the tools that you can use to monitor performance and the techniques for optimizing the performance of Oracle® Business Intelligence Enterprise Edition components. Click to Download the OBIEE Infrastructure Tuning Whitepaper (Right click or option-click the link and choose "Save As..." to download this file) Disclaimer: All tuning information stated in this guide is only for orientation, every modification has to be tested and its impact should be monitored and analyzed. Before implementing any of the tuning settings, it is recommended to carry out end to end performance testing that will also include to obtain baseline performance data for the default configurations, make incremental changes to the tuning settings and then collect performance data. Otherwise it may worse the system performance.

    Read the article

  • Optimizing Solaris 11 SHA-1 on Intel Processors

    - by danx
    SHA-1 is a "hash" or "digest" operation that produces a 160 bit (20 byte) checksum value on arbitrary data, such as a file. It is intended to uniquely identify text and to verify it hasn't been modified. Max Locktyukhin and others at Intel have improved the performance of the SHA-1 digest algorithm using multiple techniques. This code has been incorporated into Solaris 11 and is available in the Solaris Crypto Framework via the libmd(3LIB), the industry-standard libpkcs11(3LIB) library, and Solaris kernel module sha1. The optimized code is used automatically on systems with a x86 CPU supporting SSSE3 (Intel Supplemental SSSE3). Intel microprocessor architectures that support SSSE3 include Nehalem, Westmere, Sandy Bridge microprocessor families. Further optimizations are available for microprocessors that support AVX (such as Sandy Bridge). Although SHA-1 is considered obsolete because of weaknesses found in the SHA-1 algorithm—NIST recommends using at least SHA-256, SHA-1 is still widely used and will be with us for awhile more. Collisions (the same SHA-1 result for two different inputs) can be found with moderate effort. SHA-1 is used heavily though in SSL/TLS, for example. And SHA-1 is stronger than the older MD5 digest algorithm, another digest option defined in SSL/TLS. Optimizations Review SHA-1 operates by reading an arbitrary amount of data. The data is read in 512 bit (64 byte) blocks (the last block is padded in a specific way to ensure it's a full 64 bytes). Each 64 byte block has 80 "rounds" of calculations (consisting of a mixture of "ROTATE-LEFT", "AND", and "XOR") applied to the block. Each round produces a 32-bit intermediate result, called W[i]. Here's what each round operates: The first 16 rounds, rounds 0 to 15, read the 512 bit block 32 bits at-a-time. These 32 bits is used as input to the round. The remaining rounds, rounds 16 to 79, use the results from the previous rounds as input. Specifically for round i it XORs the results of rounds i-3, i-8, i-14, and i-16 and rotates the result left 1 bit. The remaining calculations for the round is a series of AND, XOR, and ROTATE-LEFT operators on the 32-bit input and some constants. The 32-bit result is saved as W[i] for round i. The 32-bit result of the final round, W[79], is the SHA-1 checksum. Optimization: Vectorization The first 16 rounds can be vectorized (computed in parallel) because they don't depend on the output of a previous round. As for the remaining rounds, because of step 2 above, computing round i depends on the results of round i-3, W[i-3], one can vectorize 3 rounds at-a-time. Max Locktyukhin found through simple factoring, explained in detail in his article referenced below, that the dependencies of round i on the results of rounds i-3, i-8, i-14, and i-16 can be replaced instead with dependencies on the results of rounds i-6, i-16, i-28, and i-32. That is, instead of initializing intermediate result W[i] with: W[i] = (W[i-3] XOR W[i-8] XOR W[i-14] XOR W[i-16]) ROTATE-LEFT 1 Initialize W[i] as follows: W[i] = (W[i-6] XOR W[i-16] XOR W[i-28] XOR W[i-32]) ROTATE-LEFT 2 That means that 6 rounds could be vectorized at once, with no additional calculations, instead of just 3! This optimization is independent of Intel or any other microprocessor architecture, although the microprocessor has to support vectorization to use it, and exploits one of the weaknesses of SHA-1. Optimization: SSSE3 Intel SSSE3 makes use of 16 %xmm registers, each 128 bits wide. The 4 32-bit inputs to a round, W[i-6], W[i-16], W[i-28], W[i-32], all fit in one %xmm register. The following code snippet, from Max Locktyukhin's article, converted to ATT assembly syntax, computes 4 rounds in parallel with just a dozen or so SSSE3 instructions: movdqa W_minus_04, W_TMP pxor W_minus_28, W // W equals W[i-32:i-29] before XOR // W = W[i-32:i-29] ^ W[i-28:i-25] palignr $8, W_minus_08, W_TMP // W_TMP = W[i-6:i-3], combined from // W[i-4:i-1] and W[i-8:i-5] vectors pxor W_minus_16, W // W = (W[i-32:i-29] ^ W[i-28:i-25]) ^ W[i-16:i-13] pxor W_TMP, W // W = (W[i-32:i-29] ^ W[i-28:i-25] ^ W[i-16:i-13]) ^ W[i-6:i-3]) movdqa W, W_TMP // 4 dwords in W are rotated left by 2 psrld $30, W // rotate left by 2 W = (W >> 30) | (W << 2) pslld $2, W_TMP por W, W_TMP movdqa W_TMP, W // four new W values W[i:i+3] are now calculated paddd (K_XMM), W_TMP // adding 4 current round's values of K movdqa W_TMP, (WK(i)) // storing for downstream GPR instructions to read A window of the 32 previous results, W[i-1] to W[i-32] is saved in memory on the stack. This is best illustrated with a chart. Without vectorization, computing the rounds is like this (each "R" represents 1 round of SHA-1 computation): RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR With vectorization, 4 rounds can be computed in parallel: RRRRRRRRRRRRRRRRRRRR RRRRRRRRRRRRRRRRRRRR RRRRRRRRRRRRRRRRRRRR RRRRRRRRRRRRRRRRRRRR Optimization: AVX The new "Sandy Bridge" microprocessor architecture, which supports AVX, allows another interesting optimization. SSSE3 instructions have two operands, a input and an output. AVX allows three operands, two inputs and an output. In many cases two SSSE3 instructions can be combined into one AVX instruction. The difference is best illustrated with an example. Consider these two instructions from the snippet above: pxor W_minus_16, W // W = (W[i-32:i-29] ^ W[i-28:i-25]) ^ W[i-16:i-13] pxor W_TMP, W // W = (W[i-32:i-29] ^ W[i-28:i-25] ^ W[i-16:i-13]) ^ W[i-6:i-3]) With AVX they can be combined in one instruction: vpxor W_minus_16, W, W_TMP // W = (W[i-32:i-29] ^ W[i-28:i-25] ^ W[i-16:i-13]) ^ W[i-6:i-3]) This optimization is also in Solaris, although Sandy Bridge-based systems aren't widely available yet. As an exercise for the reader, AVX also has 256-bit media registers, %ymm0 - %ymm15 (a superset of 128-bit %xmm0 - %xmm15). Can %ymm registers be used to parallelize the code even more? Optimization: Solaris-specific In addition to using the Intel code described above, I performed other minor optimizations to the Solaris SHA-1 code: Increased the digest(1) and mac(1) command's buffer size from 4K to 64K, as previously done for decrypt(1) and encrypt(1). This size is well suited for ZFS file systems, but helps for other file systems as well. Optimized encode functions, which byte swap the input and output data, to copy/byte-swap 4 or 8 bytes at-a-time instead of 1 byte-at-a-time. Enhanced the Solaris mdb(1) and kmdb(1) debuggers to display all 16 %xmm and %ymm registers (mdb "$x" command). Previously they only displayed the first 8 that are available in 32-bit mode. Can't optimize if you can't debug :-). Changed the SHA-1 code to allow processing in "chunks" greater than 2 Gigabytes (64-bits) Performance I measured performance on a Sun Ultra 27 (which has a Nehalem-class Xeon 5500 Intel W3570 microprocessor @3.2GHz). Turbo mode is disabled for consistent performance measurement. Graphs are better than words and numbers, so here they are: The first graph shows the Solaris digest(1) command before and after the optimizations discussed here, contained in libmd(3LIB). I ran the digest command on a half GByte file in swapfs (/tmp) and execution time decreased from 1.35 seconds to 0.98 seconds. The second graph shows the the results of an internal microbenchmark that uses the Solaris libpkcs11(3LIB) library. The operations are on a 128 byte buffer with 10,000 iterations. The results show operations increased from 320,000 to 416,000 operations per second. Finally the third graph shows the results of an internal kernel microbenchmark that uses the Solaris /kernel/crypto/amd64/sha1 module. The operations are on a 64Kbyte buffer with 100 iterations. third graph shows the results of an internal kernel microbenchmark that uses the Solaris /kernel/crypto/amd64/sha1 module. The operations are on a 64Kbyte buffer with 100 iterations. The results show for 1 kernel thread, operations increased from 410 to 600 MBytes/second. For 8 kernel threads, operations increase from 1540 to 1940 MBytes/second. Availability This code is in Solaris 11 FCS. It is available in the 64-bit libmd(3LIB) library for 64-bit programs and is in the Solaris kernel. You must be running hardware that supports Intel's SSSE3 instructions (for example, Intel Nehalem, Westmere, or Sandy Bridge microprocessor architectures). The easiest way to determine if SSSE3 is available is with the isainfo(1) command. For example, nehalem $ isainfo -v $ isainfo -v 64-bit amd64 applications sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu If the output also shows "avx", the Solaris executes the even-more optimized 3-operand AVX instructions for SHA-1 mentioned above: sandybridge $ isainfo -v 64-bit amd64 applications avx xsave pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu 32-bit i386 applications avx xsave pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu No special configuration or setup is needed to take advantage of this code. Solaris libraries and kernel automatically determine if it's running on SSSE3 or AVX-capable machines and execute the correctly-tuned code for that microprocessor. Summary The Solaris 11 Crypto Framework, via the sha1 kernel module and libmd(3LIB) and libpkcs11(3LIB) libraries, incorporated a useful SHA-1 optimization from Intel for SSSE3-capable microprocessors. As with other Solaris optimizations, they come automatically "under the hood" with the current Solaris release. References "Improving the Performance of the Secure Hash Algorithm (SHA-1)" by Max Locktyukhin (Intel, March 2010). The source for these SHA-1 optimizations used in Solaris "SHA-1", Wikipedia Good overview of SHA-1 FIPS 180-1 SHA-1 standard (FIPS, 1995) NIST Comments on Cryptanalytic Attacks on SHA-1 (2005, revised 2006)

    Read the article

  • Why does Chrome video performance substantially degrade after waking from suspend in 10.10?

    - by Grant Heaslip
    Note: For some more details, some of which may not be true given what I've figured out, see this post. When I first boot my computer, video performance (both native H.264 HTML5 in YouTube and Vimeo, and in Flash) in Chrome is perfectly reasonable. CPU usage stays slow, everything works correctly, and the video is silky-smooth. But for whatever reason, if I suspend my computer then wake it up, video performance plummets. Full screen HTML5 video is choppy at best, and full-screen Flash video basically brings my computer to its knees (I'm talking less than a frame a second, and a 5 second lead time to leave full-screen after hitting Esc). Restarting Chrome doesn't fix this — I need to completely restart my machine before performance goes back to normal. Video performance in other applications, such as Movie Player, doesn't seem to be affected at all by the suspend cycle — it's only Chrome. I'm using a Lenovo X201, with an Intel GMA HD graphics chipset, and Intel compnents all around (I don't need any proprietary drivers). This didn't happen in 10.04, and I haven't anything that I think would have caused this to happen. It's possible that a Chrome release could have caused this, but it seems less likely than a regression between 10.04 and 10.10. Any ideas? EDIT: In response Georg's comment, logging in and out doesn't fix it. Restarting Compiz or switching to Metacity (at least by using "compiz/metacity --replace & disown" — am I doing it right?) doesn't help (actually, it seemed to help somewhat with Flash once, but I haven't been able to reproduce this). I'm not sure about GDM — when I use "sudo restart gdm" I get kicked back to the Linux shell (?), which I have no idea how to get out of. Also, I want to make very clear that this isn't just a case of Flash sucking (it does,but that's beside the point). I"m seeing the same general problem with HTML5 videos, and Flash is performing better on my Nexus One than it does on my Core i5 laptop. There's something screwy going on with Chrome and/or 10.10.

    Read the article

  • Getting Optimal Performance from Oracle E-Business Suite

    - by Steven Chan (Oracle Development)
    Performance tuning and optimization in E-Business Suite environments can involve many different components and diagnostic tools.  Samer Barakat, Senior Architect in our Applications Performance group, held an OpenWorld 2013 session that covered: Performance triage, analysis and diagnostic tools Optimizing the E-Business Suite application tier, including Concurrent Manager Optimizing the E-Business Suite database tier Optimizing the E-Business Suite on Real Application Clusters (RAC) E-Business Suite on engineered systems, including Exadata and Exalogic Optimizing E-Business Suite data management, including archiving and purging  The Applications Performance group works with the world's largest E-Business Suite customers to isolate and resolve performance bottlenecks. This team has helped tune the E-Business Suite environments of world's largest companies to handle staggering amounts of transactional volume in multi-terabyte databases.  This group also publishes our official Oracle Apps benchmarks, white papers, and performance metrics. This is an essential set of tips and techniques that all EBS sysadmins and DBAs can use to improve the performance of their environments: Getting Optimal Performance from Oracle E-Business Suite (PDF, 1.7 MB) OpenWorld 2013 presentations are only available for approximately six months -- until ~March 2013.  Download this one while it's still available. Related Articles E-Business Suite Technology Sessions at OpenWorld 2013 OAUG/Collaborate Recap: Best Practices for E-Business Suite Performance Tuning

    Read the article

  • Bad disk performance on HP DL360 with Smarty Array P400i RAID controller

    - by sarge
    I have a HP DL360 server with 4x 146GB SAS disks and a Smart Array P400i RAID controller with 256MB cache. The disks are in RAID 5 (3 disks + 1 hot spare). The server is running VMware ESX 3i. The disk write performance is really bad. Here are some numbers: ns1:~# hdparm -tT /dev/sda /dev/sda: Timing cached reads: 3364 MB in 2.00 seconds = 1685.69 MB/sec Timing buffered disk reads: 18 MB in 3.79 seconds = 4.75 MB/sec ns1:~# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=125000 && sync" 125000+0 records in 125000+0 records out 1024000000 bytes (1.0 GB) copied, 282.307 s, 3.6 MB/s real 4m52.003s user 0m2.160s sys 3m10.796s Compared to another server those number are terrible: Dell R200, 2x 500GB SATA disks, PERC raid controller (disks are mirrored). web4:~# hdparm -tT /dev/sda /dev/sda: Timing cached reads: 6584 MB in 2.00 seconds = 3297.79 MB/sec Timing buffered disk reads: 316 MB in 3.02 seconds = 104.79 MB/sec web4:~# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=125000 && sync" 125000+0 records in 125000+0 records out 1024000000 bytes (1.0 GB) copied, 35.2919 s, 29.0 MB/s real 0m36.570s user 0m0.476s sys 0m32.298s The server isn't very loaded and the VMware Infrastructure Client performance monitor is showing 550KBps average read and 1208KBps average write for the last 30 minutes (highest write rate: 6.6MBps). This has been a problem from the start. Any ideas?

    Read the article

  • Performance variation

    - by Ree
    During my time spent working with multiple machines, I have noticed that performance of the same machine doing the same tasks in the same order differs and sometimes the difference is big enough to be noticeable. This applies to all the machines I've owned and/or maintained (old and modern). Some examples (many of them you may have noticed yourself) that sometimes are completed in different time frames: POST OS installation Hardware tests and operations (usually executed within a customized OS such as one of the many DOS variants), HDD tests and "low level" formats Software installation or other tasks (such as benchmarks) within a general purpose OS (Windows, Linux, etc) I can imagine this is caused by the fact that a machine is built with many components having to communicate as a whole and since the mechanical and electronic parts aren't perfect the overhead occurs. In the last example, I assume the OS complexity and concurrently running multiple processes has some additional effect as well. However, I'm wondering if this hardware imperfection and overhead is indeed that high to be humanly noticeable? Maybe there are other factors that are influencial as much or even more? So, in short - why? To emphasize: the difference is noticeable on the same machine performing the same tasks and this applies to ANY machine in my experience. I'm not comparing machine to machine performance.

    Read the article

  • Improving Performance of RDP Over LAN

    - by Jared Brown
    Architecture: A deployment of 6 new HP thin clients (Windows XP Embedded) with TCP/IP access to several new HP servers (Windows 2003 Server). Each thin client is connected over fiber optic to a Gigabit Cisco switch, which the servers are connected to. There are 10/100 Ethernet to fiber converter boxes on each end of the fiber cables. Problem: Noticeable lag over RDP while using the Unigraphics CAD package. 3D models take .5 to 1 second to respond to mouse actions. Other Details: Network throughput on each thin client's RDP session is 7288 kbps. RDP connection settings - color setting: 15k, all themes, etc. turned off. Local and remote system performance stats are well within norms (CPU, Memory, and Network). Question: Are there newer versions of terminal services or RDP I can use on my existing OSes? Are there compression algorithms, etc. that are well suited for a high-bandwidth LAN? Are there valid alternatives that will yield higher performance (i.e. UltraVNC with drivers installed)? Are there TCP/IP tuning options I can exploit?

    Read the article

  • Low CPU performance with low usage and clock - Windows 8.1

    - by Daniele
    I recently deleted everything from my PC and reinstalled Windows 8.1 from scratch. When I first booted into Windows everything was extremely slow though the CPU usage was very low (about 1%). After installing some drivers the problem seemed to be solved, I was able to use my PC normally. Today I installed a game and I noticed a strange behavior: the game was playable but the performance worsened more and more in the time. This is the situation BEFORE opening the game (normal): This is AFTER some minutes inside the game (low CPU usage and clock): Some information about my system: PC: Sony Vaio S13 (SVS13A1C5E) OS: Windows 8.1 CPU: Intel Core i7-3520M 2.90GHz GPU(1): Intel HD Graphics 4000 GPU(2): NVIDIA GeForce GT 640M LE I tried searching for new drivers and other solutions but noting worked and I don't know what is the cause. I did not checked the temperatures but the fans are not running fast and the PC does not look overheated. Update: Max CPU Temp: 66°C, Max GPU Temp: 61°C The strange thing is that the GPU load is 99% (GPU-Z) and the fan is almost silent. Update 2: I had troubles with Sony Vaio software, I can't get the FN keys and the STAMINA/SPEED switch to work (it is a physical switch to enable/disable the Nvidia card and change the Power Profile). I'm saying this because I remember that before reinstalling Windows there was an option in the Vaio Control Center (now it is not there anymore) that allowed me to choose from something like "priority to performance (ventilation)" or "priority to silence". The current behavior looks like a "priority to silence", but I can't get the stamina-speed switch to work and so I don't see similar oprions in the Vaio Control Center. I don't know if the problem is related to this.

    Read the article

  • Linux Kernel Packet Forwarding Performance

    - by Bob Somers
    I've been using a Linux box as a router for some time now. Nothing too fancy, just enabling forwarding in the kernel, turning on masquerading, and setting up iptables to poke a few holes in the firewall. Recently a friend of mine pointed out a performance problem. Single TCP connections seem to experience very poor performance. You have to open multiple parallel TCP connections to get decent speed. For example, I have a 10 Mbit internet connection. When I download a file from a known-fast source using something like the DownThemAll! extension for Firefox (which opens multiple parallel TCP connections) I can get it to max out my downstream bandwidth at around 1 MB/s. However, when I download the same file using the built-in download manager in Firefox (uses only a single TCP connection) it starts fast and the speed tanks until it tops out around 100 KB/s to 350 KB/s. I've checked the internal network and it doesn't seem to have any problems. Everything goes through a 100 Mbit switch. I've also run iperf both internally (from the router to my desktop) and externally (from my desktop to a Linux box I own out on the net) and haven't seen any problems. It tops out around 1 MB/s like it should. Speedtest.net also reports 10 Mbits speeds. The load on the Linux machine is around 0.00, 0.00, 0.00 all the time, and it's got plenty of free RAM. It's an older laptop with a Pentium M 1.6 GHz processor and 1 GB of RAM. The internal network is connected to the built in Intel NIC and the cable modem is connected to a Netgear FA511 32-bit PCMCIA network card. I think the problem is with the packet forwarding in the router, but I honestly am not sure where the problem could be. Is there anything that would substantially slow down a single TCP stream?

    Read the article

  • PHP/MySQL Performance Testing with Just PHP

    - by Mike Gifford
    I'm trying to diagnose a server where the website is loading very slowly, but unfortunately my client has only provided me with FTP access. I've got FTP access so I can upload PHP scripts, but can't set up any other server side tools. I have access to phpMyAdmin, but not direct access to the MySQL server. It is also unfortunately a Windows server (and we've been a Linux shop for over a decade now). So, if I wan to evaluate MySQL & disk speed performance through PHP on a generic server, what is the best way to do this? There are already tools like: https://github.com/raphaelm/php-benchmark or https://github.com/InfinitySoft/php-benchmark But I'm surprised there isn't something that someone has already set up & configured to just run through and do some basic testing of a server's responsiveness. Every time we evaluate a new server environment it's handy to be able to compare it to an existing one quickly to see if there are any anomalies. I guess I'd just hoped that someone else had written up a script to do this already. I know I have, but that was before Github when there was a handy place to post scraps of code like this. Originally posted in http://stackoverflow.com/questions/12321498/php-mysql-performance-testing-with-just-php but it was recommended that I re-post it here.

    Read the article

  • IIS High use & Server Performance issues

    - by HaydnWVN
    Have an SBS2011 running Exchange, a database app and a few other things serving 5 users (3 low use, 1 high). The server was never specced for the database app so it isn't as powerful as I'd like... Only 12GB RAM. We have increasingly found performance problems with this server, last week it was so bad I couldn't even connect remotely. To free up some available RAM I have (over the past month or so): Restricted the Exchange Message Store to 1GB with (so far) no ill effects. Restricted SQL Databases (including SBSMonitoring and Sharepoint/##SSEE (Which isn't used)). Now I am finding that IIS Worker threads are using up the available memory and I have (so far) been unable to track down much useful information about restricting them. This server is not 'serving' anything web-based apart from OWA that I am finding people using because Outlook is so slow (again related to the Servers performance). I am aware that Exchange on SBS2011 is designed to use up available resources (and concede when other applications request). But it is not doing so (or anywhere near fast enough) for our needs. Opening the database application (using Postgres) takes 5+ minutes from client machines and regularly times out or crashes due to this. After a reboot (before SQL/Exchange/IIS databases are very large/totally cached) we get the performace we need and expect. Previously a reboot once a month was enough... Then once a week... Now they have taken to rebooting it almost daily!

    Read the article

  • Very slow write performance on Debian 6.0 (AMD64) with DMCRYPT/LVM/RAID1

    - by jdelic
    I'm seeing very strange performance characteristics on one of my servers. This server is running a simple two-disk software-RAID1 setup with LVM spanning /dev/md0. One of the logical volumes /dev/vg0/secure is encrypted using dmcrypt with LUKS and mounted with the sync and noatimes flag. Writing to that volume is incredibly slow at 1.8 MB/s and the CPU usage stays near 0%. There are 8 crpyto/1-8 processes running (it's a Intel Quadcore CPU). I hope that someone on serverfault has seen this before :-(. uname -a 2.6.32-5-xen-amd64 #1 SMP Tue Mar 8 00:01:30 UTC 2011 x86_64 GNU/Linux Interestingly, when I read from the device I get good performance numbers: reading without encryption: $ dd if=/dev/vg0/secure of=/dev/null bs=64k count=100000 100000+0 records in 100000+0 records out 6553600000 bytes (6.6 GB) copied, 68.8951 s, 95.1 MB/s reading with encryption: $ dd if=/dev/mapper/secure of=/dev/null bs=64k count=100000 100000+0 records in 100000+0 records out 6553600000 bytes (6.6 GB) copied, 69.7116 s, 94.0 MB/s However, when I try to write to the device: $ dd if=/dev/zero of=./test bs=64k 8809+0 records in 8809+0 records out 577306624 bytes (577 MB) copied, 321.861 s, 1.8 MB/s Also, when I read I see CPU usage, when I write, the CPU stays at almost 0% usage. Here is output of cryptsetup luksDump: LUKS header information for /dev/vg0/secure Version: 1 Cipher name: aes Cipher mode: cbc-essiv:sha256 Hash spec: sha1 Payload offset: 2056 MK bits: 256 MK digest: dd 62 b9 a5 bf 6c ec 23 36 22 92 4c 39 f8 d6 5d c1 3a b7 37 MK salt: cc 2e b3 d9 fb e3 86 a1 bb ab eb 9d 65 df b3 dd d9 6b f4 49 de 8f 85 7d 3b 1c 90 83 5d b2 87 e2 MK iterations: 44500 UUID: a7c9af61-d9f0-4d3f-b422-dddf16250c33 Key Slot 0: ENABLED Iterations: 178282 Salt: 60 24 cb be 5c 51 9f b4 85 64 3d f8 07 22 54 d4 1a 5f 4c bc 4b 82 76 48 d8 a2 d2 6a ee 13 d7 5d Key material offset: 8 AF stripes: 4000 Key Slot 1: DISABLED Key Slot 2: DISABLED Key Slot 3: DISABLED Key Slot 4: DISABLED Key Slot 5: DISABLED Key Slot 6: DISABLED Key Slot 7: DISABLED

    Read the article

  • How to configure VirtualBox server for performance at home

    - by BluJai
    I currently have two physical Ubuntu Server 10.10 servers at home: one serves as our firewall/router/DHCP/VPN server and the other performs double-duty as a file server and a VirtualBox host for an Ubuntu Desktop 10.10 machine which I use from remote connections (via NoMachine) for many thin-client purposes which are irrelevant to my question. What I'd like to accomplish is to consolidate the two physical machines into one which is a dedicated VirtualBox host (most likely running Ubuntu Server 10.10). Note that I'd like to stick with VirtualBox (if possible) because I'm most comfortable with it and use it on a daily basis at both home and work. Specifically, I plan to have one VM set up as file server, another as the firewall/router/DHCP/VPN (or possibly split those a bit) and a third, which is the only current VM (already VirtualBox), which is the thin-client host. My question comes down to performance and/or recommendations about the file server VM. The file server hosts about 6 terabytes of data across 4 drives. What I'd like to do is use raw disk access from the VM directly to the existing disks. However, I'm curious what performance advantage/disadvantage that would have as compared to using shared folders from the VM host and basically just have the whole drive served as a shared folder to the VM which would then serve it to the other machines on the network. I don't know if virtual disks would even work in this scenario and I certainly wouldn't want a drive to be filled with just a single file which is 1.5 TB (disk image). To add understanding of context, but not to get additional advice, I want to virtualize these machines because I intend to regularly use the snapshot capabilities of VirtualBox for the system disks (which will be virtual drives) of the VMs and I have some physical space/power needs to address (as I mentioned, this is at home).

    Read the article

  • Slow performance on VMWare Linux server after Tomcat install

    - by Loftx
    We have a VMWare ESXi 4.1 server hosting a number of Linux and Windows guests. Recently a new Linux guest was added to this server and seemed to be performing well. Tomcat and some other applications on this server were then installed which seem to have caused the server to run really slowly without any obvious resource issues. Slow performance include: The time taken to bring up the password prompt over ssh takes a few seconds when it was previously instantaneous. The time taken to unzip a zip file which was previously a few seconds now takes around 30 seconds The time taken to compile vmware tools has increased by similar factors Both the VMWare console and monitoring commands don't report any issues with high CPU or memory usage but something is obviously slowing the server down somehow. Does anyone have any ideas what may be causing this issue and how it can be resolved? Thanks, Tom Edit As per your questions I’ve looked at some of the performance indicators on both the VM host and VM guest indicated. Firstly I tried reserving the full amount of memory (3gb) for this VM – no other machines on this server have any memory reservation. The swap in rate and swap out rate for the VM host and guest are now both zero. Balloon memory on the guest is zero and on the host is 3.5gb (total memory on the host is 12gb) The swap rate for the guest is also zero. Swap used by the host is 200mb on average. Compression and decompression rates for the host and guest are zero. Command aborts for the host are zero. Read latency is very low – maximum 10ms average 0.8ms. Write latency is higher – a few spikes to 170ms but mostly around 25ms – is this bad? Queue command latency is zero . Physical disk read latency averages 5ms but often 10ms Physical disk write latency averages 15ms but is often 20ms I hope this helps - let me know if you need any more information.

    Read the article

  • Windows 7 network performance tuning for LAN

    - by Hubert Kario
    I want to tune Windows 7 TCP stack for speed in a LAN environment. Bit of background info: I've got a Citrix XenServer set up with Windows 2008R2, Windows 7 and Debian Lenny with Citrix kernel, Windows machines have Tools installed the iperf server process is running on different host, also Debian Lenny. The servers are otherwise idle, tests were repeated few times to confirm results. While testing with iperf 2008R2 can achieve around 600-700Mbps with no tuning what so ever but I can't find any guide or set of parameters that will make Windows 7 achieve anything over 150Mbps with no change in TCP window size using -w parameter to iperf. I tried using netsh autotuining to disabled, experimental, normal and highlyrestricted - no change. Changing congestionprovider doesn't do anything, just as rss and chimney. Setting all the available settings to same values as on Windows 2008R2 host doesn't help. To summarize: Windows 2008R2 default settings: 600-700Mbps Debian, default settings: 600Mbps Windows 7 default settings: 120Mbps Windows 7 default, iperf -w 65536: 400-500Mbps While the missing 400Mbps in performance I blame on crappy Realtek NIC in the XenServer host (I can do ~980Mbps from my laptop to the iperf server) it doesn't explain why Windows 7 can't achieve good performance without manually tuning window size at the application level. So, how to tune Windows 7?

    Read the article

  • How to configure VirtualBox server for performance at home

    - by BluJai
    I currently have two physical Ubuntu Server 10.10 servers at home: one serves as our firewall/router/DHCP/VPN server and the other performs double-duty as a file server and a VirtualBox host for an Ubuntu Desktop 10.10 machine which I use from remote connections (via NoMachine) for many thin-client purposes which are irrelevant to my question. What I'd like to accomplish is to consolidate the two physical machines into one which is a dedicated VirtualBox host (most likely running Ubuntu Server 10.10). Note that I'd like to stick with VirtualBox (if possible) because I'm most comfortable with it and use it on a daily basis at both home and work. Specifically, I plan to have one VM set up as file server, another as the firewall/router/DHCP/VPN (or possibly split those a bit) and a third, which is the only current VM (already VirtualBox), which is the thin-client host. My question comes down to performance and/or recommendations about the file server VM. The file server hosts about 6 terabytes of data across 4 drives. What I'd like to do is use raw disk access from the VM directly to the existing disks. However, I'm curious what performance advantage/disadvantage that would have as compared to using shared folders from the VM host and basically just have the whole drive served as a shared folder to the VM which would then serve it to the other machines on the network. I don't know if virtual disks would even work in this scenario and I certainly wouldn't want a drive to be filled with just a single file which is 1.5 TB (disk image). To add understanding of context, but not to get additional advice, I want to virtualize these machines because I intend to regularly use the snapshot capabilities of VirtualBox for the system disks (which will be virtual drives) of the VMs and I have some physical space/power needs to address (as I mentioned, this is at home).

    Read the article

  • graphics performance better on battery?

    - by Scott Beeson
    Anyone have any idea why my laptop would perform (considerably) better while on battery than while plugged in? It's a Dell Latitude E6420 with Windows 8 Pro. I tried mirroring all the settings in the selected power plan from "On battery" to "Plugged In" and that didn't help. I then just restored the defaults for all power plans (balanced and high performance). I'm still seeing the same results. The best example where it is most noticeable (don't laugh) is Sim City Social in Chrome. I'm probably seeing a performance increase of 5x on battery versus plugged in. This is easily reproducible too. I'm very confused. Could it be caused by dust? The laptop isn't that old and there is no visible dust. I'm not going to take it apart to check the insides as it's a corporate laptop. Could it be overheating? Battery Sim City Social: 68 degrees max Civ V: 77 degrees max Charger Sim City Social: 68 Civ V: did not test See answer below... I'm retarded

    Read the article

  • Difference in performance: local machine VS amazon medium instance

    - by user644745
    I see a drastic difference in performance matrix when i run it with apache benchmark (ab) in my local machine VS production hosted in amazon medium instance. Same concurrent requests (5) and same total number of requests (111) has been run against both. Amazon has better memory than my local machine. But there are 2 CPUs in my local machine vs 1 CPU in m1.medium. My internet speed is very low at the moment, I am getting Transfer rate as 25.29KBps. How can I improve the performance ? Do not know how to interpret Connect, Processing, Waiting and total in ab output. Here is Localhost: Server Hostname: localhost Server Port: 9999 Document Path: / Document Length: 7631 bytes Concurrency Level: 5 Time taken for tests: 1.424 seconds Complete requests: 111 Failed requests: 102 (Connect: 0, Receive: 0, Length: 102, Exceptions: 0) Write errors: 0 Total transferred: 860808 bytes HTML transferred: 847155 bytes Requests per second: 77.95 [#/sec] (mean) Time per request: 64.148 [ms] (mean) Time per request: 12.830 [ms] (mean, across all concurrent requests) Transfer rate: 590.30 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.5 0 1 Processing: 14 63 99.9 43 562 Waiting: 14 60 96.7 39 560 Total: 14 63 99.9 43 563 And this is production: Document Path: / Document Length: 7783 bytes Concurrency Level: 5 Time taken for tests: 33.883 seconds Complete requests: 111 Failed requests: 0 Write errors: 0 Total transferred: 877566 bytes HTML transferred: 863913 bytes Requests per second: 3.28 [#/sec] (mean) Time per request: 1526.258 [ms] (mean) Time per request: 305.252 [ms] (mean, across all concurrent requests) Transfer rate: 25.29 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 290 297 14.0 293 413 Processing: 897 1178 63.4 1176 1391 Waiting: 296 606 135.6 588 1171 Total: 1191 1475 66.0 1471 1684

    Read the article

  • Performance associated with storing millions of files on NTFS

    - by Tim Brigham
    Does anyone have a method / formula, etc that I could use - hopefully based on both current and projected numbers of files - to project the 'right' length of the split and the number of nested folders? Please note that although similar it isn't quite the same as Storing a million images in the filesystem. I'm looking for a way to help make the theories outlined more generic. Assumptions I have 'some' initial number of files. This number would be arbitrary but large. Say 500k to 10m+. I have considered the underlying physical hardware disk IO requirements that would be necessary to support such an endeavor. Put another way As time progresses this store will grow. I want to have the best balance of current performance and as my needs increase. Say I double or triple my storage. I need to be able to address both current needs and projected future growth. I need to both plan ahead and not sacrifice too much of current performance. What I've come up with I'm already thinking about using a hash split every so many characters to split things out across multiple directories and keeping the trees even, very similar as outlined in the comments in the question above. It also avoids duplicate files, which would be critical over time. I'm sure that the initial folder structure would be different based on what I've outlined, and depending on the initial scale. As far as I can figure there isn't a one size fits all solution here. It would be horrendously time intensive to work something out experimentally.

    Read the article

< Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19  | Next Page >