Search Results

Search found 453 results on 19 pages for 'throughput'.

Page 7/19 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >

How to improve network performance between two Win 2008 KMV guest having virtio driver already?

- by taazaa

I have two physical servers with Ubuntu 10.04 server on them. They are connected with a 1Gbps card over a gigabit switch. Each of these host servers has one Win 2008 guest VM. Both VMs are well provisioned (4 cores, 12GB RAM), RAW disks. My asp.net/sql server applications are running much slower compared to very similar physical setups. Both machines are setup to use virtio for disk and network. I used iperf to check network performance and I get: Physical host 1 ----- Physical Host 2: 957 Mbits/sec Physical host 1 ----- Win 08 Guest 1: 557 Mbits/sec Win 08 Guest 1 ----- Phy host 1: 182 Mbits/sec Win 08 Guest 1 ----- Win 08 Guest 2: 111 Mbits /sec My app is running on Win08 Guest 1 and Guest 2 (web and db). There is a huge drop in network throughput (almost 90%) between the two guest. Further the throughput does not seem to be symmetric between host and guest as well. The CPU utilization on the guests and hosts is less than 2% right now (we are just testing right now). Apart from this, there have been random slow downs in the network to as low as 1 Mbits/sec making the whole application unusable. Any help to trouble shoot this would be appreciated.

Read the article
What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

- by Alex Dunlop

Which Java synchronization construct is likely to provide the best performance for a concurrent, iterative processing scenario with a fixed number of threads like the one outlined below? After experimenting on my own for a while (using ExecutorService and CyclicBarrier) and being somewhat surprised by the results, I would be grateful for some expert advice and maybe some new ideas. Existing questions here do not seem to focus primarily on performance, hence this new one. Thanks in advance! The core of the app is a simple iterative data processing algorithm, parallelized to the spread the computational load across 8 cores on a Mac Pro, running OS X 10.6 and Java 1.6.0_07. The data to be processed is split into 8 blocks and each block is fed to a Runnable to be executed by one of a fixed number of threads. Parallelizing the algorithm was fairly straightforward, and it functionally works as desired, but its performance is not yet what I think it could be. The app seems to spend a lot of time in system calls synchronizing, so after some profiling I wonder whether I selected the most appropriate synchronization mechanism(s). A key requirement of the algorithm is that it needs to proceed in stages, so the threads need to sync up at the end of each stage. The main thread prepares the work (very low overhead), passes it to the threads, lets them work on it, then proceeds when all threads are done, rearranges the work (again very low overhead) and repeats the cycle. The machine is dedicated to this task, Garbage Collection is minimized by using per-thread pools of pre-allocated items, and the number of threads can be fixed (no incoming requests or the like, just one thread per CPU core). V1 - ExecutorService My first implementation used an ExecutorService with 8 worker threads. The program creates 8 tasks holding the work and then lets them work on it, roughly like this: // create one thread per CPU executorService = Executors.newFixedThreadPool( 8 ); ... // now process data in cycles while( ...) { // package data into 8 work items ... // create one Callable task per work item ... // submit the Callables to the worker threads executorService.invokeAll( taskList ); } This works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as much as the processing algorithm would be expected to allow (some work items will finish faster than others, then idle). However, as the work items become smaller (and this is not really under the program's control), the user CPU load shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.8% 85% 1.30 64k 2.5% 77% 5.6 16k 4% 64% 22.5 4096 8% 56% 86 1024 13% 38% 227 256 17% 19% 420 64 19% 17% 948 16 19% 13% 1626 Legend: - block size = size of the work item (= computational steps) - system = system load, as shown in OS X Activity Monitor (red bar) - user = user load, as shown in OS X Activity Monitor (green bar) - cycles/sec = iterations through the main while loop, more is better The primary area of concern here is the high percentage of time spent in the system, which appears to be driven by thread synchronization calls. As expected, for smaller work items, ExecutorService.invokeAll() will require relatively more effort to sync up the threads versus the amount of work being performed in each thread. But since ExecutorService is more generic than it would need to be for this use case (it can queue tasks for threads if there are more tasks than cores), I though maybe there would be a leaner synchronization construct. V2 - CyclicBarrier The next implementation used a CyclicBarrier to sync up the threads before receiving work and after completing it, roughly as follows: main() { // create the barrier barrier = new CyclicBarrier( 8 + 1 ); // create Runable for thread, tell it about the barrier Runnable task = new WorkerThreadRunnable( barrier ); // start the threads for( int i = 0; i < 8; i++ ) { // create one thread per core new Thread( task ).start(); } while( ... ) { // tell threads about the work ... // N threads + this will call await(), then system proceeds barrier.await(); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; } public void run() { while( true ) { // wait for work barrier.await(); // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as before. However, as the work items become smaller, the load still shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.7% 78% 6.1 16k 5.5% 52% 25 4096 9% 29% 64 1024 11% 15% 117 256 12% 8% 169 64 12% 6.5% 285 16 12% 6% 377 For large work items, synchronization is negligible and the performance is identical to V1. But unexpectedly, the results of the (highly specialized) CyclicBarrier seem MUCH WORSE than those for the (generic) ExecutorService: throughput (cycles/sec) is only about 1/4th of V1. A preliminary conclusion would be that even though this seems to be the advertised ideal use case for CyclicBarrier, it performs much worse than the generic ExecutorService. V3 - Wait/Notify + CyclicBarrier It seemed worth a try to replace the first cyclic barrier await() with a simple wait/notify mechanism: main() { // create the barrier // create Runable for thread, tell it about the barrier // start the threads while( ... ) { // tell threads about the work // for each: workerThreadRunnable.setWorkItem( ... ); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; @NotNull volatile private Callable<Integer> workItem; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; this.workItem = NO_WORK; } final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { synchronized( this ) { workItem = callable; notify(); } } public void run() { while( true ) { // wait for work while( true ) { synchronized( this ) { if( workItem != NO_WORK ) break; try { wait(); } catch( InterruptedException e ) { e.printStackTrace(); } } } // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.4% 80% 6.3 16k 4.6% 60% 30.1 4096 8.6% 41% 98.5 1024 12% 23% 202 256 14% 11.6% 299 64 14% 10.0% 518 16 14.8% 8.7% 679 The throughput for small work items is still much worse than that of the ExecutorService, but about 2x that of the CyclicBarrier. Eliminating one CyclicBarrier eliminates half of the gap. V4 - Busy wait instead of wait/notify Since this app is the primary one running on the system and the cores idle anyway if they're not busy with a work item, why not try a busy wait for work items in each thread, even if that spins the CPU needlessly. The worker thread code changes as follows: class WorkerThreadRunnable implements Runnable { // as before final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { workItem = callable; } public void run() { while( true ) { // busy-wait for work while( true ) { if( workItem != NO_WORK ) break; } // do the work ... // wait for everyone else to finish barrier.await(); } } } Also works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.2% 81% 6.3 16k 4.2% 62% 33 4096 7.5% 40% 107 1024 10.4% 23% 210 256 12.0% 12.0% 310 64 11.9% 10.2% 550 16 12.2% 8.6% 741 For small work items, this increases throughput by a further 10% over the CyclicBarrier + wait/notify variant, which is not insignificant. But it is still much lower-throughput than V1 with the ExecutorService. V5 - ? So what is the best synchronization mechanism for such a (presumably not uncommon) problem? I am weary of writing my own sync mechanism to completely replace ExecutorService (assuming that it is too generic and there has to be something that can still be taken out to make it more efficient). It is not my area of expertise and I'm concerned that I'd spend a lot of time debugging it (since I'm not even sure my wait/notify and busy wait variants are correct) for uncertain gain. Any advice would be greatly appreciated.

Read the article
Poor home office network performance and cannot figure out where the issue is

- by Jeff Willener

This is the most bizarre issue. I have worked with small to mid size networks for quite a long time and can say I'm comfortable connecting hardware. Where you will start to lose me is with managed switches and firewalls. To start, let me describe my network (sigh, shouldn't but I MUST solve this). 1) Comcast Cable Internet 2) Motorola SURFboard eXtreme Cable Modem. a) Model: SB6120 b) DOCSIS 3.0 and 2.0 support c) IPv4 and IPv6 support 3-A) Cisco Small Business RV220W Wireless N Firewall a) Latest firmware b) Model: RV220W-A-K9-NA c) WAN Port to Modem (2) d) vlan 1: work e) vlan 2: everything else. 3-B) D-Link DIR-615 Draft 802.11 N Wireless Router a) Latest firmware b) WAN Port to Modem (2) 4) Servers connected directly to firewall a) If firewall 3-A, then vlan 1 b) CAT5e patch cables c) Dell PowerEdge 1400SC w/ 10/100 integrated NIC (Domain Controller, DNS, former DHCP) d) Dell PowerEdge 400SC w/ 10/100/1000 integrated NIC (VMWare Server) 4) Linksys EZXS88W unmanaged Workgroup 10/100 Switch a) If firewall 3-A, then vlan 2 b) 25' CAT5e patch cable to firewall (3-A or 3-B) c) Connects xBox 360, Blu-Ray player, PC at TV 5) Office equipment connected directly to firewall a) If firewall 3-A, then vlan 1 b) ~80' CAT6 or CAT5e patch cable to firewall (3-A or 3-B) c) Connects 1) Dell Latitude laptop 10/100/1000 2) Dell Inspiron laptop 10/100 3) Dell Workstation 10/100/1000 (Pristine host, VMWare Workstation 7.x with many bridged VM's) 4) Brother Laser Printer 10/100 5) Epson All-In-One Workforce 310 10/100 5-A) NetGear FS116 unmanaged 10/100 switch a) I've had this switch for a long time and never had issues. 5-B) NetGear GS108 unmanaged 10/100/1000 switch a) Bought new for this issue and returned. 5-C) Linksys SE2500 unmanaged 10/100/1000 switch a) Bought new for this issue and returned. 5-D) TP-Link TL-SG10008D unmanaged 10/100/1000 a) Bought new for this issue and still have. 6) VLan 1 Wireless Connections (on same subnet if 3-B) a) Any of those at 5c b) HP Laptop 7) VLan 2 Wireless Connection (on same subnet if 3-B) a) IPad, IPod b) Compaq Laptop c) Epson Wireless Printer Shew, without hosting a diagram I hope that paints a good picture. The Issue The breakdown here is at item 5. No matter what I do I cannot have a switch at 5 and have to run everything wireless regardless of router. Issues related to using a switch (point 5 above) SpeedTest is good. Poor throughput to other devices if can communicate at all. Usually cannot ping other devices even on the same switch although, when able, ping times are good. Eventual lose of connectivity and can "sometimes" be restored by unplugging everything for several days, not minutes or hours but we're talking a week if at all. Directly connect to computer gives good internet connection however throughput to other devices connected to firewall is at best horrible. Yet printing doesn't seem to be an issue as long as they are connected via wireless. I have to force the RV220W to 1000Mb on the respective port if using a Gig Switch Issues related to using wireless in place of a switch (point 5 above) Poor throughput to other devices if can communicate. SpeedTest is good. Bottom line Internet speeds are awesome. By the way, Comcast went WAY above and beyond to make sure it was not them. They rewired EVERYTHING which did solve internet drops. Computer to computer connections are garbage Cannot get switch at 5 to work, yet other at 4 has never had an issue. Direct connection, bypass switch, is good for DHCP and internet. DNS must be on server, not firewall. Cisco insists its my switches but as you can see I have used four and two different cables with the same result. My gut feeling is something is happening with routing. But I'm not smart enough to know that answer. I run a lot of VM's at 5-c-3, could that cause it? What's different compared to my previous house is I have introduced Gigabit hardware (firewall/switches/computers). Some of my computers might have IPv6 turned on if I haven't turned it off already. I'm truly at a loss and hope anyone has some crazy idea how to solve this. Bottom line, I need a switch in my office behind the firewall. I've changed everything. The real crux is I will find a working solution and, again, after days it will stop working. So this means I cannot isolate if its a computer since I have to use them. Oh and a solution is not throwing more money at this. I'm well into $1k already. Yah, lame.

Read the article
Load-balancing between a Procurve switch and a server

- by vlad

Hello I've been searching around the web for this problem i've been having. It's similar in a way to this question: How exactly & specifically does layer 3 LACP destination address hashing work? My setup is as follows: I have a central switch, a Procurve 2510G-24, image version Y.11.16. It's the center of a star topology, there are four switches connected to it via a single gigabit link. Those switches service the users. On the central switch, I have a server with two gigabit interfaces that I want to bond together in order to achieve higher throughput, and two other servers that have single gigabit connections to the switch. The topology looks as follows: sw1 sw2 sw3 sw4 | | | | --------------------- | sw0 | --------------------- || | | srv1 srv2 srv3 The servers were running FreeBSD 8.1. On srv1 I set up a lagg interface using the lacp protocol, and on the switch I set up a trunk for the two ports using lacp as well. The switch showed that the server was a lacp partner, I could ping the server from another computer, and the server could ping other computers. If I unplugged one of the cables, the connection would keep working, so everything looked fine. Until I tested throughput. There was only one link used between srv1 and sw0. All testing was conducted with iperf, and load distribution was checked with systat -ifstat. I was looking to test the load balancing for both receive and send operations, as I want this server to be a file server. There were therefore two scenarios: iperf -s on srv1 and iperf -c on the other servers iperf -s on the other servers and iperf -c on srv1 connected to all the other servers. Every time only one link was used. If one cable was unplugged, the connections would keep going. However, once the cable was plugged back in, the load was not distributed. Each and every server is able to fill the gigabit link. In one-to-one test scenarios, iperf was reporting around 940Mbps. The CPU usage was around 20%, which means that the servers could withstand a doubling of the throughput. srv1 is a dell poweredge sc1425 with onboard intel 82541GI nics (em driver on freebsd). After troubleshooting a previous problem with vlan tagging on top of a lagg interface, it turned out that the em could not support this. So I figured that maybe something else is wrong with the em drivers and / or lagg stack, so I started up backtrack 4r2 on this same server. So srv1 now uses linux kernel 2.6.35.8. I set up a bonding interface bond0. The kernel module was loaded with option mode=4 in order to get lacp. The switch was happy with the link, I could ping to and from the server. I could even put vlans on top of the bonding interface. However, only half the problem was solved: if I used srv1 as a client to the other servers, iperf was reporting around 940Mbps for each connection, and bwm-ng showed, of course, a nice distribution of the load between the two nics; if I run the iperf server on srv1 and tried to connect with the other servers, there was no load balancing. I thought that maybe I was out of luck and the hashes for the two mac addresses of the clients were the same, so I brought in two new servers and tested with the four of them at the same time, and still nothing changed. I tried disabling and reenabling one of the links, and all that happened was the traffic switched from one link to the other and back to the first again. I also tried setting the trunk to "plain trunk mode" on the switch, and experimented with other bonding modes (roundrobin, xor, alb, tlb) but I never saw any traffic distribution. One interesting thing, though: one of the four switches is a Cisco 2950, image version 12.1(22)EA7. It has 48 10/100 ports and 2 gigabit uplinks. I have a server (call it srv4) with a 4 channel trunk connected to it (4x100), FreeBSD 8.0 release. The switch is connected to sw0 via gigabit. If I set up an iperf server on one of the servers connected to sw0 and a client on srv4, ALL 4 links are used, and iperf reports around 330Mbps. systat -ifstat shows all four interfaces are used. The cisco port-channel uses src-mac to balance the load. The HP should use both the source and destination according to the manual, so it should work as well. Could this mean there is some bug in the HP firmware? Am I doing something wrong?

Read the article
Can someone explain the "use-cases" for the default munin graphs?

- by exhuma

When installing munin, it activates a default set of plugins (at least on ubuntu). Alternatively, you can simply run munin-node-configure to figure out which plugins are supported on your system. Most of these plugins plot straight-forward data. My question is not to explain the nature of the data (well... maybe for some) but what is it that you look for in these graphs? It is easy to install munin and see fancy graphs. But having the graphs and not being able to "read" them renders them totally useless. I am going to list standard plugins which are enabled by default on my system. So it's going to be a long list. For completeness, I am also going to list plugins which I think to understand and give a short explanation as to what I think it's used for. Pleas correct if I am wrong with any of them. So let me split this questions in three parts: Plugins where I don't even understand the data Plugins where I understand the data but don't know what I should look out for Plugins which I think to understand Plugins where I don't even understand the data These may contain questions that are not necessarily aimed at munin alone. Not understanding the data usually mean a gap in fundamental knowledge on operating systems/hardware.... ;) Feel free to respond with a "giyf" answer. These are plugins where I can only guess what's going on... I hardly want to look at these "guessing"... Disk IOs per device (IOs/second)What's an IO. I know it stands for input/output. But that's as far as it goes. Disk latency per device (Average IO wait)Not a clue what an "IO wait" is... IO Service TimeThis one is a huge mess, and it's near impossible to see something in the graph at all. Plugins where I understand the data but don't know what I should look out for IOStat (blocks/second read/written)I assume, the thing to look out for in here are spikes? Which would mean that the device is in heavy use? Available entropy (bytes)I assume that this is important for random number generation? Why would I graph this? So far the value has always been near constant. VMStat (running/I/O sleep processes)What's the difference between this one and the "processes" graph? Both show running/sleeping processes, whereas the "Processes" graph seems to have more details. Disk throughput per device (bytes/second read/written) What's thedifference between this one and the "IOStat" graph? inode table usageWhat should I look for in this graph? Plugins which I think to understand I'll be guessing some things here... correct me if I am wrong. Disk usage in percent (percent)How much disk space is used/remaining. As this is approaching 100%, you should consider cleaning up or extend the partition. This is extremely important for the root partition. Firewall Throughput (packets/second)The number of packets passing through the firewall. If this is spiking for a longer period of time, it could be a sign of a DOS attack (or we are simply recieving a large file). It can also give you an idea about your firewall performance. If it's levelling out and you need more "power" you should consider load balancing. If it's levelling out and see a correlation with your CPU load, it could also mean that your hardware is not fast enough. Correlations with disk usage could point to excessive LOG targets in you FW config. eth0 errors (packets in/out)Network errors. If this value is increasing, it could be a sign of faulty hardware. eth0 traffic (bits/second in/out)Raw network traffic. This should correlate with Firewall throughput. number of threadsAn ever-increasing value might point to a process not properly closing threads. Investigate! processesBreakdown of active processes (including sleeping). A quick spike in here might point to a fork-bomb. A slowly, but ever-increasing value might point to an application spawning sub-processes but not properly closing them. Investigate using ps faux. process priorityThis shows the distribution of process priorities. Having only high-priority processes is not of much use. Consider de-prioritizing some. cpu usageFairly straight-forward. If this is spiking, you may have an attack going on, or a process is hogging the CPU. Idf it's slowly increasing and approaching max in normal operations, you should consider upgrading your hardware (or load-balancing). file table usageNumber of actively open files. If this is reaching max, you may have a process opening, but not properly releasing files. load averageShows an summarized value for the system load. Should correlate with CPU usage. Increasing values can come from a number of sources. Look for correlations with other graphs. memory usageA graphical representation of you memory. As long as you have a lot of unused+cache+buffers you are fine. swap in/outShows the activity on your swap partition. This should always be 0. If you see activity on this, you should add more memory to your machine!

Read the article
How to open a server port outside of an OpenVPN tunnel with a pf firewall on OSX (BSD)

- by Timbo

I have a Mac mini that I use as a media server running XBMC and serves media from my NAS to my stereo and TV (which has been color calibrated with a Spyder3Express, happy). The Mac runs OSX 10.8.2 and the internet connection is tunneled for general privacy over OpenVPN through Tunnelblick. I believe my anonymous VPN provider pushes "redirect_gateway" to OpenVPN/Tunnelblick because when on it effectively tunnels all non-LAN traffic in- and outbound. As an unwanted side effect that also opens the boxes server ports unprotected to the outside world and bypasses my firewall-router (Netgear SRX5308). I have run nmap from outside the LAN on the VPN IP and the server ports on the mini are clearly visible and connectable. The mini has the following ports open: ssh/22, ARD/5900 and 8080+9090 for the XBMC iOS client Constellation. I also have Synology NAS which apart from LAN file serving over AFP and WebDAV only serves up an OpenVPN/1194 and a PPTP/1732 server. When outside of the LAN I connect to this from my laptop over OpenVPN and over PPTP from my iPhone. I only want to connect through AFP/548 from the mini to the NAS. The border firewall (SRX5308) just works excellently, stable and with a very high throughput when streaming from various VOD services. My connection is a 100/10 with a close to theoretical max throughput. The ruleset is as follows Inbound: PPTP/1723 Allow always to 10.0.0.40 (NAS/VPN server) from a restricted IP range >corresponding to possible cell provider range OpenVPN/1194 Allow always to 10.0.0.40 (NAS/VPN server) from any Outbound: Default outbound policy: Allow Always OpenVPN/1194 TCP Allow always from 10.0.0.40 (NAS) to a.b.8.1-a.b.8.254 (VPN provider) OpenVPN/1194 UDP Allow always to 10.0.0.40 (NAS) to a.b.8.1-a.b.8.254 (VPN provider) Block always from NAS to any On the Mini I have disabled the OSX Application Level Firewall because it throws popups which don't remember my choices from one time to another and that's annoying on a media server. Instead I run Little Snitch which controls outgoing connections nicely on an application level. I have configured the excellent OSX builtin firewall pf (from BSD) as follows pf.conf (Apple App firewall tie-ins removed) (# replaced with % to avoid formatting errors) ### macro name for external interface. eth_if = "en0" vpn_if = "tap0" ### wifi_if = "en1" ### %usb_if = "en3" ext_if = $eth_if LAN="{10.0.0.0/24}" ### General housekeeping rules ### ### Drop all blocked packets silently set block-policy drop ### all incoming traffic on external interface is normalized and fragmented ### packets are reassembled. scrub in on $ext_if all fragment reassemble scrub in on $vpn_if all fragment reassemble scrub out all ### exercise antispoofing on the external interface, but add the local ### loopback interface as an exception, to prevent services utilizing the ### local loop from being blocked accidentally. ### set skip on lo0 antispoof for $ext_if inet antispoof for $vpn_if inet ### spoofing protection for all interfaces block in quick from urpf-failed ############################# block all ### Access to the mini server over ssh/22 and remote desktop/5900 from LAN/en0 only pass in on $eth_if proto tcp from $LAN to any port {22, 5900, 8080, 9090} ### Allow all udp and icmp also, necessary for Constellation. Could be tightened. pass on $eth_if proto {udp, icmp} from $LAN to any ### Allow AFP to 10.0.0.40 (NAS) pass out on $eth_if proto tcp from any to 10.0.0.40 port 548 ### Allow OpenVPN tunnel setup over unprotected link (en0) only to VPN provider IPs ### and port ranges pass on $eth_if proto tcp from any to a.b.8.0/24 port 1194:1201 ### OpenVPN Tunnel rules. All traffic allowed out, only in to ports 4100-4110 ### Outgoing pings ok pass in on $vpn_if proto {tcp, udp} from any to any port 4100:4110 pass out on $vpn_if proto {tcp, udp, icmp} from any to any So what are my goals and what does the above setup achieve? (until you tell me otherwise :) 1) Full LAN access to the above ports on the mini/media server (including through my own VPN server) 2) All internet traffic from the mini/media server is anonymized and tunneled over VPN 3) If OpenVPN/Tunnelblick on the mini drops the connection, nothing is leaked both because of pf and the router outgoing ruleset. It can't even do a DNS lookup through the router. So what do I have to hide with all this? Nothing much really, I just got carried away trying to stop port scans through the VPN tunnel :) In any case this setup works perfectly and it is very stable. The Problem at last! I want to run a minecraft server and I installed that on a separate user account on the mini server (user=mc) to keep things partitioned. I don't want this server accessible through the anonymized VPN tunnel because there are lots more port scans and hacking attempts through that than over my regular IP and I don't trust java in general. So I added the following pf rule on the mini: ### Allow Minecraft public through user mc pass in on $eth_if proto {tcp,udp} from any to any port 24983 user mc pass out on $eth_if proto {tcp, udp} from any to any user mc And these additions on the border firewall: Inbound: Allow always TCP/UDP from any to 10.0.0.40 (NAS) Outbound: Allow always TCP port 80 from 10.0.0.40 to any (needed for online account checkups) This works fine but only when the OpenVPN/Tunnelblick tunnel is down. When up no connection is possbile to the minecraft server from outside of LAN. inside LAN is always OK. Everything else functions as intended. I believe the redirect_gateway push is close to the root of the problem, but I want to keep that specific VPN provider because of the fantastic throughput, price and service. The Solution? How can I open up the minecraft server port outside of the tunnel so it's only available over en0 not the VPN tunnel? Should I a static route? But I don't know which IPs will be connecting...stumbles How secure would to estimate this setup to be and do you have other improvements to share? I've searched extensively in the last few days to no avail...If you've read this far I bet you know the answer :)

Read the article
SQL SERVER – ASYNC_IO_COMPLETION – Wait Type – Day 11 of 28

- by pinaldave

For any good system three things are vital: CPU, Memory and IO (disk). Among these three, IO is the most crucial factor of SQL Server. Looking at real-world cases, I do not see IT people upgrading CPU and Memory frequently. However, the disk is often upgraded for either improving the space, speed or throughput. Today we will look at another IO-related wait type. From Book On-Line: Occurs when a task is waiting for I/Os to finish. ASYNC_IO_COMPLETION Explanation: Any tasks are waiting for I/O to finish. If by any means your application that’s connected to SQL Server is processing the data very slowly, this type of wait can occur. Several long-running database operations like BACKUP, CREATE DATABASE, ALTER DATABASE or other operations can also create this wait type. Reducing ASYNC_IO_COMPLETION wait: When it is an issue related to IO, one should check for the following things associated to IO subsystem: Look at the programming and see if there is any application code which processes the data slowly (like inefficient loop, etc.). Note that it should be re-written to avoid this wait type. Proper placing of the files is very important. We should check the file system for proper placement of the files – LDF and MDF on separate drive, TempDB on another separate drive, hot spot tables on separate filegroup (and on separate disk), etc. Check the File Statistics and see if there is a higher IO Read and IO Write Stall SQL SERVER – Get File Statistics Using fn_virtualfilestats. Check event log and error log for any errors or warnings related to IO. If you are using SAN (Storage Area Network), check the throughput of the SAN system as well as configuration of the HBA Queue Depth. In one of my recent projects, the SAN was performing really badly and so the SAN administrator did not accept it. After some investigations, he agreed to change the HBA Queue Depth on the development setup (test environment). As soon as we changed the HBA Queue Depth to quite a higher value, there was a sudden big improvement in the performance. It is very likely to happen that there are no proper indexes on the system and yet there are lots of table scans and heap scans. Creating proper index can reduce the IO bandwidth considerably. If SQL Server can use appropriate cover index instead of clustered index, it can effectively reduce lots of CPU, Memory and IO (considering cover index has lesser columns than cluster table and all other; it depends upon the situation). You can refer to the following two articles I wrote that talk about how to optimize indexes: Create Missing Indexes Drop Unused Indexes Checking Memory Related Perfmon Counters SQLServer: Memory Manager\Memory Grants Pending (Consistent higher value than 0-2) SQLServer: Memory Manager\Memory Grants Outstanding (Consistent higher value, Benchmark) SQLServer: Buffer Manager\Buffer Hit Cache Ratio (Higher is better, greater than 90% for usually smooth running system) SQLServer: Buffer Manager\Page Life Expectancy (Consistent lower value than 300 seconds) Memory: Available Mbytes (Information only) Memory: Page Faults/sec (Benchmark only) Memory: Pages/sec (Benchmark only) Checking Disk Related Perfmon Counters Average Disk sec/Read (Consistent higher value than 4-8 millisecond is not good) Average Disk sec/Write (Consistent higher value than 4-8 millisecond is not good) Average Disk Read/Write Queue Length (Consistent higher value than benchmark is not good) Read all the post in the Wait Types and Queue series. Note: The information presented here is from my experience and there is no way that I claim it to be accurate. I suggest reading Book OnLine for further clarification. All the discussions of Wait Stats in this blog are generic and vary from system to system. It is recommended that you test this on a development server before implementing it to a production server. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL, Technology

Read the article
SQL SERVER – IO_COMPLETION – Wait Type – Day 10 of 28

- by pinaldave

For any good system three things are vital: CPU, Memory and IO (disk). Among these three, IO is the most crucial factor of SQL Server. Looking at real-world cases, I do not see IT people upgrading CPU and Memory frequently. However, the disk is often upgraded for either improving the space, speed or throughput. Today we will look at an IO-related wait types. From Book On-Line: Occurs while waiting for I/O operations to complete. This wait type generally represents non-data page I/Os. Data page I/O completion waits appear as PAGEIOLATCH_* waits. IO_COMPLETION Explanation: Any tasks are waiting for I/O to finish. This is a good indication that IO needs to be looked over here. Reducing IO_COMPLETION wait: When it is an issue concerning the IO, one should look at the following things related to IO subsystem: Proper placing of the files is very important. We should check the file system for proper placement of files – LDF and MDF on a separate drive, TempDB on another separate drive, hot spot tables on separate filegroup (and on separate disk),etc. Check the File Statistics and see if there is higher IO Read and IO Write Stall SQL SERVER – Get File Statistics Using fn_virtualfilestats. Check event log and error log for any errors or warnings related to IO. If you are using SAN (Storage Area Network), check the throughput of the SAN system as well as the configuration of the HBA Queue Depth. In one of my recent projects, the SAN was performing really badly so the SAN administrator did not accept it. After some investigations, he agreed to change the HBA Queue Depth on development (test environment) set up and as soon as we changed the HBA Queue Depth to quite a higher value, there was a sudden big improvement in the performance. It is very possible that there are no proper indexes in the system and there are lots of table scans and heap scans. Creating proper index can reduce the IO bandwidth considerably. If SQL Server can use appropriate cover index instead of clustered index, it can effectively reduce lots of CPU, Memory and IO (considering cover index has lesser columns than cluster table and all other; it depends upon the situation). You can refer to the two articles that I wrote; they are about how to optimize indexes: Create Missing Indexes Drop Unused Indexes Checking Memory Related Perfmon Counters SQLServer: Memory Manager\Memory Grants Pending (Consistent higher value than 0-2) SQLServer: Memory Manager\Memory Grants Outstanding (Consistent higher value, Benchmark) SQLServer: Buffer Manager\Buffer Hit Cache Ratio (Higher is better, greater than 90% for usually smooth running system) SQLServer: Buffer Manager\Page Life Expectancy (Consistent lower value than 300 seconds) Memory: Available Mbytes (Information only) Memory: Page Faults/sec (Benchmark only) Memory: Pages/sec (Benchmark only) Checking Disk Related Perfmon Counters Average Disk sec/Read (Consistent higher value than 4-8 millisecond is not good) Average Disk sec/Write (Consistent higher value than 4-8 millisecond is not good) Average Disk Read/Write Queue Length (Consistent higher value than benchmark is not good) Note: The information presented here is from my experience and there is no way that I claim it to be accurate. I suggest reading Book OnLine for further clarification. All the discussions of Wait Stats in this blog are generic and vary from system to system. It is recommended that you test this on a development server before implementing it to a production server. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQL Wait Types, SQL White Papers, T SQL, Technology

Read the article
Observing flow control idle time in TCP

- by user12820842

Previously I described how to observe congestion control strategies during transmission, and here I talked about TCP's sliding window approach for handling flow control on the receive side. A neat trick would now be to put the pieces together and ask the following question - how often is TCP transmission blocked by congestion control (send-side flow control) versus a zero-sized send window (which is the receiver saying it cannot process any more data)? So in effect we are asking whether the size of the receive window of the peer or the congestion control strategy may be sub-optimal. The result of such a problem would be that we have TCP data that we could be transmitting but we are not, potentially effecting throughput. So flow control is in effect: when the congestion window is less than or equal to the amount of bytes outstanding on the connection. We can derive this from args[3]-tcps_snxt - args[3]-tcps_suna, i.e. the difference between the next sequence number to send and the lowest unacknowledged sequence number; and when the window in the TCP segment received is advertised as 0 We time from these events until we send new data (i.e. args[4]-tcp_seq = snxt value when window closes. Here's the script: #!/usr/sbin/dtrace -s #pragma D option quiet tcp:::send / (args[3]-tcps_snxt - args[3]-tcps_suna) = args[3]-tcps_cwnd / { cwndclosed[args[1]-cs_cid] = timestamp; cwndsnxt[args[1]-cs_cid] = args[3]-tcps_snxt; @numclosed["cwnd", args[2]-ip_daddr, args[4]-tcp_dport] = count(); } tcp:::send / cwndclosed[args[1]-cs_cid] && args[4]-tcp_seq = cwndsnxt[args[1]-cs_cid] / { @meantimeclosed["cwnd", args[2]-ip_daddr, args[4]-tcp_dport] = avg(timestamp - cwndclosed[args[1]-cs_cid]); @stddevtimeclosed["cwnd", args[2]-ip_daddr, args[4]-tcp_dport] = stddev(timestamp - cwndclosed[args[1]-cs_cid]); @numclosed["cwnd", args[2]-ip_daddr, args[4]-tcp_dport] = count(); cwndclosed[args[1]-cs_cid] = 0; cwndsnxt[args[1]-cs_cid] = 0; } tcp:::receive / args[4]-tcp_window == 0 && (args[4]-tcp_flags & (TH_SYN|TH_RST|TH_FIN)) == 0 / { swndclosed[args[1]-cs_cid] = timestamp; swndsnxt[args[1]-cs_cid] = args[3]-tcps_snxt; @numclosed["swnd", args[2]-ip_saddr, args[4]-tcp_dport] = count(); } tcp:::send / swndclosed[args[1]-cs_cid] && args[4]-tcp_seq = swndsnxt[args[1]-cs_cid] / { @meantimeclosed["swnd", args[2]-ip_daddr, args[4]-tcp_sport] = avg(timestamp - swndclosed[args[1]-cs_cid]); @stddevtimeclosed["swnd", args[2]-ip_daddr, args[4]-tcp_sport] = stddev(timestamp - swndclosed[args[1]-cs_cid]); swndclosed[args[1]-cs_cid] = 0; swndsnxt[args[1]-cs_cid] = 0; } END { printf("%-6s %-20s %-8s %-25s %-8s %-8s\n", "Window", "Remote host", "Port", "TCP Avg WndClosed(ns)", "StdDev", "Num"); printa("%-6s %-20s %-8d %@-25d %@-8d %@-8d\n", @meantimeclosed, @stddevtimeclosed, @numclosed); } So this script will show us whether the peer's receive window size is preventing flow ("swnd" events) or whether congestion control is limiting flow ("cwnd" events). As an example I traced on a server with a large file transfer in progress via a webserver and with an active ssh connection running "find / -depth -print". Here is the output: ^C Window Remote host Port TCP Avg WndClosed(ns) StdDev Num cwnd 10.175.96.92 80 86064329 77311705 125 cwnd 10.175.96.92 22 122068522 151039669 81 So we see in this case, the congestion window closes 125 times for port 80 connections and 81 times for ssh. The average time the window is closed is 0.086sec for port 80 and 0.12sec for port 22. So if you wish to change congestion control algorithm in Oracle Solaris 11, a useful step may be to see if congestion really is an issue on your network. Scripts like the one posted above can help assess this, but it's worth reiterating that if congestion control is occuring, that's not necessarily a problem that needs fixing. Recall that congestion control is about controlling flow to prevent large-scale drops, so looking at congestion events in isolation doesn't tell us the whole story. For example, are we seeing more congestion events with one control algorithm, but more drops/retransmission with another? As always, it's best to start with measures of throughput and latency before arriving at a specific hypothesis such as "my congestion control algorithm is sub-optimal".

Read the article
Fast Data - Big Data's achilles heel

- by thegreeneman

At OOW 2013 in Mark Hurd and Thomas Kurian's keynote, they discussed Oracle's Fast Data software solution stack and discussed a number of customers deploying Oracle's Big Data / Fast Data solutions and in particular Oracle's NoSQL Database. Since that time, there have been a large number of request seeking clarification on how the Fast Data software stack works together to deliver on the promise of real-time Big Data solutions. Fast Data is a software solution stack that deals with one aspect of Big Data, high velocity. The software in the Fast Data solution stack involves 3 key pieces and their integration: Oracle Event Processing, Oracle Coherence, Oracle NoSQL Database. All three of these technologies address a high throughput, low latency data management requirement. Oracle Event Processing enables continuous query to filter the Big Data fire hose, enable intelligent chained events to real-time service invocation and augments the data stream to provide Big Data enrichment. Extended SQL syntax allows the definition of sliding windows of time to allow SQL statements to look for triggers on events like breach of weighted moving average on a real-time data stream. Oracle Coherence is a distributed, grid caching solution which is used to provide very low latency access to cached data when the data is too big to fit into a single process, so it is spread around in a grid architecture to provide memory latency speed access. It also has some special capabilities to deploy remote behavioral execution for "near data" processing. The Oracle NoSQL Database is designed to ingest simple key-value data at a controlled throughput rate while providing data redundancy in a cluster to facilitate highly concurrent low latency reads. For example, when large sensor networks are generating data that need to be captured while analysts are simultaneously extracting the data using range based queries for upstream analytics. Another example might be storing cookies from user web sessions for ultra low latency user profile management, also leveraging that data using holistic MapReduce operations with your Hadoop cluster to do segmented site analysis. Understand how NoSQL plays a critical role in Big Data capture and enrichment while simultaneously providing a low latency and scalable data management infrastructure thru clustered, always on, parallel processing in a shared nothing architecture. Learn how easily a NoSQL cluster can be deployed to provide essential services in industry specific Fast Data solutions. See these technologies work together in a demonstration highlighting the salient features of these Fast Data enabling technologies in a location based personalization service. The question then becomes how do these things work together to deliver an end to end Fast Data solution. The answer is that while different applications will exhibit unique requirements that may drive the need for one or the other of these technologies, often when it comes to Big Data you may need to use them together. You may have the need for the memory latencies of the Coherence cache, but just have too much data to cache, so you use a combination of Coherence and Oracle NoSQL to handle extreme speed cache overflow and retrieval. Here is a great reference to how these two technologies are integrated and work together. Coherence & Oracle NoSQL Database. On the stream processing side, it is similar as with the Coherence case. As your sliding windows get larger, holding all the data in the stream can become difficult and out of band data may need to be offloaded into persistent storage. OEP needs an extreme speed database like Oracle NoSQL Database to help it continue to perform for the real time loop while dealing with persistent spill in the data stream. Here is a great resource to learn more about how OEP and Oracle NoSQL Database are integrated and work together. OEP & Oracle NoSQL Database.

Read the article
Is memory allocation in linux non-blocking?

- by Mark

I am curious to know if the allocating memory using a default new operator is a non-blocking operation. e.g. struct Node { int a,b; }; ... Node foo = new Node(); If multiple threads tried to create a new Node and if one of them was suspended by the OS in the middle of allocation, would it block other threads from making progress? The reason why I ask is because I had a concurrent data structure that created new nodes. I then modified the algorithm to recycle the nodes. The throughput performance of the two algorithms was virtually identical on a 24 core machine. However, I then created an interference program that ran on all the system cores in order to create as much OS pre-emption as possible. The throughput performance of the algorithm that created new nodes decreased by a factor of 5 relative the the algorithm that recycled nodes. I'm curious to know why this would occur. Thanks. *Edit : pointing me to the code for the c++ memory allocator for linux would be helpful as well. I tried looking before posting this question, but had trouble finding it.

Read the article
Visualization Tools for Networks that Change with Time

- by devin

Does anyone known of any good Linux tools for visualizing networks that change (quickly) with time. I'm interested in things like: Routes between nodes Delays between nodes (especially as they change with time) Throughput I have root access to all the nodes (so I can run daemons on them all). Also, assume that I either have a management network that is stable or I will collect data and then analyze it offline.

Read the article
IRP_MJ_WRITE latency up to 15 seconds

- by racitup

We have written an application that performs small (22kB) writes to multiple files at once (one thread performing asynchronous queued writes to multiple locations on behalf of other threads) on the same local volume (RAID1). 99.9% of the writes are low-latency but occasionally (maybe every minute or two) we get one or two huge latency writes (I have seen 10 seconds and above) without any real explanation. Platform: Win2003 Server with NTFS. Monitoring: Sysinternals Process Monitor (see link below) and our own application logging. We have tried multiple things to try and solve this that have been gleaned from a few websites, e.g.: Making the first part of file names unique to aid 8.3 name generation Writing files to multiple directories Changing Intel Disk Write Caching Windows File/Printer Sharing Minimize memory used Balance Maximize data throughput for file sharing Maximize data throughput for network applications System-Advanced-Performance-Advanced NtfsDisableLastAccessUpdate - use fsutil behavior set disablelastaccess 1 disable 8.3 name generation - use "fsutil behavior set disable8dot3 1" + restart Enable a large size file system cache Disable paging of the kernel code IO Page Lock Limit Turn Off (or On) the Indexing Service But nothing seems to make much difference. There's a whole host of things we haven't tried yet but we wondered if anyone had come across the same problem, a reason and a solution (programmatic or not)? We can reproduce the problem using IOMeter and a simple setup: Start IOMeter and remove all but the first worker thread in 'Topology' using the disconnect button. Select the Worker thread and put a cross in the box next to the disk you want to use in the Disk Targets tab and put '2000000' in Maximum Disk Size (NOTE: must have at least 1GB free space; sector size is 512 bytes) Next create a new access specification and add it to the worker thread: Transfer Request Size = 22kB 100% Sequential Percent of Access Spec = 100% Percent Read/Write = 100% Write Change Results Display Update Frequency to 5 seconds, Test Setup Run Time to 20 seconds and both 'Number of Workers to Spawn Automatically' settings to zero. Select the Worker Thread in the Topology panel and hit the Duplicate Worker button 59 times to create 60 threads with identical settings. Hit the 'Go' button (green flag) and monitor the Results tab. The 'Maximum I/O Response Time (ms)' always hits at least 3500 on our machine. Our machine isn't exactly slow (Xeon 8 core rack server with 4GB and onboard RAID). I'd be interested to see what other people get. We have a feeling it might be something to do with the NTFS filesystem (ours is currently 75% full of fragmented files) and we are going to try a few things around this principle. But it is also related to disk performance since we don't see it on a RAMDisk and it's not as severe on a RAID10 array. Any help is much appreciated. Richard Right-click and select 'Open Link in New Tab': ProcMon Result

Read the article
Why do I see a large performance hit with DRBD?

- by BHS

I see a much larger performance hit with DRBD than their user manual says I should get. I'm using DRBD 8.3.7 (Fedora 13 RPMs). I've setup a DRBD test and measured throughput of disk and network without DRBD: dd if=/dev/zero of=/data.tmp bs=512M count=1 oflag=direct 536870912 bytes (537 MB) copied, 4.62985 s, 116 MB/s / is a logical volume on the disk I'm testing with, mounted without DRBD iperf: [ 4] 0.0-10.0 sec 1.10 GBytes 941 Mbits/sec According to Throughput overhead expectations, the bottleneck would be whichever is slower, the network or the disk and DRBD should have an overhead of 3%. In my case network and I/O seem to be pretty evenly matched. It sounds like I should be able to get around 100 MB/s. So, with the raw drbd device, I get dd if=/dev/zero of=/dev/drbd2 bs=512M count=1 oflag=direct 536870912 bytes (537 MB) copied, 6.61362 s, 81.2 MB/s which is slower than I would expect. Then, once I format the device with ext4, I get dd if=/dev/zero of=/mnt/data.tmp bs=512M count=1 oflag=direct 536870912 bytes (537 MB) copied, 9.60918 s, 55.9 MB/s This doesn't seem right. There must be some other factor playing into this that I'm not aware of. global_common.conf global { usage-count yes; } common { protocol C; } syncer { al-extents 1801; rate 33M; } data_mirror.res resource data_mirror { device /dev/drbd1; disk /dev/sdb1; meta-disk internal; on cluster1 { address 192.168.33.10:7789; } on cluster2 { address 192.168.33.12:7789; } } For the hardware I have two identical machines: 6 GB RAM Quad core AMD Phenom 3.2Ghz Motherboard SATA controller 7200 RPM 64MB cache 1TB WD drive The network is 1Gb connected via a switch. I know that a direct connection is recommended, but could it make this much of a difference? Edited I just tried monitoring the bandwidth used to try to see what's happening. I used ibmonitor and measured average bandwidth while I ran the dd test 10 times. I got: avg ~450Mbits writing to ext4 avg ~800Mbits writing to raw device It looks like with ext4, drbd is using about half the bandwidth it uses with the raw device so there's a bottleneck that is not the network.

Read the article
Daisy-chaining network switches with multiple cables

- by user72153

This might just be the dumbest question you'll ever read, but I digress. Say I have two 100Mbps Ethernet switches with 2 computers on each, connected together by a single cable. This way, the two PCs on each switch share the 100Mbps bandwidth with the others. If I added another cable between the 2 switches, would there be 200Mbps throughput available between the switches? Or am I completely off my rocker? Thanks for the help.

Read the article
Page cache flushing behavior under heavy append load

- by Bryce

I'm trying to understand the behavior of the Linux pdflush daemon when: The page cache is initially pretty much empty There is a large amount of free memory The system starts undergoing heavy write load My understanding right now is that the vm.dirty_ratio and vm.dirty_background_ratio that control page cache flushing behavior are with respect to the present size of the page cache, which means that my writes will flush earlier than they would if the page cache was pre-populated (even with dummy data from some random file), and thus throughput will be lower. Is this accurate?

Read the article
Diagnosing and debugging LAN congestion / connection issues

- by John Weldon

What are the top N tools / methodologies used to diagnose and repair network issues? Given a LAN, for example, where users are able to consistently ping an outside server, but any data intensive connections are flaky; how would you begin solving the network issues? I imagine issues like congestion, bandwidth constraints, throughput constraints, etc. are all factors, but I don't know how to diagnose those issues. I'm especially interested in LAN environments (rather than WAN)

Read the article
Number of Splitters on Coaxial Connection and Cable Internet Quality of Service

- by Matthew Ruston

Does running multiple coaxial splitters on a single coaxial cable line effect quality of service for cable internet connections? Suppose there are 2-4 splitters between the cable line coming into a building before the connection to a cable modem, does this negatively effect the latency, throughput, etc of a cable internet connection by any measurable amount? Is there a maximum number of times that a coaxial cable can be split into multiple internet and TV connections before QoS suffers?

Read the article
Linux RAID-0 performance doesn't scale up over 1 GB/s

- by wazoox

I have trouble getting the max throughput out of my setup. The hardware is as follow : dual Quad-Core AMD Opteron(tm) Processor 2376 16 GB DDR2 ECC RAM dual Adaptec 52245 RAID controllers 48 1 TB SATA drives set up as 2 RAID-6 arrays (256KB stripe) + spares. Software : Plain vanilla 2.6.32.25 kernel, compiled for AMD-64, optimized for NUMA; Debian Lenny userland. benchmarks run : disktest, bonnie++, dd, etc. All give the same results. No discrepancy here. io scheduler used : noop. Yeah, no trick here. Up until now I basically assumed that striping (RAID 0) several physical devices should augment performance roughly linearly. However this is not the case here : each RAID array achieves about 780 MB/s write, sustained, and 1 GB/s read, sustained. writing to both RAID arrays simultaneously with two different processes gives 750 + 750 MB/s, and reading from both gives 1 + 1 GB/s. however when I stripe both arrays together, using either mdadm or lvm, the performance is about 850 MB/s writing and 1.4 GB/s reading. at least 30% less than expected! running two parallel writer or reader processes against the striped arrays doesn't enhance the figures, in fact it degrades performance even further. So what's happening here? Basically I ruled out bus or memory contention, because when I run dd on both drives simultaneously, aggregate write speed actually reach 1.5 GB/s and reading speed tops 2 GB/s. So it's not the PCIe bus. I suppose it's not the RAM. It's not the filesystem, because I get exactly the same numbers benchmarking against the raw device or using XFS. And I also get exactly the same performance using either LVM striping and md striping. What's wrong? What's preventing a process from going up to the max possible throughput? Is Linux striping defective? What other tests could I run?

Read the article
Hi do I diagnose the bottleneck in mt Intel Atom based Ubuntu server?

- by Jon Cage

I have a small media server at home which has software raid and a gigabit link to the rest of my network. For some reason though, I only get ~10MB/s transfers when copying to/from the server. How can I benchmark network throughput (Windows 7 desktop <- Ubuntu server) and hard disk performance to try and identify where my bottleneck might be?

Read the article
List DB2 version, OS and hardware on Linux? (aws image)

- by mestika

Hello everybody, I'm not that familiar with Linux but I'm currently working on a aws image for an assignment and I need to display the DB2 version, the OS and the hardware. Is there a commando or program of some sort I can use for this purpose? I tried a rpm called "Bonnie" but that only writes the throughput for the system. Thanks Mestika

Read the article
Why is Hard Disk Access Slow Through a Virtual Machine?

- by Earl

Why is throughput to an external hard drive from a virtual machine much slower than through the host? Access is generally 30MB/sec from the host and 4MB/sec frm the client. Is this normal?

Read the article
Gigabit capacity

- by abronte

Do gigabit ports have a total throughput of 1 gigabit so that you could be sending 800 mbit and receiving 200 mbit at the same time. Or is it 1000 in and 1000 out?

Read the article
How should I interpret the specifications of a SSD?

- by paulgreg

When considering to buy a SSD, how should I interpret the different specifications of the SSD? Here are some specific things that need to be deciphered: Controller (this can affect performance and endurance more than all other factors combined) Bus Technology Form Factor (Physical Size) Capacity NAND or NOR technology Power Consumption during Read, during Write, when Idle Read/Write Burst and Sustained Throughput All of these things I would like to be explained in more detail and their actual importance in selecting an SSD.

Read the article
List DB2 version, OS and hardware on Linux? (aws image)H

- by mestika

Hello everybody, I'm not that familiar with Linux but I'm currently working on a aws image for an assignment and I need to display the DB2 version, the OS and the hardware. Is there a commando or program of some sort I can use for this purpose? I tried a rpm called "Bonnie" but that only writes the throughput for the system. Thanks Mestika

Read the article

Search Results

Search found 453 results on 19 pages for 'throughput'.

Page 7/19 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >

- by taazaa

- by Alex Dunlop

- by Jeff Willener

- by vlad

- by exhuma

- by Timbo

- by pinaldave

- by pinaldave

- by user12820842

- by thegreeneman

- by Mark

- by devin

- by racitup

- by BHS

- by user72153

- by Bryce

- by John Weldon

- by Matthew Ruston

- by wazoox

- by Jon Cage

- by mestika

- by Earl

- by abronte

- by paulgreg

- by mestika

< Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14 | Next Page >