Tech Note 0009
Common Network Performance Problems
The usual suspects if performance is not what you expect
Transferring data from one computer to another involves passing that data through dozens, sometimes hundreds, of software and hardware components. Performance may be limited within a computer or within the network. Sometimes components which function normally by themselves interact in unexpected ways to create problems.
A file transfer may pass through the following components before it ever leaves the system: hard disk drive, system disk cache, transfer application, network stack, software VPN, software firewalls and filters, network drivers, and the hardware network adapter. There are many opportunities for the data to become delayed or blocked. Following is a list of the computer components most likely to impact network performance.
The operating system manages the resources and settings for all of the other system components. Driver configuration, CPU prioritization, IP stack tuning, virtual memory, and many other factors can impede performance. Generally speaking, older systems have poorer performance than newer versions, and Windows systems have poorer performance than Unix systems of the same vintage. Most users stick with default operating system settings, so it is rare to find problems beyond the choice of system itself. However, some Unix distributions may require UDP buffer tuning as described in Tech Note 0024.
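As a rough illustration of what UDP buffer tuning involves, the sketch below requests a larger receive buffer and reads back what the kernel actually granted. The 4 MB figure is illustrative only, not a DEI recommendation; consult Tech Note 0024 for actual guidance.

```python
import socket

# Hypothetical sketch: request a larger UDP receive buffer. The kernel
# may silently cap the request at its configured maximum (for example,
# net.core.rmem_max on Linux), so read the value back to verify.
REQUESTED = 4 * 1024 * 1024  # 4 MB; an illustrative figure

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
# Linux reports double the usable size; either way, a value far below
# the request suggests the system-wide limit needs raising.
print("granted receive buffer:", granted)
s.close()
```

If the granted value is much smaller than the request, the system-wide limit must be raised before applications can benefit.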
The CPU(s) must be shared by all of the software running on a system. In modern operating systems, there are typically twenty to sixty programs running even when the user isn't doing anything. All of these should be idle most of the time, but if just one of them tries to "hog" the CPU, it can seriously impede performance. Systems with multiple CPUs are not necessarily faster than those with single CPUs; Windows is sometimes slower on multiple CPU systems. Windows users can view CPU usage by displaying the Task Manager (Ctrl+Alt+Delete) and sorting by CPU.
The CPU can also constrain performance if you are using compression in conjunction with your data transfer. ExpeDat, for example, allows you to apply ZLIB compression to file transfers. Disabling compression may improve throughput, especially on very fast networks.
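A quick way to see whether compression could become the bottleneck is to measure how fast one CPU core can push data through the compressor. The sketch below uses Python's zlib as a stand-in; the payload and level are illustrative, not ExpeDat's actual settings.

```python
import time
import zlib

# Rough sketch: measure single-core zlib throughput. If this figure is
# below your network speed, compression is likely the bottleneck and
# disabling it may improve overall transfer throughput.
payload = bytes(range(256)) * 4096  # ~1 MB of moderately compressible data

start = time.perf_counter()
compressed = zlib.compress(payload, level=6)  # illustrative level
elapsed = time.perf_counter() - start

throughput_mbps = (len(payload) * 8) / elapsed / 1e6
print(f"zlib level 6: {throughput_mbps:.0f} megabits/second on this core")
```

Compare the printed figure against your network speed: on a gigabit or faster path, single-core compression throughput is often the limiting factor.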
Most data transfer begins and ends with a hard drive. This is usually the slowest component within the machine. The speed of hard drive data transfer varies tremendously, often without obvious feedback to the user. In particular, if two or more programs try to access the hard drive at the same time, data throughput will drop dramatically. Even if the system appears idle, one of the many background processes can still cause drive access. Even intermittent access can cause data transfers to freeze for seconds at a time, greatly reducing network performance.
Hard drive limitations become dominant when the network is very fast or when there are consistently multiple processes trying to access the drive. For example, trying to download two different files at once from a gigabit ethernet LAN to a single consumer hard drive may be much slower than downloading one after the other. High-end drives or RAIDs may improve overall performance, but are still highly variable and severely impacted by multiple accesses. RAIDs may require special configuration to perform well with network data. See Tech Note 0018 if you are using a RAID.
Software firewalls and filters attempt to block or alter network traffic using the same operating system and CPU resources as applications. All such software works by stopping each data packet, checking it against a set of rules, and then deciding what to do with that packet. Depending on the amount of checking, this can greatly delay the packet and consume substantial CPU resources. This can be problematic for moving data at high speeds when there are multiple layers, such as combining firewall, content filter, router, and network address translation software. To minimize problems, disable any components or rules that are not necessary and adjust others to allow MTP/IP traffic without filtering. See Tech Note 0002, "Configuring Firewalls".
VPN software is similar to a firewall or filter in that it stops each data packet and processes it at significant cost. This processing may include compression or encryption, which is especially costly. Datagram-based VPNs (particularly IPsec) process one datagram at a time, adding a small amount of data to each one before sending it on its way. This introduces some CPU load and, in rare cases, may cause MTU problems. Datagram-based VPNs are unlikely to hurt performance much, except on very fast LANs. Low-quality VPNs "tunnel" all datagrams across a single TCP/IP connection. This includes so-called SSL VPNs. These introduce substantial network overhead and severely worsen TCP flow control and congestion problems. SSL and other tunneling VPNs exhibit poor network performance under all circumstances. MTP/IP cannot be used with SSL or other tunneling VPNs.
Although network data transfer does not typically involve much direct memory usage, the amount of available memory can significantly affect the performance of other operating system components. Inadequate system memory will cause the system to access the hard drive more often, leading to poor performance when reading or writing to disk.
Some versions of Microsoft Windows have internal limits on how quickly data can be read from or written to files based on the fixed size of the Windows "paged pool" buffer. If an MTP application produces an error such as "Insufficient system resources exist to complete the requested service" then you may need to adjust the Windows paged pool. The following Microsoft support note explains how to do this in the "RESOLUTION" section: http://support.microsoft.com/kb/304101. Windows Server 2012 or later is strongly recommended for speeds above a few hundred megabits per second.
The hardware component that moves data from the operating system to the physical media depends on correct drivers and settings to operate efficiently. Most modern NICs are rated at 1 gigabit per second. Some offer advanced features such as IP offloading. All have options to set a Maximum Transmission Unit (MTU).
For multigigabit networks, a very common problem is a NIC which is slower than the network speed. For example, a 1 gigabit NIC on a 10 gigabit network will only be able to move data at 1 gigabit per second. Likewise, a 10 gigabit NIC will not be able to fill a 40 gigabit data path. See Tech Note 0032 for more considerations on multigigabit data paths.
IP stack offloading attempts to perform checksum and other calculations in the NIC hardware, freeing the operating system kernel for other I/O tasks. This requires that compatible drivers be installed and correctly configured. Driver compatibility problems can cause IP offloading to make throughput slower, especially at gigabit and faster speeds. When experiencing performance problems, test with offloading both enabled and disabled.
The MTU set in the NIC should match that of the network path. For ethernet below one gigabit per second, this MTU should be at least 1500. For true multigigabit paths, this MTU should be at least 9000 (Jumbo). See the MTU section for further information.
MTP/IP depends heavily on the ability to accurately measure the timing of data transmission and arrival. It relies on the operating system to provide this timing information. Anything which disrupts the system timing will adversely affect MTP/IP. For example, radically changing the system clock, or putting the system to sleep for a while, will likely cause any running MTP/IP transactions to fail. More subtle problems may arise if the system clock is faulty, such as due to a failing motherboard battery or incorrectly configured network time management software (NTP). Such problems are rare on newer systems, but battery failure is a distinct possibility on hardware over five years old. Users must avoid situations which disrupt the timing of MTP/IP applications or the system as a whole.
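The principle behind this sensitivity can be illustrated with a monotonic clock, which operating systems provide precisely so that interval measurements survive wall-clock changes. This sketch is only an illustration of the concept, not how MTP/IP implements its timing.

```python
import time

# Illustrative sketch: time.time() follows the wall clock and can jump
# backward if the clock is reset or stepped by NTP, while
# time.monotonic() is guaranteed never to run backward. Interval
# measurements such as round-trip times should use the latter.
start = time.monotonic()
time.sleep(0.05)  # stand-in for waiting on a datagram round trip
rtt = time.monotonic() - start
print(f"measured interval: {rtt * 1000:.1f} ms")
# This interval stays valid even if the wall clock changed meanwhile;
# the same measurement taken with time.time() could come out negative.
```

A monotonic clock cannot, however, protect against the system being suspended or the hardware timer itself misbehaving, which is why sleep and faulty clock batteries remain a problem.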
Any user interface activity may disrupt the system in a way that impacts network performance. For example, viewing a web page will substantially reduce hard drive, CPU, memory, and network performance. Windows in particular gives higher priority to user actions than to other processes. For example, clicking the desktop just once per second during a Windows 2012 RDP session can cause a 25% drop in network throughput. The Windows Activity Monitor can cause up to a 50% drop in network throughput.
During testing, users should avoid any other activity. On Windows servers, all users should log out and disconnect from RDP whenever possible.
Machine virtualization allows a "guest" operating system to be run inside a simulated environment controlled by a program running on a real "host" system. Because the guest is a full operating system, much of what it does is redundant with the operations of the host operating system. Thus a virtualized system more than doubles the amount of processing that must occur with each network datagram. In particular, all of the other problems that may affect a system are at least doubled in a virtualized system.
Virtualized server environments often involve multiple guest systems, all contending for limited CPU, memory, and hard drive access. These different redundant systems can interact in unexpected ways. Network performance may be severely reduced for some traffic at some times, while it is improved for other traffic at other times. Generally speaking, virtualization involves substantial performance costs. This effect can be mitigated by testing in the host, rather than the guest, system.
See the "Virtual & Cloud Machines" section of Tech Note 0023 for more virtual machine setup recommendations.
A typical Internet path involves between ten and twenty "hops": routers or other network nodes visible along the path. For example, a datagram might travel through the source computer, a wireless gateway, an ethernet hub, a DSL or cable modem, an ATM switch, many fiber optic routers, a T1 line and router, several ethernet hubs, and the destination computer. Additional devices may be hidden between the hops, at the telecom level.
Following is a list of the components and factors most likely to impact network performance. Note that several different components may be combined into a single network device. For example, a typical consumer DSL router includes a wireless gateway, ethernet switch, firewall, NAT, VPN, and modem in a single box.
These devices relay network traffic amongst network links. By itself, this functionality is typically very fast and rarely causes any problems. However, devices with these names often have one or more of the following components, which can be more problematic.
A firewall examines each datagram it receives and applies a set of rules to determine whether or not it should be allowed to pass through. Hardware firewalls typically do this very quickly with little or no impact on network performance. However, this depends on the rules being correctly configured.
Hardware firewalls must be configured to explicitly allow MTP/IP traffic to pass through without interference. If this is not done, then the firewall may block or degrade MTP/IP's performance. It is possible for a firewall to initially allow traffic to pass through, but then reduce or cut off that traffic after several minutes. Users may be accustomed to their firewall correctly guessing what to do without explicit configuration, but those guesses may not be correct when MTP is introduced. See Tech Note 0002, "Configuring Firewalls", for advice on configuring firewalls.
Network emulators use statistical models that are designed around TCP and TCP-like network traffic. While emulators can be useful, they must be carefully configured using statistics appropriate to the traffic being tested. DEI strongly recommends testing MTP in real-world environments whenever possible. If you must use an emulator, carefully read Tech Note 0022 for information on programming it with the best possible data.
Machines on a Local Area Network may use private IP addresses to communicate with each other while sharing a single public IP address to communicate with the rest of the Internet. This is done both for security and to conserve scarce public addresses.
When a private machine seeks to talk to the public Internet, the NAT device must translate the private address into the public one. Because many private machines may be sharing the same public address, the NAT device has to keep track of which traffic belongs to which private machine. This involves keeping track of incoming and outgoing port numbers, and sometimes involves changing those port numbers.
If a private machine initiates an MTP/IP transaction, the NAT device should take note of the outgoing traffic and automatically route returning traffic back to the correct machine. However, some NAT devices may forget this information, causing an ongoing transaction to fail after several minutes.
The private machine will not be able to receive transactions initiated by outside systems unless the NAT is specifically instructed to "map" external ports to the internal machine. NAT port mapping is a common task when setting up any server behind a NAT device.
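One common workaround for NAT devices that expire idle mappings is a periodic keepalive datagram, which refreshes the mapping before it times out. The sketch below is hypothetical; the address, port, and interval are illustrative, and MTP/IP applications manage their own traffic patterns.

```python
import socket
import time

# Hypothetical sketch: many NAT devices expire idle UDP port mappings
# after some tens of seconds, which can cause a long-running transaction
# to fail mid-stream. Sending a small datagram periodically keeps the
# mapping alive. Real keepalive intervals are typically 20-30 seconds.
def send_keepalives(host, port, count, interval):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    for _ in range(count):
        s.sendto(b"", (host, port))  # empty payload; a cooperating server can ignore it
        sent += 1
        time.sleep(interval)
    s.close()
    return sent

# Short interval and discard port used here only so the example finishes quickly.
n = send_keepalives("127.0.0.1", 9, count=3, interval=0.01)
print("sent", n, "keepalive datagrams")
```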
VPN hardware is similar to a gateway or router in that it stops each data packet it receives and decides where to send it next. The difference with a VPN is that it simulates virtual network paths by encapsulating the data packets within other packets or data streams. One common example of this is allowing machines on different private LANs to communicate using private IP addresses. VPNs often add authentication and encryption to provide additional security to the traffic they handle.
This encapsulation can be performed either by datagrams or by TCP streams. Datagram-based VPNs (particularly IPsec) process one datagram at a time, adding a small amount of data to each one before sending it on its way. This may, in rare cases, cause MTU problems, but is otherwise unlikely to cause performance problems.
Low-quality, and some very old, VPNs "tunnel" all datagrams across a single TCP/IP connection. This includes so-called SSL VPNs. These introduce substantial network overhead and severely worsen TCP flow control and congestion problems. SSL and other tunneling VPNs exhibit poor network performance under all circumstances. MTP/IP cannot be used with SSL or other tunneling VPNs.
Internet Protocol (IP) networks transmit data in discrete packets called datagrams. Each datagram has source and destination addresses, some descriptive information, and the data payload. IP allows datagrams to be up to 65535 bytes in total size. However, most network media limit datagrams to much smaller sizes. Ethernet, for example, typically limits datagrams to just 1500 bytes. IP provides a mechanism for network devices to divide up datagrams that are too large for a particular link. This process is called fragmentation. Fragmentation introduces extra overhead per datagram, but bigger payloads mean fewer datagrams and usually result in improved throughput.
Most devices limit datagram sizes, even with fragmentation. But due to a lack of standards conformance, it is not possible to know for certain how large an MTU will be supported by any given network path. In particular, some devices may support large datagrams at a severe performance penalty, while others may give improved performance with large datagrams, and still others may silently discard large datagrams. This situation may be further complicated by VPNs or other tunneling protocols (such as PPPoE), which add to the size of datagrams without telling the computers at either end. Thus a datagram which is transmitted at a proper size can grow along the path to become too large.
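Operating systems do expose a way to experiment with datagram sizes: setting "Don't Fragment" behavior so that an oversized datagram fails locally instead of being silently fragmented. The sketch below is a simplified, Linux-oriented illustration; the host, port, and sizes are arbitrary, and probing a real path requires a remote endpoint plus ICMP feedback, which many networks block.

```python
import socket

# Illustrative sketch: with path MTU discovery forced on (a Linux-only
# socket option, guarded below), a send larger than the local interface
# MTU fails immediately with EMSGSIZE rather than being fragmented.
def datagram_fits(host, port, size):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if hasattr(socket, "IP_MTU_DISCOVER"):  # available on Linux
            s.setsockopt(socket.IPPROTO_IP, socket.IP_MTU_DISCOVER,
                         socket.IP_PMTUDISC_DO)
        s.sendto(b"\0" * size, (host, port))
        return True
    except OSError:  # EMSGSIZE: too large for the local MTU
        return False
    finally:
        s.close()

# Loopback only exercises the local interface MTU, so this is a sanity
# check, not a true path MTU measurement.
print(datagram_fits("127.0.0.1", 9, 1200))
```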
For this reason, MTP/IP usually limits its datagrams to a total of 1480 bytes, including UDP and IP headers. This leaves 20 bytes for MPLS, PPPoE, or simple IPsec overhead. MTP/IP will attempt to detect when smaller datagrams are required. However, in some environments, it may be helpful to tell MTP/IP not to exceed a specific limit. In most MTP/IP applications, this is done with the MaxDatagram configuration option. This value should be set to about 56 bytes less than the known MTU limit.
If you are using MTP/IP with a VPN, PPPoE, or other tunneling protocol and performance is poor, try setting MaxDatagram to a value of 1280. If this improves performance, you may try increasing the value in increments of 16 until performance no longer improves.
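The arithmetic behind these two rules of thumb can be summarized in a few lines. MaxDatagram is a real MTP/IP configuration option, but the helper functions below are hypothetical, written only to make the numbers concrete.

```python
# Illustrative arithmetic for the MaxDatagram advice above.
def max_datagram_for_mtu(path_mtu):
    """Rule of thumb: set MaxDatagram about 56 bytes below the known MTU."""
    return path_mtu - 56

def tuning_steps(start=1280, limit=1480, step=16):
    """Candidate values to try, starting at 1280 and increasing by 16."""
    return list(range(start, limit + 1, step))

print(max_datagram_for_mtu(1500))  # standard ethernet MTU
print(tuning_steps()[:4])
```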
Some network equipment supports Jumbo ethernet frames (9000 MTU) or larger. Larger datagram sizes greatly reduce overhead for every device on the path and are essential for multigigabit performance. However, if any device in the path does not support at least Jumbo frames, then increasing MTP's datagram size may severely reduce network performance or cause a loss of connectivity. Where Jumbo frames are supported by every device in the path, substantial performance improvements can be achieved by setting MTP's MinDatagram to 8192. Even larger values may be used for paths which support Super Jumbo frames.
10 gigabit or faster network equipment will support Jumbo ethernet frames (9000 MTU) or larger. Conversely, a network path which fragments datagrams larger than 1500 bytes almost certainly contains at least one component which is not capable of 10 gigabit speeds. This is a common problem with bonded links. See Tech Note 0032 for more about working with 10 gigabit networks.
Wireless communications, including 802.11 WiFi, Bluetooth, satellite, and various cellular mechanisms present numerous additional challenges. Wireless media is subject to highly variable conditions which can degrade performance. Different media types react to problems in different ways. Some will attempt to correct for lost or corrupted data. This results in data delays and can cause transfers to "freeze up" for significant periods of time. Others simply drop corrupted data, allowing the transport and application layers to handle recovery. MTP/IP is very efficient at error recovery, so disabling hardware error correction may improve MTP/IP performance.
Some wireless media, particularly some satellite and cellular links, may entail tunneling across TCP/IP connections. MTP/IP will perform very poorly in such environments. Unfortunately, there is no easy way to know what sort of loss policies a given network is implementing, except by experimentation.
Sometimes multiple low-speed data links can be combined to form a virtual high-speed link. For example, three T1 lines might be bonded to provide 4.5 megabits per second instead of the single 1.5. Proper multiplexing requires that devices be installed at both ends to both split apart and recombine the data flows. The devices must correctly balance the flow of data packets across all available lines.
A bonded line is not the same as a single line of the same speed.
Performance of multiplexed or bonded lines varies tremendously. Lines bonded at the telecom level may perform almost as well as a real line of the same total speed. But lines which are bonded with end-user multiplexing hardware often limit individual data flows to the speed of just a single line. Bonded lines may have difficulty distributing data during high loads and may exhibit erratic behavior with different traffic types. Proper configuration of end-user multiplexers is essential to good performance.
To achieve maximum performance with MTP/IP over a bonded or multiplexed data path, all lines must converge to a single IP address at each end and the multiplexing hardware must be configured to distribute UDP/IP traffic in a round-robin or otherwise even distribution. Multiplexers which claim to perform "load-balancing" often do not properly handle UDP/IP or have separate configurations for UDP/IP, so examine their configurations carefully.
Some devices attempt to capture and modify network traffic for the purpose of improving performance. This typically involves compression, caching, or de-duplication. Most such devices ignore UDP datagrams. Some, notably SilverPeak, will attempt to capture and modify UDP datagrams.
Even if such a device does not modify MTP/IP's UDP datagrams, the overhead it creates may slow or limit network throughput. Legacy devices may have performance limits which are below that of the WAN network path. Devices which do modify UDP datagrams may cause corrupt packets and severely reduce or block MTP/IP throughput.
WAN Acceleration Appliances should be bypassed for MTP/IP traffic unless tests show a clear benefit.
This is a broad category which includes any device that actively interferes with the datagrams flowing through it in an effort to change performance. A common function is prioritization, in which a set of rules are applied to determine whether performance of some traffic should be degraded in favor of other traffic. Some devices attempt to compress or cache traffic in order to reduce the amount of data traveling on the network. Some observe network latency and will degrade the performance of any traffic which appears to cause latency to exceed some threshold.
The functioning of such devices is heavily dependent on their correct configuration. In particular, they must be properly programmed with the priorities of the users and network managers. If those priorities, or the network itself, change over time, the devices may begin to greatly hinder performance.
MTP/IP should work well with correctly configured devices that obey Internet Protocol standards. Other devices may severely degrade MTP/IP performance and should be investigated to ensure that they are functioning as intended.
Though not a component of the network, third-party traffic flow often has the largest impact on network performance. Any data traveling across any part of a network path will reduce the performance of any other data traveling that path. Ideally, network resources would be fairly divided amongst all users. But variation in capacities, use patterns, and requirements often makes it impossible to know what "fairly" means, let alone enforce it. Worse, oscillations in traffic flows can cause some traffic to create interference out of proportion to the bandwidth it consumes.
All else being equal, MTP/IP will perform much better than TCP/IP when third-party traffic is a factor. MTP/IP avoids the types of oscillations that cause some TCP/IP flows to excessively interfere with each other. However, networking is ultimately a zero-sum game: once the total data flow reaches the capacity of the network, no flow can gain without another one losing. Ideally, tests should be performed when third-party traffic is absent or at least steady. See Tech Note 0003, "Analyzing Network Performance", for additional testing advice.
Tech Note History
Apr 06, 2017 - Updated RAM, Software firewalls
May 08, 2014 - NIC Details, Acceleration Appliances
Nov 16, 2011 - Updated MTU, Bonded Links, Memory
Oct 26, 2010 - Tech Note 0024