Data Expedition, Inc. ®

Tech Note History
May 01, 2014: Streaming Folders
Aug 26, 2010: Small Files
Nov 07, 2008: First Post

Compression Pros & Cons

Simply put, compression is a process which trades CPU cycles for bytes.  But the trade isn't always a good one.  Sometimes you can spend a lot of valuable CPU cycles for little or no gain.

In the context of network data transport, "Should I compress?" is a common question.  The answer depends on several factors: the type of data, the speed of the network, and the CPU time available.  The most important thing to remember is that compression can actually make your data move much more slowly, so it should not be used without some consideration.

When Compression Is Good

Compression algorithms try to identify large repeating patterns in a data set and replace them with smaller patterns.  Ideally, this shrinks the size of the data set.  For the purposes of network transport, having less data to move means it should take less time to move it.

Documents and files which consist mostly of plain text or machine executable code tend to compress well.  Examples include word processing documents, HTML files, some .exe files, and some database files.

Combining many small files into a single archive prior to network transfer can often result in faster speeds than transferring each file individually.  This may be true even if the individual files themselves are not compressible.  Many archiving utilities have options to pack files into an archive without compression, such as the "-0" option for "zip".  ExpeDat will combine the contents of a folder into a single data stream when you enable Streaming Folders.
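As a rough illustration of packing without compression, here is a minimal Python sketch using the standard zipfile module with its ZIP_STORED method, which plays the same role as the "-0" option for "zip".  The function name and its arguments are made up for this example; this is not how ExpeDat's Streaming Folders works internally.

```python
import os
import zipfile


def pack_without_compression(folder, archive_path):
    """Pack every file under `folder` into one zip archive, stored uncompressed.

    Hypothetical helper for illustration. `archive_path` should live
    outside `folder`, otherwise the archive would try to include itself.
    """
    count = 0
    # ZIP_STORED copies file bytes as-is, with no compression pass.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to the folder so the archive is relocatable.
                archive.write(full_path, arcname=os.path.relpath(full_path, folder))
                count += 1
    return count
```

Calling pack_without_compression("photos", "photos.zip") (both names are placeholders) returns the number of files packed.  The archive will be slightly larger than the sum of its contents, but the transfer only has to handle one file.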

When Compression Is Bad

Many data types are not compressible, because the repeating patterns have already been removed or scrambled.  This includes most images, videos, songs, any data that is already compressed, and any data that has been encrypted.
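The difference is easy to see for yourself.  The short Python sketch below compresses a block of repetitive text and an equally sized block of random bytes with the standard zlib module.  The random bytes are only a stand-in for already compressed or encrypted data, and the sample text is arbitrary.

```python
import os
import zlib


def compression_ratio(data, level=6):
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


# Highly repetitive text: plenty of patterns for the compressor to find.
text_like = b"The quick brown fox jumps over the lazy dog.\n" * 2000
# Random bytes: an assumed stand-in for compressed or encrypted data.
random_like = os.urandom(len(text_like))

print(f"repetitive text: {compression_ratio(text_like):.3f}")
print(f"random bytes:    {compression_ratio(random_like):.3f}")
```

The repetitive text shrinks to a small fraction of its original size, while the random bytes come out about the same size they went in, or slightly larger because of format overhead.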

Trying to compress data that is not compressible wastes CPU time.  When you are trying to move data at high speeds, that CPU time may be critical to feeding the network.  So by taking away processing time with worthless compression, you can actually end up moving your data much more slowly than if you had compression turned off.

If you are using a compression utility only for the purposes of combining many small files, check for options that disable compression.  For example, the "zip" command has a "-0" option which packages files into an archive without spending time trying to compress them.

Inline versus Offline

Many transport mechanisms allow you to apply compression algorithms to data as it is being transferred.  This is convenient because the compression and decompression occur seamlessly without the user having to perform extra steps.  But it is also risky because any CPU time spent on compression is time NOT being spent on feeding data through the network.  If the network is very fast, the CPU is very slow, or the compression algorithm is unable to scale, having inline compression turned on may cause your data to move more slowly than if you turn compression off.  Inline compression can be slower than no compression even when the data is compressible!
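One way to reason about this trade-off is a deliberately simplified two-stage model: the compressor and the network form a pipeline, and the slower stage sets the pace.  The Python sketch below is only that model, not a description of any particular product.  It assumes the CPU could otherwise keep the link full and that decompression at the far end is never the bottleneck.  All of the numbers are hypothetical.

```python
def effective_rate(network_rate, compress_rate, ratio):
    """Rate at which ORIGINAL data moves with inline compression turned on.

    network_rate:  what the link can carry (any unit, e.g. megabits per second)
    compress_rate: how fast the CPU can compress input data (same unit)
    ratio:         compressed size / original size (1.0 means no savings)
    """
    # The network carries compressed bytes, so it moves original data at
    # network_rate / ratio. The slower of the two stages limits the pipeline.
    return min(compress_rate, network_rate / ratio)


# Hypothetical compressor: handles 400 Mbit/s of input and halves the data.
fast_link = effective_rate(network_rate=1000.0, compress_rate=400.0, ratio=0.5)
slow_link = effective_rate(network_rate=100.0, compress_rate=400.0, ratio=0.5)

print(f"1000 Mbit/s link: {fast_link:.0f} Mbit/s with inline compression")
print(f" 100 Mbit/s link: {slow_link:.0f} Mbit/s with inline compression")
```

In this model the 1000 Mbit/s link ends up limited by the compressor to 400 Mbit/s, well below what it could carry with compression off, even though the data is compressible.  The 100 Mbit/s link benefits, moving original data at 200 Mbit/s.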

If you are going to be transferring the same data set multiple times, it pays to compress it first using Zip or Tar-Gzip.  Then you can transfer the compressed archive without taking CPU cycles away from the network processing.  If you are planning to encrypt your data, make sure you compress it first and encrypt it second, because encrypted data can no longer be compressed.
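Offline compression can be as simple as building a Tar-Gzip archive once and reusing it for every transfer.  Here is a minimal Python sketch using the standard tarfile module.  The function name is made up for this example, and the command-line tar and gzip utilities would do the same job.

```python
import os
import tarfile


def make_tar_gz(folder, archive_path):
    """Compress `folder` into a gzip-compressed tar archive and return its size.

    Hypothetical helper for illustration. The CPU cost is paid once, here,
    rather than during every transfer.
    """
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps only the folder's own name, not its full local path.
        archive.add(folder, arcname=os.path.basename(os.path.normpath(folder)))
    return os.path.getsize(archive_path)
```

The resulting archive is already compressed, so inline compression should be turned off when sending it.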

Hidden Compression

Devices in your network may be applying compression without you realizing it.  This becomes evident if the "speed" of the network seems to change for different data types.  If the network seems slow when you are transferring data that is already compressed, but fast when you are transferring uncompressed text files, then you can be pretty sure that something out there is making compression decisions for you.

Network compression devices can be helpful in that they take the compression burden away from the end-point CPUs.  But they can also create very inconsistent results since they will not work for all destinations and data types.  Network level compression can also run into the same CPU trade-offs discussed above, resulting in some files moving more slowly than they would if there were no compression.

If you are testing the speed of your network, try using data that is already compressed or encrypted to ensure consistent results.
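If you do not have a suitable file on hand, random bytes make a reasonable test payload because no compressor along the path can shrink them.  The following Python sketch writes such a file; the function name and chunk size are arbitrary choices for this example.

```python
import os


def write_incompressible_file(path, size_bytes, chunk_size=1 << 20):
    """Write `size_bytes` of random data to `path` for use as a speed-test payload."""
    remaining = size_bytes
    with open(path, "wb") as out:
        while remaining > 0:
            # os.urandom output has no repeating patterns to compress.
            chunk = os.urandom(min(chunk_size, remaining))
            out.write(chunk)
            remaining -= len(chunk)
    return path
```

For example, write_incompressible_file("speedtest.bin", 1 << 30) would create a 1 GiB file, where "speedtest.bin" is a placeholder name.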

Should I Turn On Inline Compression?

For compressed data, images, audio, video, or encrypted files: No.

For other types of data, test it both ways to see which is faster.
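A real transfer run both ways is the only test that counts, but a quick pre-check on a sample of the data can tell you whether the experiment is worth running.  The Python sketch below times zlib on a sample and reports the size ratio and compression speed.  It assumes zlib is a fair proxy; your transport software may use a different algorithm with different speed and savings.

```python
import time
import zlib


def measure_compression(sample, level=6):
    """Return (ratio, throughput in bytes per second) for compressing `sample`."""
    start = time.perf_counter()
    compressed = zlib.compress(sample, level)
    elapsed = time.perf_counter() - start
    ratio = len(compressed) / len(sample)
    # Guard against a timer reading of zero on very small samples.
    throughput = len(sample) / elapsed if elapsed > 0 else float("inf")
    return ratio, throughput


# Arbitrary sample for demonstration; in practice, read a few megabytes
# from the data you actually plan to send.
sample = b"2014-05-01 12:00:00 INFO transfer complete\n" * 50000
ratio, throughput = measure_compression(sample)
print(f"ratio {ratio:.3f}, about {throughput * 8 / 1e6:.0f} Mbit/s compression speed")
```

If the ratio is close to 1, or the compression speed is below what your network can carry, inline compression is unlikely to help.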

If the network is very fast (hundreds of megabits per second or faster), consider turning off inline compression and compressing the data before you move it instead.