Data Expedition, Inc.
File transfer always begins and ends with storage. So, when we look at how fast we can move data across a network, the storage devices at each end are a critical part of the performance equation. Among our customers, storage is the most common cause of unexpected speed limitations, occurring more often than any network issue.
Here are five common myths about storage speed and why they are wrong:
1. Storage Speed Can Be Measured in Gigabits per Second
Storage devices will often state their capabilities using simple metrics like the number of gigabits or gigabytes per second that they can read or write. But the reality is much more complex.
Those numbers are often based on the signaling rate of the device's hardware interface, not the physical storage media. For example, a USB 3.1 hard disk drive (HDD) might claim to be capable of data transfer speeds up to 10 Gbps. But that is the speed of the USB cable, which can only be achieved while reading or writing to the device's cache. The actual write speed of a 7200 rpm hard disk drive might top out around 1.28 gigabits per second, and even that is only under ideal conditions.
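The gap between the interface rate and the media rate is easy to quantify. The Python sketch below converts a sustained media rate into gigabits per second and compares it with the 10 Gbps interface figure. The 160 megabytes per second media rate (which works out to the 1.28 Gbps mentioned above) and the 100 GB file size are illustrative assumptions, not measurements.

```python
# Illustrative arithmetic only: the drive rate and file size are assumed values.

def mbytes_to_gbits(mbytes_per_sec: float) -> float:
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return mbytes_per_sec * 8 / 1000


def transfer_seconds(size_gbytes: float, rate_gbits: float) -> float:
    """Seconds needed to move size_gbytes at rate_gbits (decimal units)."""
    return size_gbytes * 8 / rate_gbits


INTERFACE_GBPS = 10.0        # signaling rate advertised for the USB interface
MEDIA_MBYTES_PER_SEC = 160   # assumed sustained rate of a 7200 rpm HDD
FILE_GBYTES = 100            # hypothetical file size

media_gbps = mbytes_to_gbits(MEDIA_MBYTES_PER_SEC)
print(f"Media rate: {media_gbps:.2f} Gbps "
      f"({media_gbps / INTERFACE_GBPS:.1%} of the interface rate)")
print(f"At interface rate: {transfer_seconds(FILE_GBYTES, INTERFACE_GBPS):.0f} s")
print(f"At media rate:     {transfer_seconds(FILE_GBYTES, media_gbps):.0f} s")
```

With these assumed values, the same 100 GB file needs 80 seconds at the interface rate but about 625 seconds at the media rate.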
Hard disk drives slow down drastically when they must read and write multiple files at a time. Mechanical factors such as seek time and rotational latency, combined with environmental conditions such as temperature and vibration, and filesystem conditions such as available space, can easily drop performance to just hundreds of megabits per second. That's only a few percent of the interface speed.
Solid-state drives (SSDs) are much simpler and generally much faster than hard disk drives. SSDs are also less sensitive to variations in access patterns and environmental conditions. But the quality of SSDs varies dramatically and their ratings are just as often skewed toward "ideal" conditions.
Tech Note 0023 has detailed recommendations for storage hardware.
2. Storage Arrays are Fast
A common approach for improving storage performance is to arrange multiple drives into a RAID configuration. RAID is an acronym for Redundant Array of Inexpensive Disks, which should give you a clue about its purpose and function. A RAID is primarily about redundancy and cost savings. There are many different ways to set up a RAID and they all involve trade-offs.
Achieving high performance with one type of workflow requires sacrificing performance for others. For example, RAID 5 can be very fast for reading large files, but is much slower for writing because parity must be updated on every write. RAID 1 can be faster for small I/O operations, but slower for sustained access. It is critical to understand exactly how an array will be accessed and then configure it to match current and future needs.
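One way to see the read/write trade-off is the commonly cited "write penalty" rule of thumb, which counts how many disk operations a single small random write generates at each RAID level. The sketch below is a rough model under that assumption; the drive count and per-drive IOPS are hypothetical, and real arrays with caches or full-stripe writes can behave very differently.

```python
# Rule-of-thumb model: back-end disk I/Os generated by one small random write.
# These penalties are widely quoted approximations, not measurements.
WRITE_PENALTY = {
    "RAID 0": 1,   # striping only, no redundancy
    "RAID 10": 2,  # every write goes to both sides of a mirror
    "RAID 5": 4,   # read data + read parity + write data + write parity
    "RAID 6": 6,   # as RAID 5, but with two parity blocks to update
}


def effective_iops(level: str, drives: int, iops_per_drive: float,
                   write_fraction: float) -> float:
    """Estimate host-visible random IOPS for a given read/write mix."""
    raw = drives * iops_per_drive
    penalty = WRITE_PENALTY[level]
    # Each read costs one disk I/O; each write costs `penalty` disk I/Os.
    return raw / ((1 - write_fraction) + write_fraction * penalty)


# Hypothetical array: 8 drives at 150 random IOPS each.
for level in WRITE_PENALTY:
    reads = effective_iops(level, 8, 150, write_fraction=0.0)
    writes = effective_iops(level, 8, 150, write_fraction=1.0)
    print(f"{level}: ~{reads:.0f} read IOPS, ~{writes:.0f} write IOPS")
```

In this simplified model every level reads equally fast, while small-write throughput falls as redundancy is added, which is why the expected access pattern should drive the choice of configuration.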
Tech Note 0018 has some more guidance on how to get the most out of a RAID.
3. Fast Storage Hardware Means Fast Data Access
Storage does not operate alone. At a minimum, it requires cables, interfaces, and a computer to access the data. Network attached storage (NAS) adds switches and perhaps routers to the mix. On top of all the hardware is the software that runs it, including the operating system that mounts the filesystem. Just as the storage must be chosen and configured to match your workflow, all of these other components must be carefully matched.
One of the most overlooked components is the operating system of the computer that is mounting the storage volume. Whether this is the final consumer of the files or will serve those files out via other protocols, the operating system and its setup have a huge impact on the storage throughput. So do any other applications or services which may be quietly running alongside your main workflow.
Unix-derived systems such as Linux, FreeBSD, and macOS are capable of providing superior performance for many workflows. But they can require some tuning. For example, Linux tends to over-buffer write operations, which can result in devastating storage hangs when data arrives too quickly. Tech Note 0035 offers guidance on tuning Linux for maximum filesystem speed.
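On Linux, the amount of written data allowed to accumulate in memory before being flushed is governed by the kernel's vm.dirty_* settings. As a starting point for any tuning, the following Python sketch simply reports their current values. It assumes a Linux system with /proc mounted, returns None for anything it cannot read, and does not suggest what the values should be.

```python
from pathlib import Path

# Linux-only: kernel settings that bound how much written data may sit in
# memory before it is flushed to storage. Other systems lack these files.
VM_DIR = Path("/proc/sys/vm")
SETTINGS = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_background_bytes",
    "dirty_bytes",
)


def read_dirty_settings(vm_dir: Path = VM_DIR) -> dict:
    """Return each setting as an int, or None if it cannot be read."""
    values = {}
    for name in SETTINGS:
        try:
            values[name] = int((vm_dir / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_dirty_settings().items():
        print(f"vm.{name} = {value}")
```

The ratio settings are percentages of memory and the byte settings are absolute limits; a value of 0 for a byte setting means the corresponding ratio is in effect.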
4. Network Attached Storage is Fast
It does not matter how fast your storage device is if you put a slow protocol in front of it. SMB, NFS, AFP: whichever network protocol you choose, it will choke performance compared to a direct hardware connection or even a Storage Area Network (SAN) such as Fibre Channel. Using high speed networking such as 10 gigabit per second Ethernet cannot overcome the limitations of these protocols.
It helps to use the latest version of a NAS protocol and to carefully tune both the NAS client and NAS server for performance. For example, SMB speed can be improved by disabling packet signing features not needed on a LAN, and NFS performance can be improved by disabling write synchronization. But any time NAS is involved, you must lower your expectations.
Tech Note 0029 has some more guidance on how to get the most out of NAS.
5. Virtual File Systems Are Just as Good as the Real Thing
Filesystem containerization or emulation will cripple performance. Whether it is a virtual machine (VM) hard drive, an S3 filesystem emulator, or a disk image, adding layers to storage will have profound effects on its speed and reliability. Even just a mismatch between the filesystem format and the media, such as FAT32 on an SSD, can cause problems.
Make sure that your storage is formatted using a filesystem appropriate to the media, and access that filesystem as directly as possible. If your workflow seems to require extra layers on top of your storage, reconsider the type of storage you are using. Choosing storage for cost savings almost always means sacrificing performance.
What to do about it
With all these hidden performance traps, how do you know whether your storage will hold up? Testing is key. No matter what the vendor specs say, test the storage and related components together using a realistic workflow and load level. Even if an ideal system is out of reach of your budget, understanding how the storage you have will react to different workflows can help you tune it for better results.
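As a first approximation of such a test, the Python sketch below times a sequential write and a sequential read of one file and reports megabits per second. It is a minimal illustration, not a substitute for a realistic workload: the function names, file path, and sizes are assumptions, and a single sequential stream will not reveal how the storage behaves under concurrent access.

```python
import os
import tempfile
import time


def measure_write_mbps(path: str, total_bytes: int,
                       block_bytes: int = 1024 * 1024) -> float:
    """Write total_bytes sequentially to path and return megabits per second."""
    block = os.urandom(block_bytes)  # random data resists compression
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_bytes, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # count the time needed to reach the device
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written * 8 / elapsed / 1e6


def measure_read_mbps(path: str, block_bytes: int = 1024 * 1024) -> float:
    """Read path sequentially and return megabits per second.

    The operating system may serve a recently written file from memory,
    so the result can overstate what the storage itself delivers.
    """
    read = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_bytes)
            if not chunk:
                break
            read += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return read * 8 / elapsed / 1e6


if __name__ == "__main__":
    # Replace with a path on the volume under test; the temp directory is
    # used here only so the sketch runs anywhere.
    test_file = os.path.join(tempfile.gettempdir(), "storage_test.bin")
    size = 64 * 1024 * 1024  # assumed size; realistic tests should exceed RAM
    print(f"write: {measure_write_mbps(test_file, size):.0f} Mbps")
    print(f"read:  {measure_read_mbps(test_file):.0f} Mbps")
    os.remove(test_file)
```

Running the same measurement with several files at once, and with file sizes larger than system memory, gives a more honest picture than a single small run.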
ExpeDat trials are available for free and can be helpful in pushing your storage to its limits. ExpeDat's logging can often identify storage problems. See Tech Note 0033 for instructions on enabling diagnostic logging.