Data Expedition, Inc.



From Silicon To Memes: Part 4

by Seth Noble |  Blog Oct 16, 2019

This is the final part of a four-part series that explores how automated arithmetic machines have come to dominate human communication and the consequences for our culture and economy.

A New Data Transport Model

Earlier, we discussed the evolution of electronic communication and how a component that has been essentially unchanged since the 1970s is costing the economy hundreds of billions of dollars annually. We looked at the design assumptions of TCP, the software regulating the flow of nearly all data on the internet, and how it has failed to keep up with the past 45 years of internet growth. Now, we'll take inspiration from TCP's flaws to explore how a modern data model can solve this problem.

The key component of TCP that limits it on today's networks is its virtual-circuit data model. TCP also assumes that data must always be delivered in order, that data must be able to flow equally in both directions, and that congestion will rarely occur. But the internet is a best-effort packet-switched network, and it is inherently chaotic. Embracing that uncertainty, rather than trying to abstract it away with a virtual circuit, gives software the flexibility and visibility to make the intelligent, real-time decisions needed to ensure full and efficient utilization of network hardware.

Ordered delivery of data is expensive: it requires buffering, and it can stall as TCP tries to juggle incoming packets. Because modern storage is fast and plentiful, and the most critical data transfers do not require ordering, it is possible to deliver network data directly to files or primary memory. This alleviates the hangs and speed oscillations often associated with the internet. Smoother, more responsive flow control benefits the entire network, reducing congestion and recovering bandwidth from lost and delayed packets. Some data flows, like live video, do require ordered delivery and buffering, so that capability must be available, just not mandatory.
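To make the idea of unordered delivery concrete, here is a minimal Python sketch of a receiver that writes each packet's payload to its final position in a file as soon as it arrives, instead of holding it in a reordering buffer. The fixed `BLOCK_SIZE`, the `(offset, payload)` packet shape, and the function names are assumptions made for illustration; they are not taken from any particular protocol.

```python
import os
import tempfile

BLOCK_SIZE = 4  # hypothetical payload size, kept tiny so the example is easy to read


def receive_unordered(path, total_size, packets):
    """Write each (offset, payload) packet to its final place in the file.

    Returns the sorted list of block offsets that never arrived.
    """
    missing = set(range(0, total_size, BLOCK_SIZE))
    with open(path, "wb") as out:
        out.truncate(total_size)         # reserve the full file up front
        for offset, payload in packets:  # arrival order, not file order
            if offset not in missing:
                continue                 # duplicate or unknown block: ignore it
            out.seek(offset)
            out.write(payload)
            missing.discard(offset)
    return sorted(missing)


def demo():
    """Deliver three blocks out of order, with one duplicate, and read the result."""
    data = b"abcdefghijkl"
    blocks = [(i, data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]
    arrival = [blocks[2], blocks[0], blocks[2], blocks[1]]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.bin")
        missing = receive_unordered(path, len(data), arrival)
        with open(path, "rb") as f:
            return missing, f.read()
```

Nothing waits on a late packet: a block that never shows up simply remains in the returned list, which a sender could use to decide what to retransmit.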

A great deal of network overhead, the time and bandwidth consumed shepherding the actual data, can be eliminated by recognizing that most network communication is transactional -- a small request pulling an arbitrary amount of data. For example, rather than the back-and-forth negotiation of TCP, simple yet secure transactions can start with just one packet sent and one returned. Given a sufficiently light-weight implementation, more complex operations can be created by combining simple transactions as needed. Such modularity allows overhead to scale with the needs of each application, incurring only the minimum overhead required.
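As a rough illustration of a one-packet-out, one-packet-back transaction, the Python sketch below sends a single UDP datagram naming the data wanted and accepts a single datagram in reply. The wire format (a 32-bit transaction id followed by a key), the `store` dictionary, and all function names are hypothetical, and the security the paragraph mentions is omitted entirely to keep the example short.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # hypothetical header: one 32-bit transaction id


def build_request(txn_id, key):
    """Client side: encode a request as a single packet."""
    return HEADER.pack(txn_id) + key.encode("utf-8")


def handle_request(packet, store):
    """Server side: turn one request packet into one reply packet."""
    (txn_id,) = HEADER.unpack_from(packet)
    key = packet[HEADER.size:].decode("utf-8")
    return HEADER.pack(txn_id) + store.get(key, b"")


def parse_reply(packet, expected_txn_id):
    """Client side: check the reply belongs to our request and return its data."""
    (txn_id,) = HEADER.unpack_from(packet)
    if txn_id != expected_txn_id:
        raise ValueError("reply does not match the outstanding request")
    return packet[HEADER.size:]


def transact(server_addr, txn_id, key, timeout=1.0):
    """Run the whole exchange over UDP: one datagram out, one datagram back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(txn_id, key), server_addr)
        reply, _ = sock.recvfrom(65535)
        return parse_reply(reply, txn_id)
```

There is no connection setup or teardown here. A larger operation could be composed from several such exchanges, each tagged with its own transaction id.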

The problem of congestion control, figuring out how fast to send data on a best-effort network, is inherently difficult. All you know about a path is when you sent a packet and when you received (or didn't receive) a reply. Everything else -- speed, latency, packet loss -- must be inferred from those simple events. But once freed from the constraints of a virtual circuit, the problem becomes a lot less difficult. Focusing on just the arrival of data at a single point allows decisions about the network to be made without waiting for control data to cross that network. This independence means more timely, accurate adaptations to changing conditions.
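The sketch below shows, in Python, how much a receiver can infer from nothing more than packet arrivals at a single point. It is a deliberately simple illustration rather than the algorithm this article has in mind: the `(seq, size_bytes, arrival_time_s)` event shape and the function name are assumptions, and a real transport would need to smooth these estimates over time and distinguish late packets from lost ones.

```python
def summarize_arrivals(arrivals):
    """Infer delivery rate and loss from (seq, size_bytes, arrival_time_s) events.

    Returns (bytes_per_second, loss_fraction), using only local observations.
    """
    if len(arrivals) < 2:
        raise ValueError("need at least two arrivals to measure a rate")

    ordered = sorted(arrivals, key=lambda event: event[2])  # sort by arrival time
    elapsed = ordered[-1][2] - ordered[0][2]
    if elapsed <= 0:
        raise ValueError("arrivals must span a positive time interval")

    # Bytes that arrived after the first packet, over the time they took to arrive.
    delivered = sum(size for _, size, _ in ordered[1:])
    rate = delivered / elapsed

    # Gaps in the sequence numbers seen so far are treated as losses.
    seqs = {seq for seq, _, _ in arrivals}
    expected = max(seqs) - min(seqs) + 1
    loss = 1.0 - len(seqs) / expected
    return rate, loss


# Example: packets 0, 1, 3 and 4 arrive; packet 2 never does.
events = [(0, 1000, 0.0), (1, 1000, 0.25), (3, 1000, 0.5), (4, 1000, 1.0)]
rate, loss = summarize_arrivals(events)
```

Both numbers are computed without waiting for any control message to cross the network, which is the independence the paragraph above describes.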

Developing such novel transport algorithms is not easy, but it has been done. The looming challenge is deployment. Consider the state of IPv6, the "next generation" of the internet protocol. Despite more than 18 years of production and a nominal launch date of 2012, fewer than 30% of internet users have adopted it. Network infrastructure change is slow. Fortunately, changing the infrastructure is not necessary to deploy a new transport model.

In 1980, David Reed designed the user datagram protocol (UDP) so applications could create their own transport protocols without having to modify the TCP or IP layers. Since then, several open source and commercial efforts have built on top of UDP for greater efficiency. Nearly all have been constrained by the same virtual-circuit model as TCP, limiting their usefulness to well-funded enterprise networks and dedicated hobbyists. But the freedom UDP gives to applications provides a path for deployment of completely new data models using existing infrastructure.

That still leaves the challenge that any new communication protocol requires software on both ends. Enterprise IT departments are in a position to drive mass adoption of such applications and technologies. So are a few technology companies with global reach.

Apple, Microsoft and Alphabet (Google) each have vast ecosystems of devices. How much snappier would iPhones be if iCloud could be accessed with full network hardware utilization? What could Microsoft build if all its servers and desktops could communicate at high speed? How many more Google ads could be displayed if every Android device were querying Google's servers with a more efficient protocol? Google has deployed its own UDP-based QUIC protocol for exactly this purpose, though its reported speed gains -- about 5% -- are modest at best.

Beyond the titans of computing, there are many other companies that would stand to benefit from improved data transport efficiency.

Cloud vendors such as Amazon Web Services or Oracle are entirely dependent on the ability of their customers to efficiently move data in and out of their systems. While these vendors lack the global reach of consumer device-makers, the stakes are much higher for them because their revenues are directly proportional to the data they process. Furthermore, companies bringing new communication experiences to consumers also need every competitive advantage. Facebook, Snapchat, Instagram, Twitter and dozens of upstarts will live or die by their ability to provide a better user experience.

Even if just one of these companies adopted a more efficient transport protocol, and even if it only made a 1% improvement in its revenue, the sheer scale would still represent billions of dollars in economic gain.

Decades of innovation have grown a worldwide economy that is completely dependent on real-time data transfer across a global packet-switched network. Yet nearly all that communication is funneled through a 45-year-old virtual-circuit data model. Adoption of a more efficient transport protocol can be achieved on a per-application basis, without the need for new standards or industry-wide consensus. Such technology exists now, and a company with global reach could reap these rewards unilaterally or share them as it chooses.