Cornell Bowers College of Computing and Information Science
Hakim Weatherspoon


CS Prof Hakim Weatherspoon: Laying the Foundation of Modern Cloud Computing



-- by Lauren Cahoon Roberts for the Cornell Research Website

Just a decade ago, if you told someone you stored your files in the cloud, they would have given you a blank stare. Now, cloud computing has become an integral part of our web-linked lives. The leap from storing our data on specific, physical machines to distributed, cloud-based systems has been possible in no small part thanks to Hakim Weatherspoon, Computer Science. His early and continuing work in network and computing science has helped lay the foundation of modern cloud usage, and he aims to make this capability even more secure, efficient, and powerful in the future.

A Precursor to the Cloud

Weatherspoon had originally planned to work in industry at Intel or Microsoft. However, after excelling in computer engineering at the University of Washington while also playing defensive back on the college football team, Weatherspoon was urged to apply for the illustrious Rhodes Scholarship. He ultimately became a Rhodes finalist, an experience that became the impetus for pursuing a PhD in computer science at the University of California, Berkeley.

“I worked on a large-scale distributed storage system called OceanStore,” says Weatherspoon. “It was going to store the world’s information, forever--and my marching orders were to figure out how to do this--no one had looked at this at the time. It was the precursor to cloud computing.”

He focused on the algorithms, theories, and mechanisms by which this ocean of data was replicated, monitored, and double-checked, and he advanced the state of the art of this early ‘big data’ computing problem. The work was so foundational that Amazon (Dynamo) referenced some of his algorithms when it began building its cloud computing systems.

From One-Vendor System to SuperCloud

Weatherspoon is still improving and expanding the cloud’s capabilities. “The key question I’m trying to answer is how to make computing and storage very efficient,” he says. “My research lies at the intersection of the networks, storage, and computation required for cloud computing and contributes to the fundamentals of distributed systems that underlie the cloud.”

One of the main issues with cloud data storage that Weatherspoon wants to overcome is known as vendor lock-in. Anyone who uses a cloud storage service enters into an agreement with one vendor, typically Google or Amazon. Once they sign up with one vendor, they’re stuck with them. Data cannot be easily transferred from one provider to another. To break free of this one-vendor system, Weatherspoon is creating a SuperCloud, an umbrella-like system that provides a “shim layer” of storage over and beyond the individual cloud systems. “My research creates new models of computation and data that can migrate seamlessly between non-cooperating cloud infrastructure providers,” he says.  “This gives the cloud user an unprecedented level of control and protection, and they can use Google, Amazon, Microsoft cloud services interchangeably.”

This new technology could help businesses that are concerned about having all their data stored with only one company. “It would change how we store and compute data,” says Weatherspoon. “It would make cloud storage less of a privately provided service and more of a utility.”

SoNIC, a Software-defined Network Interface Card

Another key project is SoNIC, a Software-defined Network Interface Card, which allows researchers to closely examine the high-speed networks that interconnect data centers. “We can actually observe the transmission of bits from one machine to another, and it turns out that there are some physical-layer properties that affect performance--namely, ‘burstiness’,” says Weatherspoon. “If I have an application I’m sending over the network, that transmission is broken up into packets--there will be bursts of packets, and then lulls--which can reduce the overall performance. Some of that is due to the physical layer. SoNIC helps optimize that process by allowing us to control every bit on those physical wires.”

This level of control enables Weatherspoon’s team to perfectly synchronize different machines within a network. “Synchronization is actually a huge issue for distributed systems,” he says. “When you have two machines that are able to have the exact same time stamp on an event, this accuracy helps the network better order events, and increases performance of applications.”

While using SoNIC to optimize this physical transmission, Weatherspoon and his team also found that they could create covert channels using these intermittent packet bursts. “We found we could change the spacing between packets, like Morse code,” he says. “Thus, we can encode or hide information using this system. It’s invisible to everyone except those sending and receiving the covert channel. We’ve proven it’s faster and more efficient than the current state of the art.” This secret messaging system has attracted the attention of the U.S. Department of Defense, the Air Force, and many academic institutions. They join industry giants such as Amazon, Microsoft, and Google as major players that have taken note of, and benefited from, Weatherspoon’s foundational work.
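
To make the Morse-code comparison concrete, here is a minimal sketch in Python of how spacing between packets can carry hidden bits. It is a toy simulation under stated assumptions, not SoNIC's actual mechanism: the gap lengths, the threshold, and every function name are hypothetical, and no real network traffic is involved. A short gap stands for a 0, a long gap stands for a 1, and the receiver recovers the message by measuring the gaps between arrival times.

```python
# Toy timing covert channel. All constants and names are illustrative
# assumptions, not parameters or APIs taken from SoNIC.
SHORT_GAP_NS = 100   # assumed gap that encodes a 0 bit
LONG_GAP_NS = 300    # assumed gap that encodes a 1 bit
THRESHOLD_NS = (SHORT_GAP_NS + LONG_GAP_NS) // 2


def text_to_bits(message: str) -> list[int]:
    """Turn a string into a list of bits, most significant bit first."""
    bits = []
    for byte in message.encode("utf-8"):
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits


def bits_to_text(bits: list[int]) -> str:
    """Pack bits back into bytes (ignoring a trailing partial byte)."""
    data = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("utf-8", errors="replace")


def encode_gaps(message: str) -> list[int]:
    """Sender side: choose one inter-packet gap per hidden bit."""
    return [LONG_GAP_NS if bit else SHORT_GAP_NS for bit in text_to_bits(message)]


def send_times(gaps: list[int], start_ns: int = 0) -> list[int]:
    """Simulated departure times: one packet, then one more after each gap."""
    times = [start_ns]
    for gap in gaps:
        times.append(times[-1] + gap)
    return times


def decode_times(times: list[int]) -> str:
    """Receiver side: measure gaps between arrivals and threshold them."""
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    bits = [1 if gap > THRESHOLD_NS else 0 for gap in gaps]
    return bits_to_text(bits)


if __name__ == "__main__":
    times = send_times(encode_gaps("hi"))
    print(len(times), decode_times(times))
```

An observer who looks only at packet contents sees nothing unusual in this scheme, because the hidden message lives entirely in the timing. The sketch also hints at why precise control of the wire matters: the smaller the difference between a "short" and a "long" gap, the harder the channel is to notice, but the more precisely both ends must control and measure time.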

Ultimately, it’s the everyday cloud users whom Weatherspoon wants to help most. “At this point, a lot of people know about the cloud in general; they know their pictures are stored on it,” says Weatherspoon, “and I’m focused on ensuring that everyone continues to have secure, reliable access to their data on that cloud, so they don’t ever lose access and control of their critical and noncritical data.”