March 06, 2013

Google's systems send out alerts when they are down to their last few petabytes

One of the best-kept secrets of Google’s rapid evolution into the most dominant force on the web is a software system called Borg. Google has been using the system for a good nine or ten years, and John Wilkes and his team are now building a new version of the tool, codenamed Omega.

Borg is a way of efficiently parceling work across Google’s vast fleet of computer servers, and according to Wilkes, the system is so effective, it has probably saved Google the cost of building an extra data center. Yes, an entire data center. That may seem like something from another world — and in a way, it is — but the new-age hardware and software that Google builds to run its enormous online empire usually trickles down to the rest of the web. And Borg is no exception.

Google's systems are big. Google engineers might receive an emergency alert because a system that stores data is down to its last few petabytes of space. A petabyte is a billion megabytes, so in other words, billions of megabytes can flood a fleet of Google machines in a matter of hours.

Google’s system provides a central brain for controlling tasks across the company’s data centers. Rather than building a separate cluster of servers for each software system — one for Google Search, one for Gmail, one for Google Maps, etc. — Google can erect a cluster that does several different types of work at the same time. All this work is divided into tiny tasks, and Borg sends these tasks wherever it can find free computing resources, such as processing power or computer memory or storage space.
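To make the idea concrete, here is a minimal sketch of that kind of placement. It is not Borg's actual logic, which the article does not describe in detail. The `Machine` and `Task` classes, the `place` function, and all the numbers are hypothetical, and the rule used (take the first machine with enough free CPU and memory) is the simplest one that illustrates the point.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    free_cpu: float  # free processor cores (illustrative unit)
    free_mem: float  # free memory in GB (illustrative unit)
    tasks: list = field(default_factory=list)


@dataclass
class Task:
    service: str
    cpu: float
    mem: float


def place(task, machines):
    """Put the task on the first machine with enough free CPU and memory."""
    for m in machines:
        if m.free_cpu >= task.cpu and m.free_mem >= task.mem:
            m.free_cpu -= task.cpu
            m.free_mem -= task.mem
            m.tasks.append(task.service)
            return m.name
    return None  # nothing fits; a real scheduler would likely queue the task


# One shared cluster, with tasks from several services mixed together.
machines = [Machine("m1", 4, 16), Machine("m2", 4, 16)]
tasks = [
    Task("search", 2, 4),
    Task("mail", 1, 8),
    Task("maps", 2, 6),
    Task("search", 1, 2),
]
placements = [place(t, machines) for t in tasks]
print(placements)
```

Tasks from search, mail, and maps end up sharing machines, which is the contrast with running one dedicated cluster per service.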

Wilkes says it’s like taking a massive pile of wooden blocks — blocks of all different shapes and sizes — and finding a way to pack all those blocks into buckets. The blocks are the computer tasks. And the buckets are the servers. The trick is to make sure you never waste any of the extra space in the buckets.

“If you just throw the blocks in the buckets, you’ll either have a lot of building blocks left over — because they didn’t fit very well — or you’ll have a bunch of buckets that are full and a bunch that are empty, and that’s wasteful,” Wilkes says.
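Wilkes's blocks-and-buckets picture is what computer scientists call bin packing. The sketch below compares throwing blocks in as they arrive with a textbook heuristic, first-fit decreasing, which sorts the blocks largest-first before packing. This is an illustration only; the article does not say which algorithm Borg uses, and the block sizes and bucket capacity are made up.

```python
def first_fit(blocks, bucket_size):
    """Drop each block into the first bucket with room; open a new one if none fits."""
    buckets = []
    for block in blocks:
        for bucket in buckets:
            if sum(bucket) + block <= bucket_size:
                bucket.append(block)
                break
        else:
            # No existing bucket had room for this block.
            buckets.append([block])
    return buckets


def first_fit_decreasing(blocks, bucket_size):
    """Same as first_fit, but place the largest blocks first."""
    return first_fit(sorted(blocks, reverse=True), bucket_size)


blocks = [4, 4, 4, 6, 6, 6]  # hypothetical task sizes
naive = first_fit(blocks, 10)  # blocks thrown in as they arrive
careful = first_fit_decreasing(blocks, 10)

print(len(naive), naive)
print(len(careful), careful)
```

With these numbers, arrival order needs four buckets and leaves three of them partly empty, while sorting first fills three buckets exactly. Across thousands of servers, that kind of difference is where the savings Wilkes describes would come from.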


Rather than run separate software systems on separate server clusters, Google can run everything on one cluster — thanks to Borg and its successor, Omega. Illustration: Ross Patton




At UC Berkeley, Ben Hindman’s aim was to spread computing tasks across multi-core chips as efficiently as possible. Intel would send him chips. He would wire them together, creating machines that spanned 64 or even 128 cores. And then he worked to build a system that could take multiple software applications and run them evenly across all those cores, sending each task wherever it could locate free processing power.

In March 2010, about a year into the Mesos project, Hindman and his Berkeley colleagues gave a talk at Twitter. Mesos seemed like the perfect way to rebuild Google's Borg.

Google’s new version of Borg — Omega, which Wilkes has publicly discussed — is even closer to the Mesos model.

Systems like these are known as “server cluster management systems,” following in the footsteps of similar tools built in years past to run supercomputers and services like the Sun Grid Engine. Both Omega and Mesos let you run multiple distributed systems atop the same cluster of servers. Rather than run one cluster for Hadoop and one for Storm (a tool for processing massive streams of data in real time), you can move them both onto one collection of machines. “This is the way to go,” Wilkes says. “It can increase efficiency, which is why we do it.”


