Issue #7 | September–October 2015



The grand vision of “Big Data” is that it will lead to big insights on crucial connections between seemingly disparate phenomena. So far, though, that promise remains mostly unfulfilled. And it has been that way for quite a long time. When it comes to data, “all of us feel—and overeat—very much like the little boy who has been left alone in the candy store,” Peter Drucker wrote in 1969. “But what has to be done to make this cornucopia of data redound to information, let alone to knowledge?” What follows are three stories of organizations from across the public, nonprofit and private sectors, each of which provides at least a partial answer to Drucker’s question. Their insights are keen: Be opportunistic. Don’t forget to use your own brain. And the race is to the swift.


Everyone reading this article has access to limitless quantities of information. So, first, thanks for your time.


But, second, let’s all accept that our million-fold increases in data have not led to million-fold, or even tenfold, increases in wisdom.

The grand vision of “Big Data”—or whatever term you like for information sets that are so huge we can barely figure out how to store them—is that it will lead to big insights on crucial connections between seemingly disparate phenomena.

Perhaps the movements and food choices of thousands of tagged lobsters in Maine will turn out to predict the wine harvests of Bordeaux. Perhaps victory patterns in Yankees games will be the key to understanding the behavior of gamblers in Macao. Perhaps financial markets will turn out to have some underlying pattern we never gleaned before because we hadn’t viewed market fluctuations in the context of cell biology, rainfall or the Large Hadron Collider.

Whether such hopes for Big Data inspire yearning or fear, they remain unfulfilled.

We’re not the first to note this mismatch between data dreams and realities. When Peter Drucker first wrote of an “information explosion” that was taking place, the year was 1969. “All of us feel—and overeat—very much like the little boy who has been left alone in the candy store,” he wrote. “But what has to be done to make this cornucopia of data redound to information, let alone to knowledge?”

For many of the entities that claim to work with Big Data, the best answer to Drucker’s question might be awkward silence.





Organizations like to use the words “Big Data” to convey sophistication and dynamism. When you take a closer look, though, the ability to produce and store Big Data often seems more like the chef’s kitchen that was installed to allow for grand-scale entertaining. We plan to throw that party someday, but for now we’re more likely to be throwing Stouffer’s in the microwave. Or, to use a different analogy, we’ve got a river full of gold, but we’re still just learning how to pan it.

Many of the challenges to using Big Data effectively turn out to be similar to those of cross-pollinating ideas to spur innovation, a phenomenon that we examined in the last issue of MONDAY*. You can put professionally diverse people together in a room, but how do the insights from one field combine felicitously with those of another? And how, given all of the differences in the relevant professional languages (between, say, sociologist and physicist), do they even communicate?

Likewise, given the variety in how information is collected and employed, how do you combine one data set with another and wind up with something useful? Consider a string quartet that gets recorded and released in three formats: on a vinyl record, on a CD and on iTunes. The relevant underlying data is the same (the sounds created by the four musicians), but the coding and the equipment needed to read it are in each case different. How do you mix and match?

Finally, how do you begin to come to grips with incomprehensible vastness? Perhaps placing a tracker on every car in the United States and collecting its location every 30 seconds for a year will, when combined with high math, lead you to insights about human movement patterns. Or perhaps it will just take up a lot of disk space.

This makes it all the more important to examine organizations leading the way in helping us to make sense of Big Data. We’re going to look at three—one in government, one in the nonprofit world and one in business—that are off to a notable start. Each provides a valuable lesson that should be applicable in any sector.



If you work for the National Security Agency, you probably have no shortage of resources at your disposal: money, storage capacity, personnel. But if you work for a municipal department, chances are that you’re going to have to figure out how to do clever things on a shoestring.

Few, if any, entities embody this spirit more than New York City’s Department of Transportation.

In 2008, when Cordell Schachter took over as chief technology officer of the DOT, budgets were lean and the iPhone was new. But technology evolved fast. Within three years, the city’s Metropolitan Transportation Authority began, like its counterparts in many other cities, to track every subway train and bus in its fleet, creating a record of every movement each minute of the day. This data feed was made available to the public.

For the city’s DOT, which is separate from the MTA, this was a tremendous gift of resources. “We try to be very opportunistic here and use available information,” says Schachter. “And in the MTA’s case, we could get it for free.”

DOT staffers also noticed that the company NYC Bike Share was tracking the availability of shared bicycles in the city, creating yet another data feed that they could tap at essentially no cost.

Combining these data streams, DOT staffers developed an application called iRideNYC, which allows the user to stand anywhere in New York and see the closest transportation options (bikeshare, bus, train), their times of arrival and departure, and estimated walking time.
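To make the idea of combining feeds concrete, here is a minimal Python sketch of the kind of lookup described above. It is an illustration only, not iRideNYC's code: the feed layout, the stop names, the coordinates and the 5 km/h walking speed are all assumptions invented for the example, and real transit and bikeshare feeds use their own formats.

```python
import math

WALK_KMH = 5.0  # assumed average walking speed, not a DOT figure

# Hypothetical, already-parsed feeds: one list of stops per transport mode.
FEEDS = {
    "subway": [{"name": "Example subway stop", "lat": 40.7553, "lon": -73.9869,
                "next_arrival_min": 4}],
    "bus": [{"name": "Example bus stop", "lat": 40.7590, "lon": -73.9845,
             "next_arrival_min": 7}],
    "bikeshare": [{"name": "Example bike dock", "lat": 40.7614, "lon": -73.9776,
                   "next_arrival_min": None}],  # docks have no arrival time
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearest_options(user_lat, user_lon, feeds, limit=3):
    """Merge every feed into one list, sorted by estimated walking time."""
    options = []
    for mode, stops in feeds.items():
        for stop in stops:
            km = haversine_km(user_lat, user_lon, stop["lat"], stop["lon"])
            options.append({
                "mode": mode,
                "name": stop["name"],
                "walk_min": round(km / WALK_KMH * 60, 1),
                "next_arrival_min": stop["next_arrival_min"],
            })
    options.sort(key=lambda option: option["walk_min"])
    return options[:limit]

# A user standing at a made-up point in Midtown Manhattan.
for option in nearest_options(40.7580, -73.9855, FEEDS):
    print(option["mode"], option["name"], option["walk_min"])
```

The hard part in practice is not the arithmetic but getting feeds from different publishers into a common shape like `FEEDS` in the first place.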

A few rules have guided the department in its work on the iRideNYC application. The first is that it must work on all devices—iOS, Windows, Android, even a desktop computer. The second is that the code that went into iRideNYC should be made publicly available. “We are taxpayer-funded,” Schachter says, “so why should taxpayers pay again if another jurisdiction wants to use this?”




The third guiding principle is that the application must focus on improving transportation access for those with physical or cognitive impairments, so that factors like pinpointing stairwell location and map brightness come into play.

iRideNYC is only one data-taming effort being undertaken by DOT. Another has been the streamlining of the permitting process for New Yorkers who need to excavate the street, under which lie countless feet of pipes, cables and tunnels. This requires no small number of permits: more than half a million per year. A decade ago, you had to file physical documents and pick up a paper permit to post at your job site. Today, the process is digital and can be completed on a mobile device.

Getting to this stage has required the creation and storage of a lot of data—on locations, on relevant restrictions, on project scope—that is now more easily searchable.

A major aspect of handling all of this information—and doing it efficiently and cost effectively—is knowing what to keep and how.


DOT receives lots of sensitive electronic data that must be kept secure. This gets stored in multiple city-owned physical sites that cannot be breached wirelessly. Less sensitive data—such as details of traffic speed, street construction, public space, pedestrian counts, ferry service, parking availability, bridge clearance, truck routes and traffic cameras, among other things—is likewise stored at those sites. But it’s also made available in the “cloud,” provided at relatively low cost by companies such as Amazon.

And some data, such as the feeds provided by the MTA, is generally allowed to flow by like a stream, rather than getting retained. This approach makes good sense: Most of us want to know that the D train is running five minutes late right now. Three months down the road, few of us will care that the train pulled into the station a bit behind schedule.
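The "let it flow by" approach can be sketched in a few lines of Python. The event fields used here (`train`, `delay_min`, `ts`) are hypothetical and are not the MTA feed's real schema. The point is only that the program holds one record per train, no matter how long the stream runs, and older readings are simply dropped.

```python
def latest_status(events):
    """Consume a stream of events, keeping only the newest one per train."""
    current = {}
    for event in events:
        previous = current.get(event["train"])
        # Overwrite rather than append: history is deliberately not retained.
        if previous is None or event["ts"] >= previous["ts"]:
            current[event["train"]] = event
    return current

def sample_feed():
    """Stand-in for a live feed; a generator yields events one at a time."""
    yield {"train": "D", "delay_min": 2, "ts": 1}
    yield {"train": "A", "delay_min": 0, "ts": 2}
    yield {"train": "D", "delay_min": 5, "ts": 3}

status = latest_status(sample_feed())
print(status["D"]["delay_min"])  # 5
```

Keeping history instead would mean appending every event to storage, which is exactly the cost the department avoids for data whose value expires within minutes.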

Schachter says his department has only just begun to explore data mining. In the future, he says, researchers may be able to anticipate something like pavement deterioration and calculate whether it makes the most sense to repave it, patch it or just leave it alone.

In the end, though, what makes NYC DOT an outstanding tamer of Big Data is that it looks for modest ways to make the most of the resources it has. With more and more products and services available commercially, “you can pay for a lot of things as you go,” Schachter says, which leaves funds for what he says pays off best: “investing in your team.”



When you enter the Institute for Systems Biology, just south of Lake Union in Seattle, it feels like you’ve wandered onto the set of The Big Bang Theory. White boards in hallway meeting areas are messy with equations and impromptu computations. Employees work among sundry toys such as hula hoops and Daleks, the robot-like villains from Doctor Who. The labs have chairs draped with white coats and machines such as ion-trap mass spectrometers.

ISB was launched in 2000 by three University of Washington scientists. Today it employs 200 staff members from 45 academic fields.

While precise definitions of the term “systems biology” will differ, what is essential is the modeling of complex networks. This is done in the belief that many of the mysteries of biology will be cleared up by, in effect, stepping back rather than just pushing forward. The migration habits of a single bird will make sense in the larger context of weather, for instance.



Nitin Baliga, who as senior vice president and director of ISB oversees most of the day-to-day operations, is one of nine faculty members on staff. His lab builds predictive models that will, it is hoped, be useful in combating disease, generating clean energy and protecting the environment.

Since 2000, ISB research has resulted in more than 1,300 papers published in academic journals. Most have titles—like “Genotoxic stress/p53-induced DNAJB9 inhibits the pro-apoptotic function of p53”—that are too abstruse for the lay person to understand.

To do this work, ISB has increased greatly both the data storage capacity and processing power it has available. But with this have come two big cautionary notes: For starters, it’s crucial not to get “wedded to the technology,” Baliga says. And it’s equally crucial not to get wedded to the mountains of information generated by the technology.

“For us, the model is the beginning, and the model essentially organizes complex data in a way that lets human intuition kick in,” Baliga says.

When data sets are massive, Baliga warns, it is particularly easy to be seduced by meaningless correlations. (Harvard Law student Tyler Vigen, who has made a side project of unearthing spurious links, has, for example, found a 99% correlation between U.S. spending on space, science and technology and suicides by hanging, strangling and suffocation.)
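Why do massive data sets breed meaningless correlations? A small simulation shows the mechanism. The Python sketch below is an illustration under stated assumptions, not ISB's method: it generates one short series of pure random noise, compares it against thousands of other pure-noise series, and reports the strongest correlation it finds. The series count, series length and seed are arbitrary choices.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def strongest_chance_correlation(n_series=2000, n_points=10, seed=0):
    """Largest |correlation| between one noise series and many others."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best

print(round(strongest_chance_correlation(), 2))
```

None of these series has anything to do with any other, yet with enough candidates to choose from, the best match will usually look impressively strong. That is the trap: the more variables you can compare, the more certain it is that some pair will line up by accident.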

That’s why Baliga and his colleagues are always careful to take the data spit out by their computers and supplement it with other experiments, their own observations and a real-world gut check. In this way, taming Big Data is no different from the way that Drucker described the discipline of innovation: “Because innovation is both conceptual and perceptual, would-be innovators must … go out and look, ask and listen,” he wrote. “Successful innovators use both the left and right sides of their brains.”

Asserts Baliga: “Mathematicians and statisticians might feel that a model has to make a prediction. I would say that an interaction between the model and the human brain has to make the prediction.”


It’s one thing to look at a pile of data and try to make sense of it over time. It’s another to look at an abundance of incoming data and make sense of it right away. For many companies, the latter has become a business necessity.

Take Uber, the ride-share company that threatens to put traditional taxis out of business. Every minute of the day, Uber’s computers must process a torrent of incoming information: the location of every logged-in user in the world, the location of every one of its drivers, the destination of every passenger, the status of every ride, the payment of every fare, the rating of every driver. Most of these facts and figures must be processed immediately, so that Uber can—without pause—seamlessly provide more than a million rides per day.

One thing that makes such speed and fluidity possible is a system called Apache Kafka—a tool that offers organizations a way to take in a deluge of data, trillions of messages a day, and process it all right away.

Kafka’s creators have compared it to a central nervous system, the instrument by which the Big Data of our brains gets translated instantaneously into movement by our bodies.

Kafka came into being at LinkedIn, where engineer Jay Kreps and two colleagues, Neha Narkhede and Jun Rao, increasingly found themselves resorting to patchwork and improvisation in their efforts to gain instant control of the information constantly pouring into the professional social network.

LinkedIn already had some Big Data capabilities, and, like many companies, was making use of a software framework called Hadoop. But this only got Kreps and his teammates so far. What they needed was a means by which to process data as it came in, not just in batches after the fact.

To take an analogy, Hadoop could tell you amazing things about a baseball game once it was over—the average speed of every pitch that was thrown, the precise distance of the home runs that were hit, the way that a shift in the infield thwarted what would have been a game-winning double. But in the middle of the fifth inning it couldn’t actually tell you who was winning.
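The baseball analogy translates directly into code. The following Python sketch contrasts the two styles; it uses neither Hadoop's nor Kafka's actual APIs, and the play records and class names are invented for the illustration. The batch function needs the whole list of plays before it can answer, while the scoreboard updates on every play and can be asked who is ahead at any moment.

```python
from collections import defaultdict

# Hypothetical play-by-play records for one game.
PLAYS = [
    {"inning": 1, "team": "home", "runs": 2},
    {"inning": 3, "team": "away", "runs": 1},
    {"inning": 7, "team": "away", "runs": 3},
]

def batch_summary(plays):
    """Batch style: total the runs once the complete record is available."""
    totals = defaultdict(int)
    for play in plays:
        totals[play["team"]] += play["runs"]
    return dict(totals)

class LiveScoreboard:
    """Streaming style: update state per event, answer queries at any time."""

    def __init__(self):
        self.score = defaultdict(int)

    def on_play(self, play):
        self.score[play["team"]] += play["runs"]

    def leader(self):
        """Return the team currently ahead, or None if tied or no plays yet."""
        ranked = sorted(self.score.items(), key=lambda kv: kv[1], reverse=True)
        if not ranked:
            return None
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            return None
        return ranked[0][0]

board = LiveScoreboard()
for play in PLAYS:
    if play["inning"] > 5:  # stop in the middle of the game
        break
    board.on_play(play)

print(board.leader())        # home
print(batch_summary(PLAYS))  # {'home': 2, 'away': 4}
```

Both approaches reach the same final totals. The difference is when the answer becomes available, and in this made-up game the team leading in the fifth inning is not the team that wins.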


So Kreps, Narkhede and Rao came up with Kafka, a platform that allowed LinkedIn to handle vast feeds, making data immediately responsive to users when they entered information, whether to change their status or to connect with others. Not long after, in 2011, LinkedIn “open-sourced” Kafka, meaning that anyone could access the code and offer improvements to it or otherwise adapt it to their own business needs.

For the next several years, Kreps and his co-workers continued at LinkedIn, but they also found themselves in high demand as the open-source stewards of Kafka. “We started getting a lot of requests for help from companies trying to adopt it in a large way,” Narkhede recalls.

Among those Kafka enthusiasts were Twitter, Netflix, Square, Spotify, Pinterest and Uber. So in 2014, Kreps and his team decided to make a business of their own, founding Confluent, a company that helps other organizations use Kafka effectively. If Kafka is the internal combustion engine that anyone can acquire (and for free, no less), Confluent is the outfit that helps build you the rest of the car around the engine (for fees that it won’t discuss publicly).

In July, Confluent, now a 25-person team based in Palo Alto, Calif., raised $24 million in a new round of venture funding.

“The transition from periodic once-a-day computation to something that happens all the time is a tectonic shift,” says Kreps, Confluent’s co-founder and CEO.

Indeed, as Confluent grows, it is finding that all sorts of organizations—in retail, telecommunications, healthcare and financial services—are discovering one of the secrets to taming Big Data: Not only do you have to use it smartly, you have to do so in a flash.*


Monday Mandate*

What will you do on Monday that’s different?


Have each member of your team review all of the data that he or she receives on a regular basis—and assess how useful it is in actually helping to make good decisions. If it’s not useful, fix it. Or kill it.


Traditionally, organizations have had multiple layers whose main function was to coordinate the passing of information back and forth through the enterprise. With more data readily available to everyone, ask yourself: Where are there opportunities to streamline?


“An adequate information system,” Peter Drucker wrote, must lead executives “to ask the right questions, not just feed them the information they expect. That presupposes first that executives know what information they need.” Do you?