The world’s next great natural resource race will not be the traditional mad dash to mine, extract, and commercialize oil, gas, timber, minerals, or even water. Instead, it is likely to take place in a cooled server room devoid of any heavy machinery or work crews. The prize: the ones and zeros that increasingly comprise the fabric of contemporary society – your data. A race is on to lay claim to, protect, and ultimately monetize the ever-growing treasure trove of information describing the way in which we work, shop, live, and love.
Much like the stylized villain-hero relationship between loggers and tree-hugging environmentalists, this new resource race pits two camps against each other. On one side are the companies – typically larger, well-resourced technology and data mining firms – that see a vast return on investment from vacuuming in as much raw data as they can, applying algorithms to that data to make some sense from the terabytes of randomness, and then selling the intelligible results to advertisers and corporate planners. The venerable IBM, which has been in business longer than ExxonMobil, has pivoted in the past decade to focus much of its strategy and growth plans on “Big Data” and the value it can provide to clients by applying sophisticated data analysis to mountains of raw information. Apparently that sophistication sells; IBM’s Smarter Planet initiative, for example, is helping governments and firms mine the “digital exhaust” produced by their citizens and customers (their online and digital habits) for valuable insights that can improve systems efficiency and hone marketing and advertising campaigns.
On the other side is a growing crop of “open data” activists. Typically smaller, younger, and more poorly resourced, this motley crew of individual hackers, small and medium-sized open source software firms, and non-governmental organizations sees the commercial control of data as a major impediment to innovation and creativity. They view attempts to monetize user data as crass profit grabs that encourage technology solutions to be proprietary and expensive rather than cheap and accessible. Instead, they promote the idea that Big Data sets should be freely available to the public at large, and that clever entrepreneurs will find innovative ways of providing value-added services or features on top of otherwise “liberated” information. Government agencies around the world, from the United States and United Kingdom to Kenya and Moldova, have begun to get into the act by providing mountains of raw government data for free online.
The contrast boils down to a simple dichotomy. Who owns and controls your data – you or the company that harvested it from you?
While this tension has been simmering for some time, it erupted in early 2012 when the World Bank announced a partnership with Google to provide access to Google Map Maker data to developing country governments for disaster relief and humanitarian purposes. Map Maker is a tool that allows users around the world to submit information to Google Maps (“There’s a health clinic on this street corner!”) and potentially have that data point published as part of Google Maps itself. While the deal seemed innocuous enough, the press releases from both sides were unclear as to whether the Bank was promoting the use of the Map Maker tool itself in developing countries for “mapping parties” and other community-based mapping efforts to help put unmapped towns and cities literally on the map.
This got the open data camp worried (full disclosure: this author included). Map Maker’s terms of service explicitly describe how Google forever controls the data submitted by users through Map Maker. It is a classic Big Data play: vacuum in mundane data at massive scale, mine it for valuable insights, and sell it. So if you click and submit that clinic location via Map Maker hoping that it shows up in Google Maps, that information is Google’s, not yours, to control. If enough of us submit the locations of heretofore unmapped clinics in a town, you can imagine that drug sales reps from pharmaceutical companies might be very happy to pay for “premium” access to that cleaned data showing exactly where they should target their next sales calls. (The World Bank ultimately clarified its position, saying it would not endorse the use of any software that locked up user-submitted data in the hands of someone other than the submitter.)
The mapping controversy is just the tip of the iceberg. This tension between commercialized Big Data and open data is playing out across a range of sectors.
In intellectual property, the technologists and activists behind the increasingly popular Creative Commons licensing scheme are pushing to break down traditional copyright barriers by promoting ways to encourage the free use and reuse of published material (especially digital publications and data sets) with little to no compensation beyond attribution. This flies directly in the face of traditional copyright ideology, which seeks to protect, control, and monetize access to previously published content.
Email and social media are other terrains where the battle is raging. All of the major free email providers – think Hotmail, Yahoo!, Gmail – sell ads against your personal data right next to your inbox. Remember the hue and cry that erupted when Google first launched Gmail and mentioned, casually, that its algorithms would be sifting through your messages in order to determine which ad to display? Neither do I. The controversy died down quickly after tens of millions of users migrated to Gmail because of its stability and ease of use, and it didn’t take long for the other web-based email providers to take notice and adopt their own ad schemes.
Facebook recently became a multi-billion dollar company thanks entirely to your willingness to provide, for free, incredibly valuable data that it can then mine and sell to major advertisers. You get free access to the Facebook tools that allow you to easily share that video of your niece singing Justin Bieber’s “Baby” at a family picnic. In exchange, your sister’s Like of that video (and the tens of millions of other Likes of Bieber-related content that same day) gives Facebook the ability to sell targeted ad space to Walmart promoting the singer’s hit new single.
It’s difficult to see the data race subsiding. In fact, the economic incentives to harvest and monetize vast amounts of data are only growing. A 2011 McKinsey study (which is quickly becoming the most often cited source on the economic potential of Big Data) pegged the value of Big Data in the US healthcare system alone at US$300 billion. McKinsey also estimated a need for a million and a half “data savvy managers” in the US simply to take advantage of the economic opportunities of commercial Big Data. The report notes that there are big potential wins for the public at large as well: a more efficient US healthcare system that leans more heavily on Big Data to anticipate public health trends could lower health care costs for all taxpayers. But such win-win scenarios are not always as obvious in other sectors.
The competing vision for the future is the one in which open data dominates. As more open source projects and services come to market, the digital world becomes a less expensive place. Instead of paying a web design company thousands of dollars to publish your organization’s website, you can now use the free and open source Drupal or WordPress platforms to quickly and easily do the same work yourself. The volunteers behind those open source projects earn a living by helping out with custom projects that require a higher degree of sophistication. A great example of this is the company Red Hat, which sells highly tailored custom software solutions to companies and governments based on the completely free, open source, and ubiquitous Linux operating system. That cool new Android phone you’re using? It runs on Linux too.
In all likelihood, the future will not be one of two pure extremes, where either a) big, evil commercial data warehouses control your every byte, selling it back to you at a premium, or b) we all get free iPhones forever. Certain sectors are headed in one direction or the other – publishing towards a more open data future, consumer goods towards a commercial Big Data paradigm – but the outcomes remain uncertain. It’s early days, and the major players – Google, IBM, Facebook, open data activists, and government agencies – are making their plays.
The race is on.
— Nathaniel Heller
— Image Credit: luckey_sun
Nathaniel, I absolutely agree with you.
However, I would love to have a listing of the open source projects that deal with open data. I am trying to get some understanding of how they intend to do it.
I would appreciate this listing.
As the WEF put it: personal data will be the next ‘oil’.