Data is the fuel of modern business: every company needs a lot of it, and few can afford enough of it.
Only a handful of companies have both access to big data and the artificial intelligence (AI) and machine learning (ML) capabilities to turn it into a product. This has created an inequality chasm between the data haves and the data have-nots, and an oligopoly in big data (among them the FANG companies) where Zuckerberg, Gates, Bezos et al. have become data barons not dissimilar to the oil barons of the Rockefeller era and even the modern-day Koch legacy.
A tale of two haves
While artificial intelligence is a voracious consumer of data, the majority of companies have either a lot of latent data and no AI to use it, or the AI algorithms and not enough data.
Why is that? Primarily because finding people with AI qualifications, let alone expertise, is extremely competitive and expensive, and big tech is snapping up AI graduates straight out of uni with big initial salaries. And the sky’s the limit from there – Anthony Levandowski, for example, once the head of Google’s self-driving car project, reportedly took home $120m in salary and incentives the year before he left the company.
On the other hand, many AI graduates who know how to write the algorithms are starting their own tech companies but don’t have access to the volumes of data needed to grow their products.
Has FANG become the OPEC of data?
In the early days of the internet the webgraph (a map of how web pages are interlinked) was open to inquisitive developers like Larry Page and Sergey Brin to study, innovate and create companies like Google with. But the importance of the webgraph has diminished since then and been replaced by the social graph.
Around since the 60s, social graphs were once a simple drawing of the web of interpersonal connections a person had and the networking effect of those relationships.
However, with a community of over 2 billion users, Facebook has redefined the social graph and become the most encompassing of all: an exponential web of an individual’s connections to businesses, services and social activity, their likes and dislikes, and those of all their friends. And in the data economy, this is what businesses are after.
Although the webgraph is still important, its main source of value is computing a website’s PageRank; by contrast, granular data on people’s online preferences and proclivities is the Holy Grail for digital advertisers.
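The PageRank idea mentioned above can be sketched in a few lines: a page's rank is the chance a random surfer lands on it, computed by repeatedly redistributing rank along links. This is an illustrative toy (the four-page link structure is invented), not Google's production algorithm.

```python
# Toy power-iteration sketch of PageRank over a tiny, invented webgraph.
links = {
    "a": ["b", "c"],  # page a links to pages b and c
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
damping = 0.85
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}  # start with uniform rank

for _ in range(50):  # iterate until the ranks stabilise
    new = {p: (1 - damping) / len(pages) for p in pages}
    for page, outlinks in links.items():
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new[target] += share  # each page passes rank to pages it links to
    rank = new

# Page c receives the most inbound links, so it ends up ranked highest.
print(max(rank, key=rank.get))
```

The ranks always sum to 1, so they can be read as a probability distribution over where a surfer ends up.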
Facebook’s social graph has arguably become the most lucrative ecosystem on the web. Along with the other data oligarchs (Google, Amazon, Netflix, Microsoft), it enjoys a data network effect: a product powered by machine learning becomes smarter the more user data it is given. This has created a winner-takes-all business environment that is becoming ever harder to compete in, especially in digital advertising.
In 2015, Google and Facebook raked in 40 percent of global digital ad spend. In the third quarter of 2016, Google and Facebook accounted for 99 percent of US digital ad revenue growth from the year earlier – the highest portion ever – and in 2017 they accounted for over 63 percent of total US digital ad spend. Although Amazon and Snapchat have slowed the pair’s rate of growth, it will be difficult to break up this digital duopoly.
The Organization of Petroleum Exporting Countries, or OPEC, is a cartel-like organization of 14 oil-producing countries that together control 44 percent of global oil production and account for 73 percent of the world’s “proven” oil reserves, giving it the power to influence the price of oil by flooding the market or cutting production.
While there is no evidence of cartel behavior among the data oligarchs, with such a disproportionate share of the market and an effectively unlimited supply of data, how are they influencing the price of global digital advertising?
Data moats, lakes and oceans
A moat, a term borrowed from Warren Buffett’s investing vernacular, is the economic defensive layer around a company – whether its intellectual property (IP), brand name or its people – that gives it a competitive advantage over rivals in the same industry. The hoarding of ‘data moats’ has become a form of IP defense among rival tech companies.
Data lakes are pools of raw data that all departments within an organization feed into. This stands in contrast to data siloed by department, each store separate from the others and not shared across the organization. A siloed organization may hold a huge amount of lucrative data on individual customers (their social habits, buying habits, communication habits and so on), but it is fragmented, making it difficult to compile a complete profile of any one of them.
Lakes, the data term du jour, hold vast amounts of raw data in its native format until it is needed. Unlike a hierarchical data warehouse that stores data in files or folders, a data lake uses a flat architecture. The data remains largely unstructured and nonrelational (NoSQL) until a specific business question arises, at which point the relevant data is refined into a schema and queried.
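This "schema-on-read" pattern can be sketched as follows: raw records land in the lake exactly as each department emitted them, and a schema is imposed only at query time, for one specific question. The event shapes and field names below are invented for illustration.

```python
import json

# Raw events sit in the "lake" untouched, in whatever shape each
# department produced them (these records and fields are invented).
raw_lake = [
    '{"user": "u1", "event": "purchase", "amount": 40}',
    '{"user": "u2", "event": "page_view", "url": "/pricing"}',
    '{"user": "u1", "event": "purchase", "amount": 25}',
]

def query_purchases(lake):
    """Apply a schema only now, for this one question: total spend per user."""
    totals = {}
    for line in lake:
        record = json.loads(line)  # parse the raw, native format on read
        if record.get("event") == "purchase":
            totals[record["user"]] = totals.get(record["user"], 0) + record["amount"]
    return totals

print(query_purchases(raw_lake))  # {'u1': 65}
```

A warehouse would have forced every event into one table layout up front; the lake defers that decision until a question actually needs it.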
Data is big business; big data is mega-bucks business, but a planetary-sized database? Would that be the Holy Grail for advertisers and AI developers alike?
Towards a data ocean
Ocean Protocol is a project aiming to create a decentralized global data exchange where companies and individuals can buy and sell their data, through the native Ocean token, with the aim of unlocking big data and opening up AI development to more people. The belief behind the project is that a lack of trust in centralized databases is preventing data from being shared between competitors.
Co-founded by Trent McConaghy of BigchainDB, Ocean is part of an ecosystem of blockchain projects (see chart below) aiming to decentralize and democratize data, computing power and storage to close the gap on big tech.
In 2016, 16ZB (zettabytes, or 16,000,000,000,000,000,000,000 bytes) of data was created in the world, but only 1 percent of that was actually analyzed and, of that 1 percent, only a handful of companies had the means to optimize the data.
One of the end-goals of these projects is to give people back control and ownership of their data instead of trying to get the data barons to relinquish it.
Democratizing the future data economy
Where would decentralized data exchanges help?
Healthcare is one area where access to external datasets would have a profound impact. For instance, medical drug trials could show far less bias in their efficacy for certain demographics or genders if they were run on larger datasets than is available to a single hospital or lab.
A medical project using AI to gain insight or produce an effective product for a disease or condition might need 10,000 patients’ data to get a low error rate, which would be next to impossible for a single hospital. This is where a decentralized data marketplace would come into its own.
Keeping error rates low is the aim when trialling any software or algorithm, and the easiest and cheapest way of doing this is not to write more complex algorithms but to run mountains of data through old ones, or as Trent McConaghy describes it: “replacing the PhDs with CSV files – this led to a great reduction in error and more deployment of AI in the 2000s.”
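The "more data, same algorithm" effect can be demonstrated on a deliberately simple synthetic task: a decades-old nearest-neighbour rule, unchanged, gets more accurate purely because its training set grows. The task and numbers below are invented for illustration.

```python
import random

random.seed(0)  # fixed seed so the experiment is repeatable

# Synthetic task: classify a point on [0, 1] as 0 (below 0.5) or 1 (above).
def sample(n):
    return [(x, int(x >= 0.5)) for x in (random.random() for _ in range(n))]

def nn_error(train, test):
    """Error rate of a plain 1-nearest-neighbour rule: the 'algorithm'
    never changes, only the amount of training data does."""
    wrong = 0
    for x, y in test:
        nearest = min(train, key=lambda t: abs(t[0] - x))
        wrong += nearest[1] != y
    return wrong / len(test)

test = sample(2000)
small = nn_error(sample(10), test)      # tiny training set
large = nn_error(sample(10_000), test)  # same rule, 1000x the data
assert large <= small  # more data, lower error, identical algorithm
```

Errors only occur near the 0.5 boundary, and a bigger training set places neighbours ever closer to it, so the error shrinks without a single line of the classifier changing.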
Autonomous vehicles are another area where error rates must be close to zero before we will have enough faith to put our lives, literally, in the hands of a computer. It is estimated that 500 billion to 1 trillion driven miles are needed to get AI models accurate enough for production deployment of self-driving cars, which is prohibitive even for Toyota, one of the companies working with Ocean. Separately, Avdex is building a decentralized exchange for aviation data.
“Companies like Google and Facebook realized if they hoard that data to themselves they get a data network effect,” said McConaghy. “They have more data, which means better models, which means higher click-through rate that brings in more money. They call themselves AI companies but really they are data companies – they are data silos. And once you have data silos companies can disintermediate users from their own data, locking them out from their digital life. AI has catalyzed these data silos.”
What will the history books say?
Today the concentration of power in big tech has no parallel in the internet age; the antitrust concerns over Microsoft’s dominance with its Internet Explorer web browser in the 90s pale in comparison.
“I do believe right now Microsoft is probably on the right side of history,” says Microsoft CEO Satya Nadella, when compared to ad-driven models like Facebook or Google. “I always make the case, which is if it was not for Microsoft’s openness, the web wouldn’t have happened… Think about the current ecosystems and how closed and how walled-gardened and riddled with all kinds of ways that they’ve rigged it, [compared] to where we were.”
While we aren’t in an era of John D Rockefeller-style monopoly, concentration in the data market sits somewhere between that and the OPEC model. And the best way to disrupt the oligopoly, as McConaghy puts it, is to “flood the data silos with an ocean of open data.”