Not Only Privacy Is at Stake: How Big Data and Advanced Data Analytics May Deepen Social Inequalities

The Cambridge Analytica scandal sharpened our awareness of how third parties can access information about us without adequate consent. The discovery of discriminatory patterns in the use of internet platforms such as Google's search engine, Uber or Airbnb drew attention to the problem of internet discrimination. The EU General Data Protection Regulation (GDPR), which takes effect today, reinforces the privacy and data protection rights of individuals and also addresses algorithmic discrimination. Much is at stake in this respect, but perhaps a more insidious problem is looming, one that the GDPR does not fully address: the deepening of social inequalities that could occur when business enterprises benefit from advanced data analytics applied to big data.

Consider the following thought experiment: A person who has successfully overcome cancer is statistically more likely to suffer a relapse than someone who has not had the same illness. Without legal restrictions, a rationally behaving insurance company would charge the person with the relapse risk higher health insurance premiums than the average person. Such a disadvantage could then spill over to that person's credit rating, because a person with high insurance premiums and possibly lower life expectancy might face higher borrowing costs. This disadvantage may bring further disadvantages, for example with regard to the person's chances of finding a stable job or becoming a business partner. This, in turn, could diminish the person's prospects of obtaining appropriate health insurance in the first place. The problem is that, in the absence of effective legal safeguards, we could hardly blame any of the individual enterprises that base business decisions on statistical insight. However, the cumulative effect of their decisions can become highly detrimental and utterly unjust for individuals.

Why might big data and advanced data analytics aggravate the negative feedback loop described above?

What many people do not take into account is that large data volumes as such have very little economic value for business. The market value of data essentially depends on two factors. The first is the rarity of the data. The needle (and not the hay) in the haystack is what business values most. The second, related factor is the data's "granularity". Granularity refers to how finely data fields are subdivided. The more "granular" data is, the more flexibly a data scientist can aggregate and disaggregate it to meet the different demands of business customers. These demands typically concern more effective advertising, optimization of prices, or predictions of individual behavior, for example with regard to employment or the ability to repay loans. In all of these cases, big data applications will be commercially most interesting for a business if they are based on data sets with rare data about people that can be analysed at the lowest (most granular) level. With such data sets, individuals are clustered and classified into very precise groups. Business transactions and offers will be based on these classifications. Most often, people have no idea what kind of boxes they are placed in, or what kind of judgment is made about them.

When advanced data analytics clusters and classifies individuals into groups for statistical treatment, and those classifications are then used to make business decisions, the result is not only accumulated advantage for certain people. Conversely, it can produce accumulated disadvantage for individuals who, somewhere in the data process, are placed in groups that attract less favourable opportunities (for example with regard to work, business, or social life). With the advent of big data and data analytics, people with initial advantages (for example good health or wealth) could gain further advantage, whereas initial disadvantages tend to be reinforced, widening the initial gap. With reference to the parable of the talents in the Gospel of Matthew[1], we refer to this problem as big data's 'Matthew Effect'.
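The cumulative dynamic described above can be illustrated with a deliberately simple toy model. The sketch below is not drawn from any real scoring system: the three decision makers, the single "standing" score between 0 and 1, the threshold and the step size are all hypothetical assumptions chosen only to show how individually reasonable decisions can widen a small initial gap.

```python
# Toy model only: all names, thresholds and step sizes are illustrative assumptions.
DECISION_MAKERS = ("insurer", "lender", "employer")  # hypothetical actors


def simulate(initial_scores, rounds=5, threshold=0.5, step=0.02):
    """Return the list of score vectors, one per round (round 0 = initial)."""
    scores = list(initial_scores)
    history = [list(scores)]
    for _ in range(rounds):
        for _actor in DECISION_MAKERS:
            # Each actor classifies on the current score alone, which is
            # individually "rational", then nudges the score up or down.
            scores = [
                min(1.0, s + step) if s >= threshold else max(0.0, s - step)
                for s in scores
            ]
        history.append(list(scores))
    return history


def gap(scores):
    """Distance between the best-off and worst-off person."""
    return max(scores) - min(scores)


if __name__ == "__main__":
    # Two people who start almost level, just either side of the threshold.
    history = simulate([0.48, 0.52])
    for round_no, scores in enumerate(history):
        print(round_no, [round(s, 2) for s in scores], round(gap(scores), 2))
```

In this sketch no single actor does anything other than apply the same statistical rule, yet the distance between two nearly identical starting positions grows in every round. Real systems are far more complex, so the model should be read as an illustration of the mechanism rather than as a prediction.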

By limiting data transfers from one context to another, the GDPR's safeguards at least indirectly reduce the risk of negative feedback loops. However, the GDPR does not specifically address the problem of social inequalities. Nevertheless, all businesses involved in the big data "supply chain" (data collectors, data aggregators, trackers and finally all data users) should coordinate and possibly restrict their activities in order to identify and counteract detrimental Matthew Effects. This becomes all the more urgent when legal or regulatory responses to the problem are either absent or inadequate.


[1] “For to every one who has will more be given, and he will have abundance; but from him who has not, even what he has will be taken away.” (Matthew 25:29, RSV). The term “Matthew Effect” was first used by Robert Merton in 1968, when he described how reward systems in the scientific community tend to give recognition to eminent scientists for their contributions while withholding such recognition from scientists of little repute.