People and Data

The Benefits, Costs, and Risks for People

As noted above, the main benefits from the data revolution arise from the informational value that personal data can provide either to individuals or to organizations that serve those individuals, and from the financial value that it holds for organizations. Yet costs and risks to individuals exist in the era of expanding collection, flow, and use of personal data. These include, as noted, loss of privacy, loss of agency or control, and the risk of being excluded from data's value. This section notes that people are not always aware of the costs or the benefits of their participation in the data marketplace and, even if aware, might be constrained in their ability to improve the tradeoff by the structure of the market.


Benefits

The data revolution has given more people access to information they can use to make better decisions. This is, first, because people can use data for its informational value – either directly or through the organizations that serve them – which exposes them to new information or creates new services or products, both of which help them make or realize better decisions. Second, it is because their personal data has financial value, implicit or explicit, that allows them to exchange (sell or barter) personal data for services or products they might otherwise have had to pay for. Often these include a range of sophisticated online tools that allow them to be better informed about services or to reduce transaction costs (including information sources such as search engines and communication tools such as email services). Table 4.2 summarizes the two forms of value and provides examples of how they operate in the data market.

Table 4.2 Benefits from personal data to individuals

Informational value
Data holds: Information is derived from the data people produce, which could inform decision-making.
Direct effects: Derived when people use their own or others' data to make decisions (such as exercise data from a wearable activity tracker or reviews on a shopping portal).
Indirect effects: People's data goes to organizations (for example, health care companies, urban planners, financial institutions, news organizations) that use it to improve or subsidize their products.

Financial value
Data holds: People produce data that has financial value for some other party and exchange their data for products or services.
Direct effects: Derived when people share their data (knowingly or otherwise) with organizations in return for services (for example, people provide data in return for access to information services or social networks online); those services are financed through the sale of the data or its derivatives.
Indirect effects: People provide data that collectors use or sell on to others, generating economic value that could return to individuals through lower prices or income-generating opportunities, or feed into broader economic processes, which could also include innovations that benefit the wider public.

Benefits (of both forms of value)

  • Better decisions

  • Innovative products

  • Improvement in public services

  • Access to digital services

  • Wider economic benefits for data users that could spill over into opportunities for data producers

Benefits due to informational value

When data is organized and analyzed it creates information, which can be an essential input into economic decision-making and security; it influences resource allocation, choices about technologies, and political choices, and informs people about the markets in which they participate. When farmers have access to market pricing information, they can make better choices about when and where to sell their produce. Similarly, when consumers have better information about the supply, quality, and price of goods or services, they can make better choices about where and when to buy them. When data from weather monitoring systems feeds into complex models and informs governments, businesses, and individuals about potential inclement weather, it allows each to take measures to minimize or respond to damage. When civilians have better information on the events taking place around them and the decisions their political representatives are making, they can make better decisions about where to live, how to get around, how to spend leisure time, and how to vote. And when young people have better information about careers and wages, they can make better choices about what they study.

The data revolution is giving more people increasingly diversified and context-specific information through improvements in data collection, processing, analysis, and distribution, online and offline.8 Thus far, people have typically benefited indirectly from data, as when organizations that collect, process, and use data to make decisions or inferences about people's demands or interests then provide new or better information or expand the set of opportunities available to individuals. For instance – continuing from a previous example – this happens when governments improve disaster preparedness or response or insurers process claims faster. And when people share their personal data with many of today's online services providers, those companies can attract advertisers, giving more people access to many sophisticated digital tools, from financial planners to cloud-based storage, often free.

Organizations can use personal data for innovation in processes, products, and services. These innovations could lead to economic benefits for people through lower prices and a better match between products and consumer needs. For example, TrueCar collects and analyzes individual transaction data to provide an idea of local vehicle-specific prices so that car buyers know what others have paid for the same car. And various companies are using personal data to design more engaging or useful products and services. In health care, data collected from large groups of individuals is improving diagnoses and helping to identify treatment options. 

Personal data is being used to improve public service delivery, enhance policy making, strengthen citizen participation, and enhance security. For instance, New York City is planning to use data from devices installed in taxis that use GPS, as well as pick-up and drop-off data from ride-sharing apps, to improve traffic management, identify roads that need to be fixed, and determine where to focus efforts after inclement weather. Similarly, in Seoul, the capital of the Republic of Korea, the location of mobile calls and text messages is used to optimize night bus routes. 

One popular application gaining use around the world is the use of locational information from smartphones to report problems with local services. Not only does this pinpoint the exact location of annoyances, such as uncollected garbage, potholes, or graffiti, it can also help foster citizen engagement. Social media activity can be "scraped" to alert vulnerable populations, such as informing Brazilian Facebook users about the Zika virus. When personal data is used in ways that improve welfare, people will be more open to sharing it with public agencies and other organizations.

And people now are increasingly able to benefit from such data directly, using a wider range of progressively sophisticated tools to process data and derive their own conclusions. This includes making personal finance decisions or modifying health-related behavior.

Benefits due to financial value

Personal data has financial value, mainly placed on it by organizations that use that data to market products and services to their customers. This financial benefit is typically not available to the individuals who produce the data, but those organizations could "pay back" the producers of data directly by providing them with access to additional digital services, or indirectly through wider economic benefits, such as the availability of credit ratings allowing access to credit. For example, most people online provide personal information in exchange for access to advertiser-sponsored digital applications such as search engines, storage, email, and social media. And much online content of importance to individuals, such as news, health, and education sites, is sponsored by personal-data-driven advertising.

Personal data also supports a vast ecosystem of digital companies (see chapter 5), and is beginning to influence firms outside the traditional digital sectors as well. The growth of such businesses – fueled by data – implies economic growth that in turn will benefit individuals. The large information technology and services companies that use and benefit from personal data have created thousands of direct and indirect jobs, for example, and have built platforms that have led to the creation of other businesses. Not all such developments are positive, however; some create opportunities to generate fake data, for instance (see box 4.1).

Box 4.1 Income Generating Opportunities

Some people benefit financially and directly from their ability to earn revenue from the data economy. This includes a handful of services that provide money (or discount coupons) in exchange for personal information.

People can set up websites and receive income from personal-data-driven advertising tools such as Google's AdSense. Freelancers can earn money from jobs in data-related areas on Mechanical Turk (www.mturk.com); Upwork, a freelance broker, reported that jobs associated with data and artificial intelligence were among the fastest growing in the fourth quarter of 2017.

And potential could exist for outsourcing analytics projects; a data scientist in India, for example, reported earning US$200 an hour for overseas jobs. But, while individuals could get a financial return for their own personal data, they might also do so with false data. Income can be made from ethically questionable activities such as using fake accounts or reviews to influence social media. For instance, the #richkidsofinstagram hashtag was used by social media influencers to attract unwitting users to invest in dubious online trading schemes.a Estimates of fake accounts – also created by governments and criminals – range from almost 50 million for Twitter to about 60 million for Facebook.


Costs and risks

Despite the potential benefits of the data revolution, people, and especially the poor, are often subjected to many costs and risks, or even precluded from partaking of the benefits described above. This stems mainly from how individuals have been largely dependent on the organizations that collect or use their personal data as gatekeepers to realize the benefits of those data, and to act on their decisions. The costs and risks stem from two issues involving these organizations: first, the limitations in how the analog world permits people to benefit from the data revolution, and second, the unequal power relationships between people and these organizations. The first can be discussed briefly, as its resolution requires a shift beyond the data economy itself; the focus instead will be on the second.

Risks arising in the analog world
A key risk in the data economy is that missing analog complements, such as limited literacy, can constrain the extent to which people can realize benefits from digital data markets. For instance, if organizations do not function well or are in uncompetitive markets, the collection of more data might not improve flows of information or decision-making by individuals, nor will it create incentives to deliver the expected benefits. In such a market, people may perceive the value of their data as low, because of information asymmetries, and many may give up their data unknowingly or without expecting an appropriate return.

The poor also face a risk of exclusion: the barriers to entry in data markets are often too high for them, as they do not have access to digital technologies or they lack the skills to use data and convert it into relevant or useful information. Although the use of new technologies has exploded across the globe in the past 10 years, the price of access is still prohibitive for many. In Bolivia, Honduras, and Nicaragua, for example, a mobile broadband subscription exceeds 10 percent of average monthly GDP per capita, compared with France and Korea, where it is less than 0.1 percent (see figure 5.6). Many people – especially women, people living in the 40 percent of the population with the lowest incomes, or people with disabilities – lack the digital tools or literacy to use technology. People who over-share online data concerning their sexuality, eating and drinking habits, or their taste for high-risk sports may be unwittingly excluding themselves from insurance coverage, or at least raising their premiums.

One consequence is that digitally excluded populations increasingly risk exclusion from data sets created from mining digitally generated information that might be used to enhance their livelihoods. And this makes many developing countries "data poor" themselves; that is, they have substandard data on a population, with entire groups of people invisible, such as unemployed women, indigenous populations, or slum dwellers.

The poor also often face constraints on how they use data – even if they are aware the data exists. This is because growing data flows often do not reach them, due to weak institutions or constraints on the functioning of markets. For example, if government weather data is not made public quickly, it will not benefit them. Or if disaster preparedness and response systems are not in place or fail to operate because alerts do not reach people quickly, even having that data will not expand opportunities in a way that would allow most to benefit from them.

Costs and risks arising from the data market(s) status quo
Costs are embedded in how data is shared and consumed, because of the structure of the markets in which the data is used. These costs might not be transparently disclosed to individuals (data producers), or they might have unintended consequences for the way that the data market functions. Several costs can be identified: loss of privacy, loss of control, loss of agency. When these costs are disclosed or uncovered (especially unintentionally, such as through data leakage, or deliberately through hacking), they could undermine the functioning of the digital ecosystem supported by data markets due to a loss of trust in the participants in those markets.

Concerns about privacy have been central to discussions about the data economy. Cases exist in which people may provide personal data willingly – to government officials, health care workers, or marketers. For example, they might trade it, knowingly or unknowingly, for access to online information services. Collection of such data allows data-driven services to improve. But this may mean people lose some privacy willingly. And it has also made securing and protecting personal data increasingly important for all kinds of organizations, both data collectors and users. Incidents in 2017 and 2018 have shown that the personal data of millions of people could be accessed – legally, accidentally, or illegally – including through means that neither the individuals nor data collectors might have been aware of. Most notable is the use of data collected through personality profile surveys on Facebook for targeted political advertising campaigns.

Much personal data, held and used as it is by financial, health, or public services organizations, is sensitive, and privacy has therefore been recognized as a fundamental human right deserving protection. Loss of privacy risks negatively influencing the behavior of other people or organizations toward the individual, such as through exclusion of people from access to services, social threats (bullying and stalking), or employment hiring or firing decisions.

Transparency about what data is being collected, from whom, and about how it could be used is critical. However, much of the data people generate is now created automatically through their actions, and is often collected or shared with others without explicit permission (beyond acceptance of terms and conditions, often wordy and complicated). Because digital data is effectively permanent and can be replicated infinitely, its use can extend far beyond what was earlier possible with analog records. Such loss of control occurs as people give data away unwillingly or unknowingly: they are not aware of how or when it will be used or by whom, and are unable to engage in its secondary use.

One example is Meitu, a photo-enhancing app that requests access to far more data than it needs, such as GPS location, cell carrier information, Wi-Fi connection data, SIM card information, and personal identifiers that could be used to track people's devices and sell the data without them realizing it. Users have control over whether to use an application or not, as well as over privacy settings within applications, but the configurations can be complicated or unwittingly bypassed. Often, individuals are unable to deny an organization control of their data, sometimes exclusive control, without giving up access to all of its services; no alternatives are available, especially in the online world, where terms and conditions that give up control are frequently "take it or leave it".

Loss of agency happens when algorithms or their input data cause people to lose control over their actions or restrict their ability to determine their own choices. Such loss is reinforced by the development of algorithms that are starting to offer choices to people for everything from what movies to watch, which news sources are relevant, and what to buy, to which web pages might offer the information they seek.

Those algorithms are developed based on models of personal preferences, using user data, that are abstractions of individual behavior. Such algorithms may frequently be inaccurate, no matter their sophistication. They model an individual's preferences, discouraging experimentation and reinforcing segmented stereotypes, with both the sources of data and the workings of the algorithm itself often hidden from view. At the time of writing, discussions had grown about how algorithms on some platforms might influence significant choices, such as voting. And even if the more serious of these claims are ultimately unproven, the working of many of these algorithms is not transparent, nor is it clear what biases might inadvertently or purposefully exist in them.

These hidden costs, when they are disclosed, are often then accompanied by significant negative publicity for the organizations involved. This could undermine the provision of such products or digital services – dependent as they are on personal data – because people lose trust in those services. Theft of personal data, its growing accumulation and analysis by companies, and the spread of fake information increasingly targeting specific groups of people lower trust in governments that people feel are not doing enough to protect them and in companies that people feel are misusing their data.

Underlying many of these risks is the imbalanced structure of many data markets. Increasingly, private organizations are holding and using data; these organizations are not subject to democratic pressure (as many public institutions are), and they increasingly face winner-takes-all pressure in network industries. As noted, individuals are often unable to negotiate better terms and conditions related to their data or to strike better trade-offs between their privacy, control, agency, and access to services. Better-informed and better-targeted regulation is part of the solution, given the collective action problem that occurs when large numbers of people engage with such organizations or networks. The next section discusses other protections that might be needed.