Using Data Responsibly

Site:	Saylor Academy
Course:	BUS607: Data-Driven Decision-Making
Book:	Using Data Responsibly

Printed by:	Guest user
Date:	Monday, 19 May 2025, 4:16 AM

Description

Read this section to explore how data needs to be used responsibly, the role of artificial intelligence, and the effects of data on people.

Better Data for Doing Good: Responsible Use of Big Data and Artificial Intelligence
People and Data

Better Data for Doing Good: Responsible Use of Big Data and Artificial Intelligence

Introduction

The data universe is ever expanding, as chapter 2 illustrates. In fact, it is estimated to double in size every two years with some 2.5 quintillion bytes of information being generated daily. Because we increasingly use digital devices to communicate, buy and sell goods, transfer money, search for information on the internet, and share our lives on social networks, we leave digital trails or "digital exhaust". A growing amount of digital data is thus being generated as a by-product of our daily lives, but also through the increasing digitization of content and the spread of the Internet of Things. This growing volume of data is driving the development of big data analytics and artificial intelligence (AI), the subjects of this chapter. The chapter describes opportunities for harnessing the value of big data and AI for social good, and how new families of AI algorithms now make it possible to obtain actionable insights automatically and at scale. Beyond internet business or commercial applications, multiple examples already exist of how big data and AI can help us achieve our shared development objectives, such as the 2030 Agenda for Sustainable Development and the Sustainable Development Goals (SDGs). But ethical frameworks need to be developed in line with increased uptake of these new technologies – any discussion of ethics is not limited to the privacy of the data, but also relates to the impact and consequences of using data and algorithms – or failing to use them.

Source: World Bank, https://openknowledge.worldbank.org/handle/10986/30437
This work is licensed under a Creative Commons Attribution 3.0 IGO License.

The Big Data Revolution

As chapter 1 notes, the concept of big data typically describes data sets so large, or so complex, that traditional data processing techniques often prove inadequate. The term "big data" thus captures not only the large volumes of data now available, but also the accompanying processes and technologies for collecting, storing, and analyzing it. In other words, "big data" is not just about data – "no matter how big or different it is considered to be" – it is primarily about "the analytics, the tools and methods that are used to yield insights," including the frameworks, standards, and stakeholders involved in the field and ultimately the knowledge generated.

Although businesses increasingly are mining the digital trails we leave behind to predict consumer behavior, track emerging trends in the market, and monitor operations in real time to improve sales and profit margins, big data analytics also holds enormous potential to help understand and address pressing socioeconomic and environmental issues.

Big data can help inform policy and interventions that set us on a more sustainable development path and improve responses to humanitarian emergencies.

Innovation labs across academia, government, the international development community, civil society, and the private sector have been using big data and AI to develop a wide range of applications, from mapping discrimination against refugees in Europe to facilitating the rescue of migrants at sea based on shipping data, detecting fires in the Indonesian rainforest, predicting food insecurity due to changing food prices via Twitter, or fighting the effects of climate change. Box 3.1 describes how big data is also being used to predict and respond to disease outbreaks.

Box 3.1 Using big data to predict dengue fever outbreaks in Pakistan

Dengue fever is the most rapidly spreading mosquito-borne viral disease in the world. It is endemic in Pakistan, where human mobility and hospitable conditions for mosquitoes have helped it spread. Those infected typically suffer from severe illness, and mortality rates are high.

A partnership involving Telenor Research, the Harvard T.H. Chan School of Public Health, Oxford University, the U.S. Centers for Disease Control and Prevention, and the University of Peshawar used big data to anticipate and track the spread of dengue in Pakistan. The partnership leveraged anonymized call data records from 40 million Telenor Pakistan mobile subscribers during the 2013 outbreak to map the geographic spread and the epidemiological timeline of the disease. The analysis combined transmission suitability maps with estimates of seasonal dengue virus importation to generate detailed and dynamic risk maps, helping to inform national containment and epidemic preparedness in Pakistan and beyond.

More broadly, the project illustrates the potential of mobile data to reveal mobility patterns that can help accurately predict the spread of disease. The insights it generated helped predict the spread days or even weeks earlier than traditional means.

The Evolution of Artificial Intelligence

Historically, the term "artificial intelligence" has been applied where computer systems imitate thinking or behavior that people associate with human intelligence, such as learning, problem solving, and decision-making. Modern AI comprises a rich set of subdisciplines and methods that leverage technologies such as visual, speech, and text recognition, as well as robotics. Machine learning is one such subdiscipline. Whereas hand-coded software programs typically contain specific instructions on how to complete a task, machine learning allows a computer system to recognize patterns and make predictions. Deep learning, a subset of machine learning, goes one step further – with deep artificial neural networks, based on complex algorithms, computers can learn from large volumes of data while reaching new levels of accuracy.

In sum, AI is enabling computer systems to collect, analyze, and process large amounts of data in real time to recognize patterns, make decisions, and, more significantly, to learn from said data and from their own experiences.

Meanwhile, recent advances in sensors and imaging technologies and data storage, processing, and transfer technologies, as well as complex and self-improving algorithms, to name but a few, are the range of expanding AI applications available today. AI is already incorporated in several online products, including Google search, Google Translate, and Facebook's automatic photo-tagging and translation applications. Financial companies rely on AI to produce the financial modeling that underpins their insurance, banking, and asset management products. Moreover, leading research hospitals have started using AI tools to help medical professionals diagnose and choose the best course of treatment for their patients.

Although the current application of AI is mostly limited to internet business, digital marketing, gaming, and selfdriving cars, a wealth of opportunities exist for AI methods to perform different tasks that can accelerate achievement of the SDGs and inform humanitarian practice. Box 3.2 describes how AI can help transform traditional sectors, such as transport.

Box 3.2 Artificial intelligence and the transport sector

The proliferation of big data is helping to transform the transport sector. Fueled by data and connectivity, a variety of intelligent transport systems have been introduced as the sector rapidly evolves.

Alongside other disruptive technologies, such as connected vehicles and automated driving, these intelligent systems are soon expected to completely change the way people and goods are moved. Big data can be combined with predictive analytics, for example, to optimize cargo transport networks based on projected shipping demand. Data exchanged among vehicles and infrastructure will soon be used to automatically optimize vehicle routes and speeds in real time, reducing congestion and emissions. In the Philippines, for example, real-time traffic data shared using open source tools is being used to optimize traffic flows in Manila and Cebu City. In Indonesia, location information from GPS-stamped tweets is being used to reveal commuting statistics in the Greater Jakarta area.

The potential for data-driven intelligent transport systems to transform the world's transportation systems is immense, particularly if the data is combined with new ways to link disparate data sets and creative methods to visualize data.

Using Big Data and AI as a Force for Social Good

AI and big data are generating new tools and applications creating actionable insights, real-time awareness, and predictive analysis on numerous topics for sustainable development and humanitarian action. More and more compelling examples illustrate the value of this technology to improve early warning systems and inform policy and programmatic response. These individual use cases represent a small but significant innovation in learning about the world around us. Taken together, they provide new ways to detect and respond to world events, influence policy debates, and drive development, in a way that is both safe and fair (figure 3.1, table 3.1).

The following sections examine the benefits and applications of big data and AI – including for (a) speech and audio processing, (b) image recognition and geospatial analysis, and (c) text analysis. They also describe how AI is being leveraged to support the SDGs and address the emerging challenges and risks that accompany the uptake of these technologies.

Figure 3.1 The Sustainable Development Goals

Table 3.1 Examples of artificial intelligence applications for the Sustainable Development Goals

SDGs	Value of artificial intelligence	Case study	Risks and challenges
SDG 1: No poverty	Artificial intelligence (AI) can be used to monitor income and track policies to identify progress and successful practices.	Combining satellite imagery and machine learning to predict poverty in Nigeria, Tanzania, Uganda, Malawi, and Rwanda Jean et al. combined nighttime maps with high- resolution daytime satellite images to obtain estimates of household consumption and assets. Using survey and satellite data from five African countries – Malawi, Nigeria, Rwanda, Tanzania, and Uganda – the study showed how a convolutional neural network can be trained to identify image features that can explain up to 75 percent of the variation in local-level economic outcomes.	There is a risk of omitting segments of the population that cannot be captured by remote sensing signatures because of their lack of footprint or the given sociocultural context.
SDG 2: Zero hunger	AI can be used to maximize yields and improve agricultural practices based on multiple data sources.	Detecting patterns in big data saves Colombian rice farmers huge losses A project run by the International Center for Tropical Agriculture mined 10 years of weather and crop data to understand how climatic variation affects rice yields. The project fed the patterns into a computer model and predicted a drought in the region of Córdoba. The center subsequently advised the Rice Producers Federation of Colombia (FEDEARROZ) against planting in the first of two annual growing seasons. This advice saved farmers from incurring significant losses.	Overexploitation, based on local optimization, could lead to exhausted lands and lack of resources at the systemic level
SDG 3: Good health and well-being	AI can be used to support diagnosis and personalized medical treatment.	Revolutionizing personalized medicine using AI Watson, IBM's "cognitive computing" platform uses natural language processing to efficiently and quickly sort through millions of journal articles, government listings of clinical trials, and other existing data sources to help diagnose patients and provide personalized treatment plans. University of Tokyo doctors reported that the artificial intelligence diagnosed a 60-year-old woman's rare form of leukemia that had been incorrectly identified months earlier in less than 10 minutes.	Overpersonalized medicine could lead to abuse from the insurance industry and other stakeholders based on private personal information.
SDG 4: Quality education	AI can be used to tailor the delivery of education based on each student's needs and capabilities.	Detecting dyslexia in children in Spain Ten percent of the population has dyslexia, a neurological learning disability that affects reading and writing but does not affect general intelligence. Children with dyslexia can learn coping strategies to deal with its negative effects. Unfortunately, in most cases dyslexia is detected too late for effective intervention. Change Dyslexia is a project that uses cutting edge scientifically based computer games, such as Dytective Test and DytectiveU, that screen and support dyslexia at large scale.	There is the danger that harmful media can be easily accessed by children. For example, the use of YouTube Kids videos optimized with AI and bots that create long, repetitive, and sometimes frightening videos meant to keep children entertained for as long as possible.
SDG 5: Gender equality	AI can help correct for gender bias in insights derived from big data and nontraditional data sources.	Mapping indicators of female welfare at high spatial resolution in Kenya, Nigeria, Tanzania, Bangladesh, and Haiti A project by Flowminder and WorldPop used geo-located cluster data from the Demographic and Health surveys on rates of literacy, stunting, and use of modern contraception methods to produce high- resolution spatial gender-disaggregated maps, using predictive modeling techniques. The study focused on three countries in Sub-Saharan Africa (Kenya, Nigeria, and Tanzania), one country in South Asia (Bangladesh), and one country from the Western hemisphere (Haiti).	AI applications are at risk of reinforcing existing gender biases present in the data used to train the algorithms.
SDG 6: Clean water and sanitation	AI can predict consumption patterns from sensor data to optimize water and sanitation provision.	Monitoring coastal water quality in real time in Singapore Project Neptune is a real-time monitoring and prediction system strategically deployed around Singapore's coastline. The system integrates hydrodynamic and water quality modeling into a forecasting framework that forms the backbone of a central operational management system. Eight specially outfitted buoys act as miniature labs, collecting data on pollutants, including oil and nutrients, and send live updates to the authorities on how they could spread.	AI (or simple malware) can be used to attack or disable critical public infrastructure by means of remote warfare.
SDG 7: Affordable and clean energy	AI can be used to make existing infrastructure more intelligent and energy efficient	Preventing power supply failures in domestic railway networks in India Aiming to reduce the risk of signal failure, Indian Railways has trialed remote condition monitoring of the power supply systems, leveraging AI to predict possible outages. The measure is set to be rolled out on two sections of the Western and South-Western railway network.	As noted above, critical network infrastructures may be subject to cybersecurity threats.
SDG 8: Decent work and economic growth	AI can be used to optimize recruitment for both employers and jobseekers.	Optimizing online job searches LinkedIn, a well-known business- and employment- oriented social networking service, uses AI and big data to help recruiters automate much of the candidate screening process. The tool is also integrated in different applicant tracking systems and, for example, automatically synchronizes with the different open jobs, ranking candidates against them.	If algorithms learn hiring practices based on biased data that prefers, for example, Caucasian names rather than others, it can make biased hiring decisions.
SDG 9: Industry, innovation and infrastructure	AI can be used to automate and eliminate rote or routine work, freeing up labor to focus on more creative tasks.	Speeding up toy production in Denmark A factory in Denmark uses autonomous robots and precision machines to make 36,000 Lego pieces per minute, or 2.16 million pieces every hour.	AI will transform and could eliminate some jobs. McKinsey estimates that some 60 percent of all jobs will see a third of their activities automated.
SDG 10: Reduced inequalities	AI can support translation of less- known languages to ensure all voices are accounted for in decision-making processes.	Accelerating development in Uganda with speech recognition technology UN Global Pulse and the Stellenbosch University in South Africa used machine learning to develop speech-to-text technology to filter the content of public radio broadcasts for less-known languages spoken in Uganda. Once converted into text, the information can reveal sentiment around topics relevant for sustainable development.	Advances in robotics and AI could increase inequality within societies, further entrenching the divide between rich and poor.
SDG 11: Sustainable cities and communities	AI can measure traffic in real time, monitor commuting statistics, or improve transportation services.	Inferring commuting statistics in Indonesia with Twitter Some estimates for the Greater Jakarta area put the population at more than 30 million. In response to the needs of the authorities, UN Global Pulse – Pulse Lab Jakarta initiated a project to test whether location information from social media on mobile devices could reveal commuting patterns in the area. The results of the research confirmed that geo-located tweets have the potential to fill current information gaps in official commuting statistics.	AI may lead to cascading failures of interconnected systems in smart cities. Failures in machine learning algorithms need to be accommodated in urban emergency planning.
SDG 12: Responsible consumption and production	AI can improve efficiency of recycling processes, which can eliminate waste and improve yields.	Supporting smart recycling in the United States with dumpster diving robots Spider-like robotic arms, guided by cameras and artificial intelligence, are helping to make municipal recycling facilities run more efficiently in the United States. Through deep learning technology, robotic sorters use a vision system to see the material, AI to think and identify each item, and a robotic arm to pick up specific items. The technology could help make recycling systems more effective and profitable.	AI can also be used to increase the scale of extractive or manufacturing industries, creating a larger environmental footprint over time.
SDG 13: Climate action	AI and climate science can help researchers identify previously unknown atmospheric processes and rank climate models.	Predicting road flooding for climate mitigation in Senegal Using data from mobile operator Orange, a team from the Georgia Institute of Technology developed a framework to improve the resilience of road networks in Senegal to flooding, including recommendations on how to prioritize road improvements given a limited budget. The results showed how roads are being used, how they are damaged, and how policy makers can allocate budget in the most efficient way to repair them.	Heavy computation required to power AI may lead to increased energy costs.
SDG 14: Life below water	AI can help detect, track, and predict the movement patterns of vessels engaged in illegal fishing.	Supporting sustainable legal fishing in Indonesia Indonesia and Global Fishing Watch – a partnership between Google, Oceana, and SkyTruth – are cooperating to deliver a vessel monitoring system for all Indonesian-flagged fishing vessels and generate data that is publicly available. The project aims to promote transparency in the fishing industry.	The data collected might be incomplete, as some vessels may be undetectable when switching off their transmitters.
SDG 15: Life on land	AI can be used to map and protect wildlife on land using computer vision systems.	Identifying, counting, and describing wild animals in camera-trap images in Tanzania The University of Minnesota Lion Project deployed 225 camera traps, across 1,125 square kilometers, in Serengeti National Park to evaluate spatial and temporal dynamics. The cameras accumulated some 99,241 camera-trap days, producing 1.2 million pictures between 2010 and 2013. Members of the general public classified these images via a citizen-science website. The project then applied an algorithm to aggregate the classifications to investigate multi- species dynamics in the local ecosystem.	Monitoring technologies can be used by poachers just as easily as conservationists.
SDG 16: Peace, justice and strong institutions	AI can reduce discrimination and corruption and drive broad access to e-government.	Turning information into knowledge and action in Estonia In Estonia, government services – legislation, voting, education, justice, health care, banking, taxes, policing, and so on – have been digitally linked across one platform, "wiring up" the nation. Estonia is also exploring ways to leverage AI to improve e-government and other public services.	Citizen monitoring could be misused to repress political practices (such as voting, demonstrations).
SDG 17: Partnerships for the Goals	AI should be a public good.	Leveraging partnerships to improve AI for global good Multisectoral collaboration is essential for the safe, ethical, and beneficial development of AI. The Partnership on AIc represents a collection of companies and nonprofits that have committed to sharing best practices and communicating openly about the benefits and risks of AI research. Another example is the annual "AI for Good Global Summit"d organized by the International Telecommunication Union, the UN's specialized agency for information and communication technologies.	Collaboration must also result in action

Speech and audio processing

Arguably, one major achievement of big data and AI has been to facilitate real-time translation of a growing number of the world's languages. Although language translation is not an SDG per se, greater language and cultural understanding could help increase the efficiency and effectiveness of development efforts across all SDGs – for example, by helping to map public opinion (see box 3.3). Google and Microsoft systems, for example, are now able to translate over a hundred languages. Also, new systems have been developed that perform real-time translations – such as a Skype system that can translate voice calls into 10 different languages in real time.

Early models of machine translation used statistical methods that translated words based on a short sequence, that is, within the context of several words before and after the target word, which did not always work for long and complex sentences. New neural network architectures, such as long short-term memory, have drastically improved efficiency. Such systems can now learn from millions of examples and are able to translate whole sentences at a time, rather than word by word.

Box 3.3 Using machine learning to analyze radio broadcasts in Uganda

Radio remains a primary source of information for communities in many parts of the world, particularly in remote rural areas where coverage and access to other forms of connectivity is limited. Radio is also an accessible medium for the millions who remain illiterate.

In Uganda, where a majority of the population lives in rural areas, radio is a vibrant platform for community discussion, information sharing, and news broadcasting. Radio talk shows and dial-in discussions are popular forums for voicing local needs, concerns, and opinions.

UN Global Pulse collaborated with Stellenbosch University in South Africa to develop speech-recognition technology to automatically convert these radio broadcasts into text for several of the languages spoken in Uganda, including English, Luganda, Acholi, Lugbara, and Rutooro. "Radio mining" consisted of two automated software stages and two human analysis stages. This semi-automated approach allowed a relatively small team of analysts to process many audio recordings quickly and affordably.

Several projects were piloted with UN partners to understand the value of talk radio to provide information on topics relevant to the Sustainable Development Goals, such as health care service delivery, response to disease outbreaks, and the efficiency of public awareness raising radio campaigns, among others

Computer vision, image analysis, and geospatial data

Accurate population information is critical for authorities to plan and deliver quality public services and coordinate crisis-relief efforts. However, collecting related data traditionally is a long-standing challenge for development practitioners and policy makers. For example, gathering national household survey data on poverty is typically time-consuming and expensive, requiring elaborate data collection and analysis techniques. This exercise is particularly challenging in fragile states, where limited capacity and security concerns typically hinder data collection and processing. In this setting, for example, satellite imagery has been used to gain an overview of population density and assess poverty and access to energy – covered by SDG 1 and SDG 7 (see boxes 3.4 and 3.5).

In the health sector – covered by SDG 3 – current advances in medical imaging and computer analysis of tumors can complement and refine radiologists' analysis. Mobile phone call records have also been combined with satellite data to build dynamic population maps and estimate cross-border flows of migrants to enable development actors to track the spread of disease. This technique was leveraged in southern Africa to map the movements of cross-border communities to better understand malaria infections patterns.

In the environmental field – SDGs 12, 13, 14, and 15 – AI-assisted analysis of satellite imagery can be used to monitor damage to coastal areas due to floods or typhoons, or drought-affected areas, or the retreat of wetlands or encroaching land use in deltas or river basins. Combined with meteorological models and large data sets on changes in ocean temperature and currents, such mapping can help improve forecasting and early warning systems of future major weather events. Moreover, GPS data has been used to analyze traffic patterns to reduce pollution (see box 3.6). Another AI application getting considerable attention is automated or self-driving cars – a potential solution for optimizing transportation in ways that can minimize car accidents. Debate is ongoing about what a fully automated car really is, but considerable progress has been made toward solving problems of visual recognition, object identification, and reaction processing, which are critical to this endeavor.

Building on humble beginnings and minor innovations (including cruise control, assisted steering, lane assist, automatic braking, and "Traffic Jam Assist"), the race toward a fully automated car is now underway (box 3.7).

Box 3.4 Estimating population counts and poverty in Afghanistan and Sudan

In Afghanistan, the United Nations Population Fund and the UN Country Team collaborated with Flowminder, an organization that collects, aggregates, and analyzes anonymous mobile, satellite, and household survey data to generate population maps. The project used survey data, geographic information systems, and satellite imagery data to estimate populations in areas with no such data.

In Sudan, the United Nations Development Programme used satellite data to estimate poverty by studying changing nighttime energy consumption. The team used data pulled from nighttime satellite imagery, analyzing illumination values over two years, in conjunction with electric power consumption data from the national electricity authority. The study was also informed by desk research, including similar World Bank work in Kenya and Rwanda. Electricity consumption was used as a proxy indicator for income, as poorer households were assumed to be lower energy consumers. The exercise demonstrated how satellite imagery can help measure poverty.

Box 3.5 Mapping energy access in India

Satellite night-light data has also been leveraged in India. A team from the University of Michigan, the U.S. National Oceanic and Atmospheric Administration, and the World Bank Group's Energy and Extractives Global Practice analyzed the daily light signatures of more than 600,000 villages from 1993 to 2013 (see map B3.5.1).

Electrification trends were visualized on NightLights.io, an open-source platform for processing big data in a scalable and systematic way. The platform features an application programming interface that enables technical partners to query light output. And its interactive maps allow users to explore light output trends. Through the project, the research team gained a high-level overview of rural electrification, compared villages and plot trends, and shared data, which can help inform government policy.

Map B3.5.1 Night lights in India

Box 3.6 Cleaning Mexico City's air with big data and climate policy

Mexico City's congestion, among the world's worst, worsens local air quality. City dwellers are exposed to twice the recommended level of ozone and fine particulate matter (PM2.5), as advised by national standards and according to 2016 data, resulting in some 10,850 annual deaths. A team of researchers from the University of California, Berkeley, and the Instituto Nacional de Ecología y Cambio Climático in Mexico used data from Waze, a GPS navigation software, to evaluate various transport electrification options based on their ability to reduce urban air pollution and emissions – including (a) the electrification of the entire city taxi fleet, (b) the electrification of public transit buses, and (c) the electrification of all light-duty vehicles.

The team first measured the number, location, and duration of traffic jams throughout the city, estimating related emissions using the MOVES-Mexico model. The team then used data from Google's "popular times" function to map urban population movement.

Using this information, the team was able to identify the best policy options and optimal locations for electric vehicle charging stations.

Box 3.7 Self-driving cars

Human error causes about 90 percent of all car accidents. Artificial intelligence (AI) and autonomous driving might therefore help reduce accidents and save lives. Self-driving cars have to identify, assess, evaluate, and respond to fast-changing circumstances, and predict likely events in real time. A fully automated car has to master vehicle dynamics, control systems, and sensor optimization. For example, detecting pedestrians from images or video is a very specific image-classification problem.

Driverless cars require robust data capacity for image processing and recognition. Navigation and mapping data is also essential, with GPS coordinates used extensively. Mercedes, BMW, and Audi purchased the mapping business Here from Nokia for US$2 billion; Here combines "static" mapping data taken from cars with 3D cameras with live information supplied by a network of connected devices, including cars (Bell 2015). In January 2016, Volkswagen partnered with Mobileye, a technology company that develops vision-based advanced driver-assistance systems, to produce its real-time image-processing cameras and mapping service for driverless cars. Ford became the first manufacturer to road test a fully autonomous car in snow on public roads in March 2016 after working with researchers from the University of Michigan to create an algorithm recognizing snow and rain (Ford 2016). Ford has already tested autonomous Fusion cars on public roads in the U.S. states of Arizona, California, and Michigan.

Despite these groundbreaking developments, the move toward autonomous driving is not without its problems. Many worry that a car-centric vision detracts from more sustainable solutions related to public transportation and urban design (covered by Sustainable Development Goal 11). Driverless vehicles are also likely to wipe out millions of jobs, including taxi drivers, couriers, and truck drivers, something new policies must address urgently. Moreover, legal frameworks will need to keep pace and be redesigned. Although a few countries are moving to issue new legal frameworks for autonomous driving, significant legal gaps remain.

Text mining and text analysis

Also known as text mining, text analytics is the science of turning unstructured text into structured data. Text analytics is focused on extracting key pieces of information from conversations. By understanding the language, the context, and how language is used in everyday conversations, text analytics uncovers the "who" of the conversation, the "what" or the "buzz" of the conversation, "how" people are feeling, and "why" the conversation is happening. Conversations are categorized and discussion topics identified.

The technology is being leveraged, among other things, to support agricultural development and build food security – covered by SDG 2. Kudu, a mobile auction market application, is using text analysis algorithms to match farmers looking to sell their produce with suitable market traders. The system allows any farmer or trader to send a message by phone. Once matched, compatible buyers and sellers are notified. Kudu not only limits unnecessary travel and dependency on intermediaries, but encourages competition by overcoming critical information gaps. The application was developed by the AI Research Group, which is specialized in the application of AI to problems in the developing world and operates out of the College of Computing and Information Sciences at Makerere University in Kampala, Uganda.

Analysis of text from Twitter feeds has also been used to track food prices in real time in Indonesia. UN Global Pulse worked with the Ministry of National Development Planning and the World Food Programme to "nowcast" food prices based on Twitter data. The outcome was a statistical model of daily price indicators for four commodities: beef, chicken, onion, and chili. When the modeled prices were compared with official food prices, the forecast and actual prices were closely correlated, demonstrating that near real-time social media signals can serve as a proxy for daily food prices.

Similar techniques are being used to analyze a host of other development issues. For example, the ability to monitor public sentiment toward policy measures in real time, via social media, can provide critical information on the impact of policy and how it is playing out in practice, especially for vulnerable groups or households (box 3.8). Data from social media can also help estimate the number of expats around the world (box 3.9).

As mentioned earlier, conducting household surveys is often expensive. New approaches such as monitoring social media could help address data gaps in developing economies. Moreover, these approaches may capture marginalized or migrating communities not always accounted for by traditional means such as national censuses.

Box 3.8 Monitoring public sentiment about policy reforms using social media in El Salvador

In April 2011, the government of El Salvador removed a countrywide subsidy on liquid petroleum gas, the most common domestic cooking fuel. Instead of subsidizing prices at point of sale, eligible households were given an income transfer. The reform triggered considerable public debate and controversy.

UN Global Pulse and the World Bank teamed up to investigate whether social media signals from Twitter could be used to understand public perceptions and social dynamics surrounding the fuel subsidy reform, specifically reactions and concerns about political partisanship, the level of information reaching communities about the reform, and trust in government commitment to deliver the subsidy. A taxonomy of keywords was developed to filter Twitter for relevant content. Regional experts were consulted to ensure slang words and synonyms were included in the taxonomy. Tweets were then filtered to assess relevance and isolate content originating from El Salvador.

The study suggests that social media analysis, using big data and AI, can help inform policy implementation, as the sentiment observed was similar to public opinion measured by household surveys.

Box 3.9 Shedding light on migration patterns using social media information

Data from social media can be used to help estimate migrant populations. For example, studies based on Facebook data yield estimates of approximately 214 million "expats" in the world (people stating that they live in a country other than their self-reported "home country"), close to the 2017 estimated total of 258 million international migrants globally.

Among the issues surrounding the use of social media data to estimate migrant populations are the difficulty in defining who an international migrant is, selection bias, and the reliability of self-reported information. But scholars are working on reducing selection bias via model fitting and results are promising.

From Design to Responsible Use: Ethical Challenges with Using Big Data and AI

Although we are only scratching the surface of what is possible in the new age of big data and AI, and how they can be leveraged for social good, we also need to grapple with both the unintended risks and malicious use of the same technology. These benefits and looming risks were aptly articulated by the UN Secretary-General at the 2017 "AI for Good Global Summit":

We face a new frontier, with advances moving at warp speed. Artificial intelligence can help analyze enormous volumes of data, which in turn can improve predictions, prevent crimes and help governments better serve people. But there are also serious challenges, and ethical issues at stake. There are real concerns about cyber security, human rights and privacy. . . The implications for development are enormous. Developing countries can gain from the benefits of AI, but they also face the highest risk of being left behind.

Algorithm-based systems, powered by big data and AI, increasingly both learn from and autonomously interact with their environments, as well as one another. In April 2011, the government of El Salvador removed a countrywide subsidy on liquid petroleum gas, the most common domestic cooking fuel. Instead of subsidizing prices at point of sale, eligible households were given an income transfer. The reform triggered considerable public debate and controversy. UN Global Pulse and the World Bank teamed up to investigate whether social media signals from Twitter could be used to understand public perceptions and social dynamics surrounding the fuel subsidy reform, specifically reactions and concerns about political partisanship, the level of information reaching communities about the reform, and trust in government commitment to deliver the subsidy. A taxonomy of keywords was developed to filter Twitter for relevant content. Regional experts were consulted to ensure slang words and synonyms were included in the taxonomy. Tweets were then filtered to assess relevance and isolate content originating from El Salvador. The study suggests that social media analysis, using big data and AI, can help inform policy implementation, as the sentiment observed was similar to public opinion measured by household surveys. Source: Adapted from UN Global Pulse 2015. Box 3.8 Monitoring public sentiment about policy reforms using social media in El Salvador Data from social media can be used to help estimate migrant populations. For example, studies based on Facebook data yield estimates of approximately 214 million "expats" in the world (people stating that they live in a country other than their self-reported "home country"), close to the 2017 estimated total of 258 million international migrants globally. Among the issues surrounding the use of social media data to estimate migrant populations are the difficulty in defining who an international migrant is, selection bias, and the reliability of self-reported information. But scholars are working on reducing selection bias via model fitting and results are promising. Source: Adapted from Rango and Vespe 2017. Box 3.9 Shedding light on migration patterns using social media information. This tends to generate behavioral patterns that cannot always be predicted or explained. Where this evolution in AI will ultimately take us is not yet clear. Some raise the risk of autonomous weapons or viruses targeting individuals with a particular defective DNA trait as one frightening scenario. And rising concerns about the malicious use of AI, for instance, for profiling, merits a stronger ethical governance and regulatory framework that covers how related methods are developed and deployed. The risk of unintended consequences of AI should be accounted for at each stage of innovation, beginning with design.

Technologies and algorithms by themselves have no intrinsic morality – however, technology can be used for good or bad depending on how it is employed. Looking at existing technologies, ethical considerations need to address questions such as what life-and-death decisions self-driving cars make. Although privacy norms have been long established to protect personal data from misuse and ensure individual privacy in the digital world, ethics has become an additional tool in AI applications used to protect fundamental human rights and help make decisions in areas where law has no clear-cut answers. The UN Special Rapporteur on the right to privacy recommends formal consultation mechanisms be instituted "including ethics committees, with professional, community and other organizations and citizens to protect against the erosion of rights and identify sound practices" (Cannataci 2017). A recent example in which ethics and moral obligations of data handling were included in an official UN document is the "Guidance Note on Big Data for the achievement of the 2030 Agenda" adopted by the UN Development Group (UNDG 2017). The note, the first official document in the UN on big data and privacy, stresses the importance of ensuring that data ethics is included as part of standard operating procedures for data governance (box 3.10).

Box 3.10 Data privacy, ethics, and protection: A guidance note on big data for achievement of the 2030 Agenda

1. LAWFUL, LEGITIMATE AND FAIR USE Data should be obtained, collected, analysed or otherwise used through lawful, legitimate and fair means, taking into account the interests of those individuals whose data is being used.

2. PURPOSE SPECIFICATION, USE LIMITATION AND PURPOSE COMPATIBILITY Any data use must be compatible or otherwise relevant, and not excessive in relation to the purposes for which it was obtained.

3. RISK MITIGATION AND RISKS, HARMS AND BENEFITS ASSESSMENT A risks, harms and benefits assessment that accounts for data protection and data privacy as well as ethics of data use should be conducted before a new or substantially changed use of data (including its purpose) is undertaken.

4. SENSITIVE DATA AND SENSITIVE CONTEXTS Stricter standards of data protection should be employed while obtaining, accessing, collecting, analysing or otherwise using data on vulnerable populations and persons at risk, children and young people or any other data used in sensitive contexts.

5. DATA SECURITY Robust technical and organizational safeguards and procedures should be implemented to ensure data management throughout the data lifecycle and prevent any unauthorized use, disclosure or breach of personal data.

6. DATA RETENTION AND DATA MINIMIZATION Data access, analysis or other use should be kept to the minimum amount necessary to fulfill the purpose of data use.

7. DATA QUALITY All data-related activities should be designed, carried out, reported and documented with an adequate level of quality and transparency.

8. OPEN DATA, TRANSPARENCY AND ACCOUNTABILITY Appropriate governance and accountability mechanisms should be established to monitor compliance with relevant law, including privacy laws and the highest standards of confidentiality, moral and ethical conduct with regard to data use.

9. DUE DILIGENCE FOR THIRD PARTY COLLABORATORS Third party collaborators engaging in data use should act in compliance with relevant laws, including privacy laws as well as the highest standards of confidentiality and moral and ethical conduct.

Data ethics should be treated holistically using a consistent and inclusive framework that considers a diverse set of outcomes instead of an ad hoc approach that only accounts for limited applications. Such mechanisms include codified data ethics principles or codes of conduct, ethical impact assessments, ethical training for researchers, and ethical review boards.

Privacy impact assessments, in general, allow developers and organizations to effectively assess the risks posed to privacy by big data and AI, thereby ensuring compliance with privacy requirements, identifying mitigation measures, and effectively classifying the impacts of data and algorithm use. Including issues of ethics and human rights in any impact assessment, including a privacy impact assessment, could prove more effective than developing a separate analysis or ethical review framework.

For example, UN Global Pulse builds ethical considerations into its data practices by conducting a "risks, harms, and benefits assessment," which may help identify anticipated or actual ethical and human rights issues that may occur during a data innovation project. The assessment considers the proportionality of potential benefits compared to risks of harm from data use, as well as risk of harm from the data not being used. If the risks outweigh the benefits, the project does not proceed. In its "Guide to Personal Data Protection and Privacy," the World Food Programme also builds ethics into its procedures through the application of humanitarian principles and risk assessments.Although ethics may not have clear-cut rules, when assessing the risk of harm along with the benefits "any potential risks and harms should not be excessive in relation to the [likely] positive impacts of data use".

Incorporating privacy by design is also crucial for innovation applications that operate with limited human supervision. The rapidly developing nature of AI algorithms can give rise to algorithmic bias and unverified results. Similar to privacy by design is the concept of AI ethics by design, which suggests seven principles, including recommendations to proactively identify security risks by using tools such as the privacy impact assessment to minimize potential harm. In addition, ensuring oversight of the entire data innovation process, from design to use, is vital to securing true incorporation of ethics into AI system.

Moreover, accountability and transparency are critical ethical principles that must accompany any AI innovation project. "[T]ransparency builds trust in the system, by providing a simple way for the user to understand what the system is doing and why". To maintain transparency, the Institute of Electrical and Electronics Engineers recommends developing new standards that describe measurable, testable levels of transparency so systems can be objectively assessed and the level of compliance can be determined. Although it is harder and harder to keep algorithms transparent because of heavily interlinked and layered processes of algorithmic programming, the AI ethics by design approach suggests that ensuring the transparency and accountability of algorithms is essential to determining the intended outputs and preventing algorithmic bias.

The overall data ethics program may also include recurring data ethics reviews at every critical juncture, such as review boards. A similar approach already exists in research institutions and is usually referred to as internal review boards. For example, in their published procedures for ethical standards regarding data collection, the United Nations Children's Fund (UNICEF) adheres to mechanisms for review such as internal and external review boards as well as the basic ethics training for researchers. Any UNICEF project involving surveys, focus groups, case studies, physical procedures, games, or diet and nutritional studies is subject to ethical review.

A stakeholder-inclusive approach that features "the proactive inclusion of users" is also desirable. "Their interaction will increase trust and overall reliability of these systems". "[T]he context of data use" should also always be considered, thus requiring human intervention, and at times, context-specific expertise – such as the presence of a humanitarian expert during a humanitarian response or of a transportation planning expert in a project that looks at transportation policy.

Finally, ethical approaches to AI should be humanrights-centric, incorporating substantive, procedural, and remedial rights. Just as misuse of AI may lead to harm, nonuse of AI may allow preventable harm to occur. Decisions to use or not use applications of AI can infringe on fundamental rights. As suggested by the UN Special Rapporteur on the right to privacy in his recent report to the UN General Assembly, "commitment to one right should not detract from the importance and protection of another right. Taking rights in conjunction wherever possible is healthier than taking rights in opposition to each other". But undoubtedly, incorporating ethics into every stage of project design and implementation of AI can potentially mitigate harm and maximize positive impact of rapidly developing new technologies, ensuring they are used for social benefit.

A Way Forward: Harnessing Big Data and AI to "Leave No One Behind"

This chapter has detailed a handful of examples of the many innovative applications of big data and AI being used to inform sustainable development and humanitarian work globally (see table 3.1 in particular), illustrating the value of this technology for development actors.

The pervasive nature of big data and the rapidly evolving capabilities of AI hold tremendous promise for social impact and can drive transformation across many domains, ranging from health, to food security, to jobs, and action on climate. Scope therefore exists to expand use of this technology beyond current applications, leveraging big data and AI in new ways that help us achieve the 2030 Agenda. National and international development actors should prioritize operational integration of these digital innovations into policy and practice. Doing so will allow them to craft more agile and responsive programming, to support anticipatory approaches to managing risk, and to find new ways to mea sure social impact. However, mainstream, scaled adoption by policy makers and communities themselves still faces systemic barriers and pervasive inertia.

Given their broad applicability, big data and AI necessitate new forms of interinstitutional relationships to leverage data and computational resources, human talent, and decision-making capacity. The capabilities of a diverse set of stakeholders can enable the integration of data innovation into ongoing policy processes rather than one-time policy decisions.

Moreover, as adoption of big data and AI increases and the technology evolves, so do the potential risks and issues that need to be resolved. Many question the suitable application of this technology, including malicious use, and highlight the risk of unintended consequences in this rapidly evolving field, where policy makers may struggle to keep pace. Although both the supply of and demand for data are expanding at "warp speed," the data ecosystem, as we know it, is still embryonic – with many advanced potential applications still more theory than practice. As new capabilities and data sources are applied for good – whether to create smarter public services, better early warning systems, or more effective responses to crises – development actors must pause to consider the potential for harm that may arise, for example, from inadequate privacy protection.

To date, no standards exist for the anonymization and sharing of insights from big data in priority industries such as financial services, e-commerce, and mobile telecommunications – although the latter has done work to develop such standards. At the same time, as noted, nonuse of these new capabilities and data sources represents at least as great a risk of harm to the public as that potentially arising from inadequate privacy protections. New frameworks are needed that go beyond privacy and ensure accountability and responsible use and reuse of data for the public good. Principles such as responsibility, accuracy, auditability, and fairness should be core concepts that guide the development of algorithms and AI. The "society-in-the-loop" algorithm concept, for example, proposes to embed the "general will" into an algorithmic social contract in which citizens oversee algorithmic decision-making that affects them.

Developing countries may have the most to gain from the use of new data sources and tools. However, without thoughtful application and critical complements they may also stand to lose the most. To reap the societal benefits of AI – including expected improvements to productivity and innovation – countries must have access to the data, tools, and human expertise necessary to support their application, as well as viable plans to address the likely displacement of workers. The availability of data is to a large degree a by-product of digitization, an area in which developing countries lag far behind. There can be no mass digitization without universal and affordable access to broadband. According to International Telecommunication Union (ITU) statistics, some 3.8 billion people, or just over half the world's population, were still lacking access to the internet in 2017.

The way forward must be inclusive. For the big data and AI revolution to benefit the most vulnerable people, current AI research roadmaps must increase attention to methodologies that can work in data-scarce environments, that can be adapted quickly and with few examples – as in crisis scenarios – and that can work with incomplete or missing data (such as "one-shot learning"). Need is also urgent for bridging gender inequalities in big data. More effort must be made to train younger generations, women and men, to ensure gender equality and the inclusiveness of ethnic groups in shaping AI.

As the field of data science accelerates, countries must create robust big data and AI strategies to prevent growing inequalities in access and use of these technologies. In digital advertising, for example, where many of these capabilities were incubated, big data and AI continue to demonstrate their ability to concentrate wealth – and data – in the hands of the few and widen inequalities.

Just as misuse of AI may lead to harm, nonuse of AI may allow preventable harms to occur. The challenge is that misuse of these new tools is already rife online and real harm is being done, while the opportunity cost of failure to use them responsibly is mounting. Clearly, although achievement of the 2030 Agenda and the modernization of humanitarian practices necessitates responsible use of these new tools, it urgently requires a new, rights-centric effort by all stakeholders to ensure innovations meet community needs and no one is left behind. Undoubtedly, assessing the ethical impact of AI in addition to privacy protection measures can mitigate harm, maximize benefit, and lead to use of the new technologies as a force for good.

People and Data

Introduction

How can the data revolution expand economic development opportunities for more people? Can the increasing collection, analysis, and use of data – often from individuals1 through digital transactions or digital records of offline activities – broadly benefit those individuals and people? And what risks might arise, such as to individual privacy, and how might they be managed?

Conversations about data have become very popular: interest over time in "big data," as indicated by Google searches, for instance, has grown one-hundred-fold since 2010.2 More data is being generated – by people and machines – and captured, processed, and transferred than ever before. Much of this is because of the increasing use of digital technologies by people and organizations globally; indeed, even most analog processes have digital components (such as a visit to a doctor's office leading to a digital drug prescription).

But while the data revolution can benefit people, this chapter proposes that the structure of data markets might be raising risks and costs to individuals. People bear many of the costs and risks of participating in data markets and, indeed, might not even be aware they are participating. The poorest also face entry barriers, and it is possible that they might not benefit from their participation even when it is possible.

The benefits from data include – at the most general level – the ability of data users to make better decisions using the information processed and to enjoy more convenience when interacting with organizations (for instance, easier interchange of data between platforms or service providers). The increasing use of such data by organizations, such as businesses and governments, implies the potential for faster and better decisions by these entities. This can help them improve service delivery, reduce costs and prices, or support process or product innovation, all of which would benefit the people that those organizations serve.

For example, better techniques for tracking how and where people drive their cars can inform traffic planning and management. Data from people's online activities inform advertising decisions that fund the operation of many widely used internet services that are "free" at the point of use. And as digital tools proliferate, individuals are increasingly able to benefit directly from access to more and new types of data and the information derived from it. People can take steps to increase their physical activity and improve their health by using digital pedometers, which have now become available on many smartphones apart from watch-like activity trackers. They can analyze market trends and make more informed choices about the products or services they buy, for instance, when buying books or purchasing air tickets. And depending on the organizations that use that data, people may benefit indirectly by being better 52 Information and Communications for Development 2018 able to navigate the organization's products and services and through an expanded set of choices or opportunities, based on analysis of the preferences exhibited by collating data on consumer and web traffic choices.

The possible costs to the user of data collection include the loss of privacy, of agency, and of control. Such costs can undermine people's trust in the organizations that collect, control, and use data. Indeed, at the time of writing, various controversies had broken out over data leaks compromising the privacy of personal data and the biases involved in the use of data to profile individuals; these have underscored the risks emerging in the new data-rich economy. These costs are not always apparent or are distributed in biased ways among participants in data markets. This is because of how those markets have been evolving, with some organizations gaining significant power in defining how such data is collected, used, and shared. Other risks are emerging in this era of data: because of barriers that prevent people – especially the poor – from participating effectively in data markets and analog limitations to the benefits of the data revolution.

The chapter considers several aspects of personal data markets, which run on the personally identifiable data that people generate (figure 4.1 reviews the types of personal data). It looks at how data markets have evolved, highlights the various players in the data market, and then discusses the benefits and costs for participants in data marketplaces through digital networks and how negative impacts might be reduced. The chapter concludes with a discussion of public policies that could rebalance the costs and benefits to ensure fairer distribution among participants and understand how data marketplaces can focus more on people. These choices could determine whether data will help people – especially the poor – find economic opportunity. Few best practices exist as models, and hence the chapter will leave the reader not with specific policy prescriptions, but with a better sense of the dynamics at play.

The Data Market

Technological change and evolving business models

Personal data is generated through an individual's actions (such as making a payment using a credit card), through business processes that digitize analog data (such as medical histories), or through consequent machine response (such as call data records). Such data is now increasingly coming from use of the internet, wireless sensors, and the billions of mobile phones around the world. As the world gets more connected, more people are leaving a digital trail, wherever they go and whatever they do.

This data, which has become more voluminous and granular over time, piqued the interest of various organizations that saw the financial value embedded in it. By the early 1990s, personal data such as telephone numbers and email addresses was widely used for marketing. Companies crunched data to predict how likely people would be to buy a product, and began using that knowledge to come up with targeted marketing messages. As more digital data was collected, organizations began to use increasingly powerful computing tools to manipulate and apply that data. Marketing companies built richer consumer profiles to predict future purchases and manufacturing and services companies to design and model new products.

Companies are now using such data to develop services powered by artificial intelligence (AI), and the bigger the data set, the better the AI. These and other innovations have greatly increased the value of data and its potential for being monetized, or bought and sold as a product in its own right. Data continues to gain value as its potential uses increase. Organizations – including businesses, governments, and others – can derive value from data by applying the insights arising from data's analysis to internal cost and revenue optimization, marketing and advertising, intelligence and surveillance, and automation.

The application of personal data for online advertising has skyrocketed, with the internet now surpassing television as the leading advertising channel. At a forecast US$237 billion in 2018, digital ads are expected to grow from 44 percent of global advertising revenues in 2018 to more than 50 percent by 2020. Facebook and Google accounted for 84 percent of digital advertising revenue in 2017 (excluding China). In 2016, Facebook's advertising revenues were US$27 billion (up more than 1,300 percent since 2010), accounting for more than 97 percent of its total revenues. Google's advertising revenues – US$79 billion (growing 180 percent since 2010) – accounted for 88 percent of its total revenue. Combined, the advertising revenues of these two online platforms were on par with the gross domestic product (GDP) of Morocco.

Data market actors

Table 4.1, complemented by figure 4.1, identifies the main types of actors operating in the markets built around personal data and the relationships among them. Using these categories of participants, it is possible to illustrate a simple model of the data market, as shown in figure 4.2. People produce personal data, the "raw material," which they "sell" (traditionally at zero price) to other market players who then use that data to derive various benefits. Individuals also provide "free labor" on many of these online platforms – by creating content such as posts and reviews and by uploading photos and videos – that data collectors can "scrape" – extracting data from online sources – to infer personal traits and preferences. This personal data, along with the data that individuals generate from their activities, and that might be inferred from their data (such as their political or culinary preferences), is the main source of data for these organizations.

People do not always directly derive value or benefits from this data (until recently, as discussed later). But people have been deriving indirect benefits from their sale of data in services or products that data-using organizations provide. These benefits are discussed in the next section.

On the other side of the market, the "buyers" are the various organizations that collect and use the data. In some cases, these organizations depend on the data as a necessary input for their operating model, as do online social media, search engine websites, and various information and news sites. Using advertising as a source of revenue, they typically compensate people – producers of the data – with free or highly subsidized access to their services. Hence, data has financial value to those organizations, either immediately when it is sold to other organizations (such as marketing companies) or through the services that an organization offers others (such as a search engine selling advertisements tied to search terms).

In other cases, data is an input into an operating model. Health systems or government services are one example. Their processes are traditionally standardized and have relied in the past on highly abstracted models of user preferences. Data is thus an input into these systems and does not have an immediate financial value but has informational value. This implies that such services are often performed for a fee, whether paid immediately or separately (such as through taxes). However, these interactions do generate significant amounts of data, and thought has increasingly gone into creating more specialized services and choices based on that data (such as in e-government services) or improving the quality of those abstract models to design improved services (such as better medicines or treatments). Businesses can unlock financial value by generating more effective insights from data to launch a new product, reduce waste or costs, enable better decisions, and boost innovations.

One may say that the value of data depends on how and for what the data is used and how well it is prepared (cleaned and organized). In either case, however, data can find its way to other parties. The regulation of those data flows is the responsibility of data protectors, which can include rules pertaining to privacy and sharing of specific types of data (such as health or financial data), as well as rules about electronic transactions. Data could also be held as an "asset" by those who collect it directly or via others, and new rule systems have emerged around concepts such as the "right to be forgotten" by such entities. Hence, data regulators can also protect people by defining and enforcing rules around the use of their data.

In this construction of the generic data market, organizations have an opportunity to capture the value from the data people produce, and they can determine how much of this value returns to those people. As noted, this data has significant financial and political value since it contains information on behavior and preferences. Where people do provide such data voluntarily, it is because they expect to gain some of those benefits – whether it is access to online services or better medical care, or merely the chance to win a competition.

Questions then emerge from the perspective of data producers: are people aware of what data they are providing and under what conditions (or at what cost)? Do they understand the value (monetary and otherwise) of the benefits they receive? Are they able to assign value to the data they provide in a manner that explicitly differentiates between their perception of value and the actual value of the benefits that they already have or could receive? And what might ensure that maximum benefits are delivered to those who produce the data? The following section unpacks the benefits and costs that accrue to individuals as they participate in these data markets as data producers.

Table 4.1 Typology of actors in the personal data market

Actor	Description	Examples
Data producers	Personal data is generated by individuals as they fill in forms (either online or offline where the latter is digitized), through sensors (such as fitness trackers and home monitors), through using applications and services on mobile phones and the internet, through using credit cards, and from being captured by security cameras and other sensors.	People generate data anonymously through sensors, security cameras, and the like. Individuals generate data using mobile phones, credit cards, internet search, fitness trackers, and so on. In some cases, civil society organizations can help produce data, especially among poorer communities.
Data collectors	Companies and governments collect data in different ways. Businesses collect personal information from their individual customers. Similarly, governments collect data from citizens for a wide range of purposes.	A bank asking for financial and personal information. An internet service provider recording web sites a user has visited. An information services company soliciting personal information for an individual to open an email or social media account. Citizens providing birth, marriage, and death details to governments for civil registries.
Data aggregators (brokers)	Obtain personal data from public and private parties to combine for resale to businesses. Some add additional value through analytics.	Data mining companies, such as Acxiom or notoriously Cambridge Analytica, that collect information from sources such as public records and consumer surveys to provide insights for clients such as banks, car companies, and retailers.
Data users	Businesses who purchase data aggregators' products. Users of the analyzed data can also play the role of data collectors and aggregators on the market.	Businesses and governments for law enforcement. Alphabet (Google's parent company), Amazon, Facebook, and others are data collectors, data aggregators, and users of the analyzed data. Advertisers are major users of personal data to better target online ads.
Open data providers	Prepare (for example, anonymize) and make relevant personal data open to use and redistribute.	National governments, affiliated agencies, or organizations (such as civil society groups).
Data protectors	Address privacy and control of personal data. Protect the interests of individuals that have generated that data or its derivatives.	National data protection authorities through privacy and computer crime legislation. Companies offer products that provide data security, stronger data protection, or information about personal data that is collected. Many tips are available for protecting personal data. However, the decision to provide personal data in exchange for use of some services is still up to the user.

Figure 4.2 The personal data market

The Benefits, Costs, and Risks for People

As noted above, the main benefits from the data revolution arise from the information value that personal data can provide to either individuals or to organizations that serve those individuals and from the financial value that it has for organizations. Yet costs and risks to individuals exist in the era of expanding collection, flow, and use of personal data. These include, as noted, privacy, loss of agency or control, and risk of exclusion from benefiting from data's value. This section notes that people are not always aware of the costs or the benefits of their participation in the data marketplace and, even if aware, might be constrained in their ability to improve the tradeoff due to the structure of the market.

Benefits

The data revolution has given more people access to information they can use to make better decisions. This is, first, because people can use data for its informational value – either directly or through the organizations that serve them – exposing them to new information or by creating new services or products, both of which help them make or realize better decisions. Second, it is because their personal data has financial value, implicit or explicit, that allows them to exchange (sell or barter) personal data for services or products they might otherwise have had to pay for. Often this includes a range of sophisticated online tools that allow them to be better informed about services or reduce transaction costs (including information sources such as search engines or communication tools such as email services). Table 4.2 summarizes the two forms of value and provides examples of how they operate in the data market.

Table 4.2 Benefits from personal data to individual

Data holds	Informational value	Informational value
	Information is derived from the data people produce, which could inform decision-making.	People produce data that has financial value for some other party and exchange their data for products or services.
Effects	Direct: Derived when people use their own or others' data to make decisions (such as exercise data from a wearable activity tracker or reviews on a shopping portal).	Direct: Derived when people share their data (knowingly or otherwise) with organizations in return for services (for example, people provide data in return for access to information services or social networks online); those services are financed through the sale of the data or its derivates.
Effects	Indirect: People's data goes to organizations (for example, health care companies, urban planners, financial institutions, news organizations) that use it to improve or subsidize their products.	Indirect: People provide data that collectors use or sell on to others, generating economic value that could return to individuals through ower prices or income-generating opportunities, or feed into broader economic processes, which could also include innovations that benefit the wider public.
Benefits	Better decisions Innovative products Improvement in public services	Access to digital services Wider economic benefits for data users that could spill over into opportunities for data producers

Benefits due to informational value

When data is organized and analyzed it creates information, which can be an essential input in economic decision-making and security; it influences resource allocation, choices about technologies, and political choices and informs them about the markets that they participate in. When farmers have access to market pricing information, they can make better choices about when and where to sell their produce. Similarly, when consumers have better information about the supply, quality, and price of goods or services, they can make better choices about where and when to buy them. When data from weather monitoring systems feeds into complex models and informs governments, businesses, and individuals about potential inclement weather, it allows each to take measures to minimize or respond to damage. When civilians have better information on the events taking place around them and the decisions their political representatives are making, they can make better decisions about where to live, how to get around, how to spend leisure time, and how to vote. And when young people have better information about careers and wages, they can make better choices about what they study.

The data revolution is giving more people increasingly diversified and context-specific information through improvements in data collection, processing, analysis, and distribution, online and offline.8 Thus far, people have typically benefited indirectly from data, as when organizations that collect, process, and use data to make decisions or inferences about people's demands or interests then provide new or better information or expand the set of opportunities available to individuals. For instance – continuing from a previous example – this happens when governments improve disaster preparedness or response or insurers process claims faster. And when people share their personal data with many of today's online services providers, those companies can attract advertisers, giving more people access to many sophisticated digital tools, from financial planners to cloud-based storage, often free.

Organizations can use personal data for innovation in processes, products, and services. These innovations could lead to economic benefits for people through lower prices and a better match between products and consumer needs. For example, TrueCar collects and analyzes individual transaction data to provide an idea of local vehicle-specific prices so that car buyers know what others have paid for the same car. And various companies are using personal data to design more engaging or useful products and services. In health care, data collected from large groups of individuals is improving diagnoses and helping to identify treatment options.

Personal data is being used to improve public service delivery, enhance policy making, strengthen citizen participation, and enhance security. For instance, New York City is planning to use data from devices installed in taxis that use GPS, as well as pick-up and drop-off data from ride-sharing apps, to improve traffic management, identify roads that need to be fixed, and determine where to focus efforts after inclement weather. Similarly, in Seoul, the capital of the Republic of Korea, the location of mobile calls and text messages is used to optimize night bus routes.

One popular application gaining use around the world is the use of locational information from smartphones to report problems with local services. Not only does this pinpoint the exact location of annoyances, such as uncollected garbage, potholes, or graffiti, it can also help foster citizen engagement. Social media activity can be "scraped" to alert vulnerable populations, such as informing Brazilian Facebook users about the Zika disease. When personal data is used in ways that improve welfare, people will again be open to sharing it with public agencies and other organizations.

And people now are increasingly able to benefit from such data directly, using a wider range of progressively sophisticated tools to process data and derive their own conclusions. This includes making personal finance decisions or modifying health-related behavior.

Benefits due to financial value

Personal data has financial value, mainly placed on it by organizations that use that data to market products and services to their customers. This financial benefit is typically not available to the individuals that produce the data, but those organizations could "pay back" the producers of data directly by providing them with access to additional digital services, or indirectly through wider economic benefits, such as the availability of credit ratings allowing access to credit. For example, most of those online provide personal information for access to advertiser-sponsored digital applications such as search engines, storage, email, and social media. And there is significant personal-data-driven advertising sponsored content online, such as news, health, and education sites of importance to individuals.

Personal data also supports a vast ecosystem of digital companies (see chapter 5), and is beginning to influence firms outside the traditional digital sectors as well. The growth of such businesses – fueled by data – implies economic growth that in turn will benefit individuals. The large information technology and services companies that use and benefit from personal data have created thousands of direct and indirect jobs, for example, and have created platforms that have led to the creation of other businesses. Not all are positive developments, with opportunities for some to generate fake data, for instance (see box 4.1).

Box 4.1 Income Generating Opportunities

Some people benefit financially and directly from their ability to earn revenue from the data economy. This includes a handful of services that provide money (or discount coupons) in exchange for personal information.

People can set up websites and receive income from personal-data-driven advertising tools such as Google's AdSense. Freelancers can earn money from jobs in data-related areas on Mechanical Turk (www.mturk.com); Upwork a freelance broker reported that jobs associated with data and artificial intelligence were among the fastest growing in the fourth quarter of 2017.

And potential could exist for outsourcing analytics projects; a data scientist in India, for example, reported earning US$200 an hour for overseas jobs. But, while individuals could get a financial return for their own personal data, they might also do so with false data. Income can be made from ethically questionable activities such as using fake accounts or reviews to influence social media. For instance, the #richkidsofinstangram handle was used by social media influencers to attract unwitting users to invest in dubious online trading schemes.a Estimates of fake accounts – also created by governments and criminals – range from almost 50 million for Twitter to about 60 million for Facebook.

Costs and risks

Despite the potential benefits of the data revolution, people, and especially the poor, are often subjected to many costs and risks or even have been largely dependent on the organizations that collect or use their personal data as gatekeepers to realize the benefits of those data, and to act on their decisions. The costs and risks stem from two issues involving these organizations: first, the limitations in how the analog world permits people to benefit from the data revolution, and second, the unequal power relationships between people and these organizations. The first can be discussed briefly, as its resolution requires a shift beyond the data economy itself; the focus instead will be on the second precluded from partaking of the benefits described above. This stems mainly from how individuals

Risks arising in the analog world

A key risk in the data economy is that missing analog complements, such as limited literacy, can constrain the extent to which people can realize benefits from digital data markets. For instance, if organizations do not function well or are in uncompetitive markets, the collection of more data might not improve flows of information or decision-making by individuals, nor will it create incentives to deliver the expected benefits. In such a market, people may perceive the value of their data as low, because of information asymmetries, and many may give up their data unknowingly or without expecting an appropriate return.

The poor also face risk of exclusion: the barriers to entry in data markets are often too high for them, as they do not have access to digital technologies or they lack the skills to use data and convert it into relevant or useful information. Although the use of new technologies has exploded across the globe in the past 10 years, the price to access this data is still prohibitive for many. In Bolivia, Honduras, and Nica ragua, for example, a mobile broadband subscription exceeds 10 percent of average monthly GDP per capita, compared with France and Korea, where it is less than 0.1 percent (see figure 5.6). Many people – especially women, people living in the 40 percent of the population with the lowest incomes, or people with disabilities – lack the digital tools or literacy to use technology. People who over-share online data concerning their sexuality, eating and drinking habits, or their taste for high-risk sports may be unwittingly excluding themselves from insurance coverage, or at least raising their premiums.

One consequence is that digitally excluded populations increasingly risk exclusion from data sets created from mining digitally generated information that might be used to enhance their livelihoods. And this makes many developing countries "data poor" themselves; that is, they have substandard data on a population, with entire groups of people invisible, such as unemployed women, indigenous populations, or slum dwellers.

The poor also often face constraints on how they use data – even if they are aware the data exists. This is because growing data flows often do not reach them, due to weak institutions or constraints on the functioning of markets. For example, if government weather data is not made public quickly, it will not benefit them. Or if disaster preparedness and response systems are not in place or fail to operate because alerts do not reach people quickly, even having that data will not expand opportunities in a way that would allow most to benefit from them.

Costs and risks arising from the data market(s) status quo

Costs are embedded in how data is shared and consumed, because of the structure of the markets in which the data is used. These costs might not be transparently disclosed to individuals (data producers), or they might have unintended consequences for the way that the data market functions. Several costs can be identified: loss of privacy, loss of control, loss of agency. When these costs are disclosed or uncovered (especially unintentionally, such as through data leakage, or deliberately through hacking), they could undermine the functioning of the digital ecosystem supported by data markets due to a loss of trust in the participants in those markets.

Concerns about privacy have been central to discussions about the data economy. Cases exist in which people may provide personal data willingly – to government officials, health care workers, or marketers. For example, they might trade it, knowingly or unknowingly, for access to online information services. Collection of such data allows data-driven services to improve. But this may mean people lose some privacy willingly. And it has also made securing and protecting personal data increasingly important for all kinds of organizations, both data collectors and users. Incidents in 2017 and 2018 have shown that the personal data of millions of people could be accessed – legally, accidentally, or illegally – including through means that neither the individuals nor data collectors might have been aware of. Most notable is the use of data collected through personality profile surveys on Facebook for targeted political advertising campaigns.

Much personal data, held and used as it is in financial, health, or public services organizations, is sensitive, and privacy has therefore been recognized as a fundamental human right deserving protection. Loss of privacy risks becoming a negative influence on the behavior of others or organizations, such as through exclusion of people from access to services, social threats (bullying and stalking), or employment hiring or firing decisions.

Transparency about what data is being collected, from whom, and about how it could be used is critical. However, much of the data people generate is now automatically created through their actions and often does not request explicit permission for collection or sharing with others (beyond accepting terms and conditions, often wordy and complicated). Because digital data is effectively permanent and can be replicated infinitely, its use can extend far beyond what was earlier possible with analog records. Such loss of control occurs as people give data away unwillingly or unknowingly, and, hence, lose control over it, are not aware of how or when it will be used or by whom, and are unable to engage in its secondary use.

One example is Meitu, a photo-enhancing app that requests access to far more data than needed, such as GPS location, cell carrier information, Wi-Fi connection data, SIM card information, and personal identifiers that could be used to track people's devices and sell the data without them realizing it. Users have control over whether to use an application or not as well as to adjust privacy settings within applications, but the configurations can be complicated or unwittingly bypassed. Often, individuals are unable to deny an organization control of their data, sometimes exclusively, without giving up access to all of its services; no options are available, especially in the online world, where terms and conditions to give up control are frequently "take it or leave it".

Loss of agency happens when algorithms or the input data causes people to lose control over their actions or restrict their ability to determine their own choices. Such loss is reinforced by the development of algorithms that are starting to offer choices to people for everything from what movies to watch, which news sources are relevant, what to buy, or which web pages might offer the information they seek.

Those algorithms are developed based on models of personal preferences, using user data, that are abstractions of individual behavior. Such algorithms may frequently be inaccurate, no matter their sophistication. They model an individual's preferences, discouraging experimentation and reinforcing segmented stereotypes, often hidden from view both in what the sources of data are and how the algorithm itself works. At the time of writing, discussions had grown about how algorithms on some platforms might influence significant choices, such as voting. And even if the more serious of these claims are ultimately unproven, the working of many of these algorithms is not clear (as well as what biases might inadvertently or purposefully exist).

These hidden costs, when they are disclosed, are often then accompanied by significant negative publicity for the organizations involved. This could undermine the provision of such products or digital services – dependent as they are on personal data – because people lose trust in those services. Theft of personal data, its growing accumulation and analysis by companies, and the spread of fake information increasingly targeting specific groups of people lowers trust for governments that people feel are not doing enough to protect them and for companies they feel are misusing their data.

Underlying many of these risks is the imbalanced structure of many data markets. Increasingly, private organizations are holding and using data, and these organizations are not subject to democratic pressure (as many public institutions are), and increasingly are subject to winner-takes-all pressure in network industries. As noted, individuals are often unable to negotiate better terms and conditions related to their data or create better trade-offs between their privacy, control, agency, and access to services. Better informed and targeted regulation is part of the solution, given the collective action problem that occurs when large numbers of people engage with such organizations or networks. The next section discusses other protections that might be needed.

Remedies

Vibrant debate is now ensuing about what public policies could help respond to these failures within and outside the data market, and how regulations may be applied in this sector, which up until now has been largely unregulated. Appropriate policies – helped by emerging technologies – could lead the data revolution to expand economic opportunities for more people. Part of this could be achieved by making the costs and benefits transparent and redistributing them more fairly across different players in the market.

Specific remedies could help address or minimize the risks and costs to individuals arising from the ways data markets function today. Areas that a personal data policy could address include overcoming the identified market failures – loss of privacy, control, and agency; exclusion from participation in the market; and unfair distribution of the market benefits among data market participants.

But little consensus exists for now on what remedies will work, and some approaches are yet to be tested. And current data policies are highly fragmented, with diverging global, regional, and national regulatory approaches. Moreover, these remedies do not directly address the unequal power of individual users versus the organizations (global platforms or states), an underlying issue in data markets. This issue might only be addressed through strong regulatory or large-scale user action; but, again, little consensus exists on how these might be achieved. Table 4.3 and the rest of this section outline emerging responses.

Table 4.3 Risks and Remedies

Risk	Remedy	Example
Loss of privacy	Legal frameworks to protect personal data from theft and misuse, to require consent for collection and use, to keep personal data accurate and relevant (where data subjects can access and correct their personal data), to define how such data can flow (including across borders), and to specify the mechanisms to assist individuals if violations occur	European General Data Protection Regulation (GDPR); APEC Privacy Framework; OECD Privacy Guidelines.
Loss of agency	Informing individuals about when and how data is collected and used, including how their experiences are modified by algorithms based on that and others' data. Allowing users to switch off such algorithms or hold back their data from being used. Clarity about data sources to minimize the risk of fake data or its derivatives influencing decisions.	None, although some companies such as Google do now allow users to "turn off" personalized search results, for example.
Loss of control	Legal frameworks limit the collection of personal data, and limit use and disclosure to specific purposes. Data subjects should be notified about the purpose and disclosure of the data collection and can opt-out of data sharing between the data collector and other companies. They can also choose to be forgotten.	Canadian Personal Information Protection and Electronics Documents Act; European GDPR.
Loss of trust	Reducing personal data breaches, business codes of conduct where regulation is weak or vague, acting on feedback from user communities.	Data Science Code of Professional Conduct.
Exclusion	Connecting people to the better-quality, affordable internet.	Universal technology access programs and digital literacy training.

Privacy

Privacy protections have been typically ensured through legal frameworks. A global survey, reported by UNCTAD shows that data and privacy protection legislation has been put in place in more than 100 economies, 66 developing or transitioning (see map ES.1). More than one-fifth of economies, primarily developing ones, had no legislation, and few have developed comprehensive data protection laws.

Key attributes of such a legal framework include protection of personal data collected by organizations, such as effective and appropriate security to protect the data from theft and misuse. It is also generally accepted that organizations need to keep personal data accurate, relevant, and updated. Data subjects must be able to access and correct their personal data. Widely cited frameworks to define the rules around the privacy of personal data include the European General Data Protection Regulation (GDPR); the APEC Privacy Framework; and the OECD's Privacy Guidelines. The Council of Europe's Convention 108 is a foundational data protection initiative, with a treaty that opened for ratifications in 1981. The treaty intends to "secure in the territory of each Party for every individual, whatever his nationality or residence, respect for his rights and fundamental freedoms, and in particular his right to privacy, with regard to automatic processing of personal data relating to him ('data protection')".

The GDPR, which came into force in May 2018, enables better control over personal data, entitling individual protection of anonymity, pseudonymity, and rights to request and erase personal data ("right to be forgotten"). Another novel feature is data portability, giving individuals the right to request that their data be transferred to another controller and for data controllers to use common formats.

Cross-border personal data flows are also regulated, with onward transmission generally only permitted if the recipient country has adequate data protection laws. Businesses that do not comply with the regulation face significant fines.

The right of an individual to privacy is often balanced with the need to secure the greater public good. For example, even the Council of Europe's Convention 108 permits restrictions in cases when "overriding interests (e.g. State security, defense, etc.) are at stake". In other cases, privacy rules permit irreversibly anonymized data to be used for research or public interest activities. This balances the interests of individuals in safeguarding their privacy with the benefits of being able to use personal data, as described in the preceding sections.

Beyond legal frameworks, however, new approaches are emerging. This helps in areas given institutional capacity limitations, the difficulty in regulating across borders, and the "take-it-or-leave-it" nature of many services. For example, online services that embed privacy into their designs have emerged in messaging or search. A more detailed discussion is found in chapter 6.

Control

To overcome loss of control, collection of personal data should be transparent, and use or disclosure limited to specific purposes. Individuals should be notified about the purpose and disclosure of the data collection. One example is Canada's Personal Information Protection and Electronics Documents Act passed in 2000 (passed by the Privacy Commissioner of Canada). Under the act, individuals have the right to access the information held about them, challenge its accuracy, and give consent for personal information to be collected. Organizations have obligations to ensure data security, limit the data they collect, use personal data only for the purposes consented to by the consumer, and not retain the data when purposes for collection are no longer in effect. The EU's GDPR also enhances individuals' control over personal data by enabling the "right to be forgotten," permitting them to control what personal data is available online or with data users. The rules also allow users to control how personal data is used by those organizations.

Agency

Loss of agency can be averted by educating individuals in data collection methods and in how algorithms modify their experiences based on their data. The Data Privacy Project in New York City trains librarians, in turn, to provide guidance on protecting personal data to the largely vulnerable patrons that utilize libraries' internet services. Some applications allow individuals to switch off predictive algorithms. For example, Google allows its users to delete their past searches or prevent saving of searches or allows users to turn off personalized search results that might create an "echo chamber" for users by limiting their exposure to new sources of information.

Exclusion

Exclusion of individuals from data markets can be overcome in different ways. It is estimated that well over 2 billion people did not use the internet at all in 2016, either because they had no access, could not afford it, or did not know how or want to use it. A significant proportion of these people live in rural areas of developing countries, where levels of internet infrastructure and incomes are often low. Exclusion from the data market can be overcome through introduction of information and communication technology, particularly mobile telephony and the internet, among lower-income groups and connection of more people through inexpensive phones.

Governments need equally to tackle the challenge of people who have the needed infrastructure within reach but do not use the internet because they lack digital literacy. This could be done through creation of awareness about data-driven services (such as social networks, public services, search engines), as the Indian government's Digital India Program of 2015 does. The program helps farmers get access to information about different wholesale markets in their community through digital apps on smartphones and helped cut out middlemen (see Reuters Market Light 2015). Farmers can use this information to make better choices and not be beholden to centuries-old systems (Bergvinson 2017). By the end of 2015 the program had already helped increase farmers' incomes 5–25 percent.

Trust – and the dominance of digital platforms

During the writing of this report, many episodes underscored the scale of the personal data economy, but also undermined the trust that people have in the organizations that have grown significantly in the data market. These episodes have included massive leaks of personal data, discovery of unapproved access to private data, attempts at manipulation of ostensibly neutral information sources, and sharing of personal information. The scale of these episodes is significant, given the reach and popularity of the organizations and platforms that they involve, such as Experian or Facebook.

Debate about the implications of these episodes is only just beginning, and focusing on privacy of personal data, control over who accesses and uses people's data, and the agency of users. In one account, the organizations involved in transgressions might have been unaware themselves of the potential for trouble or unable to prevent it. But such accounts do little to shore up trust in these services. Even so, the scale of organizations' networks and their importance might lead people to continue using them, even if less willingly.

It might be possible to instill greater trust through actions to remedy some of these other risks. It might also be possible to seek ways to manage data more collaboratively, for instance, adopting a code of conduct (such as the Data Science Code of Professional Conduct of the Data Science Association), and with more transparency, in how data is managed and used. As the next section discusses, this may involve moving toward a more balanced personal data market in which users regain control over their data.

Toward a More Balanced Data Market

Emerging trends suggest new opportunities for individuals to regain control of their personal data, giving people more power as actors in the data market. People are looking for ways to keep their data secure and to monetize it and to get better value in exchange for the personal information they provide. Newer business models – driven by technological advances and people's greater awareness of the transactions and value of data markets – are prompting creation of a more balanced market for personal data. However, scope remains for greater coordination or even aggregation of data streams and sources to maximize value.

Emerging business models allow people to control and directly sell their data to businesses. Companies such as Datacoup30 enable users to sell their personal data for a monthly fee, for example, data generated through social media activity and credit card transactions. Another example of this is Alphabet's Project Baseline, which collects laboratory results and real-time health data from individuals wearing a special wristband. Participants in the study share their health data for two years and receive US$410 per annual visit, US$30 per visit for quarterly assessments, and US$10 for filling in questionnaires.

Businesses are finding that their customers are becoming more informed about the use of their data and the potential monetary value of it, and expect value in return for data used to target marketing and for data sold to third parties. Companies may also begin to find that they lose customers when they fail to keep data secure; however, the winner-takes-all nature of many of the platforms and services in use today might mean that an exodus might not occur often or easily.

For individuals, the biggest benefit is regaining control of personal data. A second gain could be more accurate data, as individuals would have a greater incentive to keep it up to date to better monetize it. This protects people in instances in which out-of-date information might be used against them (such as applying for loans or insurance). More comprehensive information could also expand the scope of applications and services. Third, personal information would be centralized and simplified using personal data management software. Individuals would have fewer passwords to keep track of.

Thus, it should be possible for people to act as dataproducing entrepreneurs – having a data profile, personal data management software, and an online wallet – and exchange the data for money, discount coupons, or free applications and services. The World Economic Forum has proposed the concept of a data bank account, in which an individual's data would "reside in an account where it would be controlled, managed, exchanged and accounted for".

One challenge lies in determining the value of personal data. In Italy, a team of researchers monitored a study group that auctioned off smartphone data for two months, with the median bid across all data categories of €2 (US$2.72). One individual sold his personal data on a crowdsourcing site for US$2–US$200 (depending on the amount and frequency of the information), earning US$2,733 from 213 backers in one month, or an average of US$12.83 per backer. Another study uses operating metrics from Experian and Facebook, companies whose revenues are largely generated from personal data, finding that the average revenue per user of both was about US$6 a year (Roosendaal, van Lieshout, and van Veenstra 2014). Another perspective on personal data valuation is total global digital advertising revenue (US$178 billion; see Magna Global 2017) divided by the number of internet users around the world (3.4 billion), for an average of US$53 in 2016.

Personal data does not have a uniform value and varies according to several variables, such as type of information and income of user. Data from Facebook confirms the latter, with the company having different average revenue per user depending on the region. In the end, the value of personal data will be determined by what purchasers are willing to pay. This will become more apparent with the emergence of global, regional, and national markets for personal data, in which data collectors would review the data available and purchase directly from individuals or third parties they have entrusted the data to. Personal data management software that individuals can operate themselves or where firms act as trusted custodians for users who lack the skills are already on the market.

It is certain that large internet companies will resist individuals' greater control over personal data. Collecting personal data is at the center of these companies' business models, driven by the willingness of individuals to sacrifice personal data for unpaid services. Developed and developing countries also appear to be split over the threat to businesses of individuals monetizing their personal data. In developed nations, it is less of a threat to businesses, with bigger worries in government regulation, cyberattacks, and personal data protection applications. But in Brazil, China, and India, individuals charging for their personal data is among the top business concerns.

However, the unbalanced personal data market could lead to greater disenfranchisement among individuals. This in turn could lead a growing number of individuals to opt out of the existing arrangement.

Tools are already available that give people greater control over their personal information. For example, a Swedish company claims, in a few clicks, to be able to find and delete accounts created using Gmail. Stricter policies about sharing personal information are available with free email, office applications, and browsers. Scope also exists for paid tools with tighter privacy controls, as users might pay for applications and services that protect different types of personal information. One study found that individuals in the United States would pay most to protect government identification, those in India for credit card information, and in Germany and the United Kingdom, for medical records. Products also exist for individuals to protect at least some personal information from internet service providers. In addition, cookie controls and ad blockers will allow users to block online marketing generated from their personal information.

Some individuals have consciously decided to restrict sharing of personal data. Ironically, many people involved in the social media or technology industries limit their use of these services or systems because of concerns about psychological and other dangers caused by services using their personal data. These trends may initially lack the scale of the large internet companies, but could grow as more individuals weigh the tradeoff between sacrificing personal data for unpaid services.

Looking to the Future

It is possible that the future would lead to greater democratization of access and use of personal data. This could lead to more data sharing and, eventually, transform individuals from consumers of data to both consumers and suppliers of data. As data suppliers, individuals would be able price the data they produce and share with businesses or governments.

The market power of private organizations, especially data collecting platforms and networks, possibly, could also be moderated. This is possible through the emergence of competition (such as new social networks or online service providers), regulation by governments or the platforms themselves, and shifts in consumer preferences en masse, which could privilege privacy or control over access. Any such shift will emerge out of negotiation among market players, but should ideally seek to balance innovation by these firms with respect for individuals' rights.

Recent trends are shifting the value distribution to the producers of data, in terms of such things as better health care (cancer research, medical treatment, or diagnosis) and better public services (such as traffic and road planning, water planning). Technologies such as micropayments may also lead to innovation in this area, as noted, possibly allowing people to directly sell their data to businesses and governments in the future. The rise of AI and the Internet of Things will help individuals trade personal data and receive personalized services based on personal data.

Apart from the technological aspects and drivers of this change, the personal data market that may emerge would 64 Information and Communications for Development 2018 benefit from more consistent structuring and organizing of data. This is because such organizations – focusing on the benefits to people rather than to organizations alone – could help aggregate or combine data across platforms and permit portability.

Already, data can be accumulated and cross-referenced across various financial services and platforms to detect opportunities to maximize returns on investment. For instance, online personal finance tools have begun to link people's bank, securities, retirement, and credit card accounts to provide ideas and offer products or services to budget better, increase access to credit, or identify investment opportunities.

But we could go further: linking personal data about physical movement collected through phone location or health tracking, combined with data about transportation use, could be combined to provide people looking to exercise with ideas about adopting a routine that increases walking. Shopping patterns across various stores could be combined to provide better choices or insight to people about ways to save by changing the locations or timing of their purchases. The potential opportunities to merge data sources and improve decision-making holds promise, again, with the caveat that the costs and risks need to be managed.

People as a focus for data markets

The data revolution holds great promise. When better data is available to people, they can make better decisions and find the information needed to improve their economic and social lives. The technological tools to realize these benefits exist today and will develop further. As more people connect to the internet and new ways of collecting, managing, and analyzing personal data become commonplace, more people, including the poor, will participate in the growing data economy.

But these changes will not come without their risks and costs. Without measures in place to protect privacy, agency, and control over data, the risk is that businesses and organizations will benefit the most and few of these improved opportunities will pass on to individuals generating these vast troves of data. If the data economy does not become more inclusive, with wider access to the digital tools and the skills to use them, it is likely that the data economy will not benefit the poor.

Finally, as noted, better data will only go so far to improve opportunity; institutions, infrastructure, and rules will need to be in place to ensure that people can use the information generated through the exponentially growing streams of data. The digital data revolution might be upon us, but people will also need reform in the analog world to effect real change in their lives.

Using Data Responsibly

Description

Table of contents

Better Data for Doing Good: Responsible Use of Big Data and Artificial Intelligence

Introduction

The Big Data Revolution

The Evolution of Artificial Intelligence

Using Big Data and AI as a Force for Social Good

Speech and audio processing

Computer vision, image analysis, and geospatial data

Text mining and text analysis

From Design to Responsible Use: Ethical Challenges with Using Big Data and AI

A Way Forward: Harnessing Big Data and AI to "Leave No One Behind"

People and Data

Introduction

The Data Market

Technological change and evolving business models

Data market actors

The Benefits, Costs, and Risks for People

Benefits

Benefits due to informational value

Benefits due to financial value

Costs and risks

Risks arising in the analog world

Costs and risks arising from the data market(s) status quo

Remedies

Privacy

Control

Agency

Agency

Exclusion

Trust ­– and the dominance of digital platforms

Toward a More Balanced Data Market

Looking to the Future

People as a focus for data markets

Trust – and the dominance of digital platforms