Better Data for Doing Good: Responsible Use of Big Data and Artificial Intelligence

Using Big Data and AI as a Force for Social Good

AI and big data are generating new tools and applications that create actionable insights, real-time awareness, and predictive analysis on numerous topics in sustainable development and humanitarian action. A growing number of compelling examples illustrate the value of this technology for improving early warning systems and informing policy and programmatic response. These individual use cases represent a small but significant innovation in learning about the world around us. Taken together, they provide new ways to detect and respond to world events, influence policy debates, and drive development, in a way that is both safe and fair (figure 3.1, table 3.1).

The following sections examine the benefits and applications of big data and AI, including for (a) speech and audio processing, (b) image recognition and geospatial analysis, and (c) text analysis. They also describe how AI is being leveraged to support the SDGs and address the emerging challenges and risks that accompany the uptake of these technologies.

Figure 3.1 The Sustainable Development Goals


Table 3.1 Examples of artificial intelligence applications for the Sustainable Development Goals

SDG 1: No poverty
Value of artificial intelligence: Artificial intelligence (AI) can be used to monitor income and track policies to identify progress and successful practices.
Case study: Combining satellite imagery and machine learning to predict poverty in Nigeria, Tanzania, Uganda, Malawi, and Rwanda
Jean et al. combined nighttime maps with high-resolution daytime satellite images to obtain estimates of household consumption and assets. Using survey and satellite data from five African countries (Malawi, Nigeria, Rwanda, Tanzania, and Uganda), the study showed how a convolutional neural network can be trained to identify image features that can explain up to 75 percent of the variation in local-level economic outcomes.
Risks and challenges: There is a risk of omitting segments of the population that cannot be captured by remote sensing signatures because of their lack of footprint or the given sociocultural context.

SDG 2: Zero hunger
Value of artificial intelligence: AI can be used to maximize yields and improve agricultural practices based on multiple data sources.
Case study: Detecting patterns in big data saves Colombian rice farmers huge losses
A project run by the International Center for Tropical Agriculture mined 10 years of weather and crop data to understand how climatic variation affects rice yields. The project fed the patterns into a computer model and predicted a drought in the region of Córdoba. The center subsequently advised the Rice Producers Federation of Colombia (FEDEARROZ) against planting in the first of two annual growing seasons. This advice saved farmers from incurring significant losses.
Risks and challenges: Overexploitation, based on local optimization, could lead to exhausted lands and lack of resources at the systemic level.

SDG 3: Good health and well-being
Value of artificial intelligence: AI can be used to support diagnosis and personalized medical treatment.
Case study: Revolutionizing personalized medicine using AI
Watson, IBM's "cognitive computing" platform, uses natural language processing to efficiently and quickly sort through millions of journal articles, government listings of clinical trials, and other existing data sources to help diagnose patients and provide personalized treatment plans. University of Tokyo doctors reported that the artificial intelligence diagnosed, in less than 10 minutes, a 60-year-old woman's rare form of leukemia that had been incorrectly identified months earlier.
Risks and challenges: Overpersonalized medicine could lead to abuse from the insurance industry and other stakeholders based on private personal information.

SDG 4: Quality education
Value of artificial intelligence: AI can be used to tailor the delivery of education based on each student's needs and capabilities.
Case study: Detecting dyslexia in children in Spain
Ten percent of the population has dyslexia, a neurological learning disability that affects reading and writing but does not affect general intelligence. Children with dyslexia can learn coping strategies to deal with its negative effects. Unfortunately, in most cases dyslexia is detected too late for effective intervention. Change Dyslexia is a project that uses cutting-edge, scientifically based computer games, such as Dytective Test and DytectiveU, to screen for dyslexia and support those who have it at large scale.
Risks and challenges: There is the danger that harmful media can be easily accessed by children. One example is YouTube Kids videos optimized with AI, and bots that create long, repetitive, and sometimes frightening videos meant to keep children entertained for as long as possible.

SDG 5: Gender equality
Value of artificial intelligence: AI can help correct for gender bias in insights derived from big data and nontraditional data sources.
Case study: Mapping indicators of female welfare at high spatial resolution in Kenya, Nigeria, Tanzania, Bangladesh, and Haiti
A project by Flowminder and WorldPop used geo-located cluster data from the Demographic and Health Surveys on rates of literacy, stunting, and use of modern contraception methods to produce high-resolution spatial gender-disaggregated maps, using predictive modeling techniques. The study focused on three countries in Sub-Saharan Africa (Kenya, Nigeria, and Tanzania), one country in South Asia (Bangladesh), and one country in the Western Hemisphere (Haiti).
Risks and challenges: AI applications are at risk of reinforcing existing gender biases present in the data used to train the algorithms.

SDG 6: Clean water and sanitation
Value of artificial intelligence: AI can predict consumption patterns from sensor data to optimize water and sanitation provision.
Case study: Monitoring coastal water quality in real time in Singapore
Project Neptune is a real-time monitoring and prediction system strategically deployed around Singapore's coastline. The system integrates hydrodynamic and water quality modeling into a forecasting framework that forms the backbone of a central operational management system. Eight specially outfitted buoys act as miniature labs, collecting data on pollutants, including oil and nutrients, and sending live updates to the authorities on how they could spread.
Risks and challenges: AI (or simple malware) can be used to attack or disable critical public infrastructure by means of remote warfare.

SDG 7: Affordable and clean energy
Value of artificial intelligence: AI can be used to make existing infrastructure more intelligent and energy efficient.
Case study: Preventing power supply failures in domestic railway networks in India
Aiming to reduce the risk of signal failure, Indian Railways has trialed remote condition monitoring of the power supply systems, leveraging AI to predict possible outages. The measure is set to be rolled out on two sections of the Western and South-Western railway network.
Risks and challenges: As noted above, critical network infrastructures may be subject to cybersecurity threats.

SDG 8: Decent work and economic growth
Value of artificial intelligence: AI can be used to optimize recruitment for both employers and jobseekers.
Case study: Optimizing online job searches
LinkedIn, a well-known business- and employment-oriented social networking service, uses AI and big data to help recruiters automate much of the candidate screening process. The tool is also integrated in different applicant tracking systems and, for example, automatically synchronizes with the different open jobs, ranking candidates against them.
Risks and challenges: If algorithms learn hiring practices based on biased data that prefers, for example, Caucasian names rather than others, they can make biased hiring decisions.

SDG 9: Industry, innovation and infrastructure
Value of artificial intelligence: AI can be used to automate and eliminate rote or routine work, freeing up labor to focus on more creative tasks.
Case study: Speeding up toy production in Denmark
A factory in Denmark uses autonomous robots and precision machines to make 36,000 Lego pieces per minute, or 2.16 million pieces every hour.
Risks and challenges: AI will transform and could eliminate some jobs. McKinsey estimates that some 60 percent of all jobs will see a third of their activities automated.

SDG 10: Reduced inequalities
Value of artificial intelligence: AI can support translation of less-known languages to ensure all voices are accounted for in decision-making processes.
Case study: Accelerating development in Uganda with speech recognition technology
UN Global Pulse and Stellenbosch University in South Africa used machine learning to develop speech-to-text technology to filter the content of public radio broadcasts for less-known languages spoken in Uganda. Once converted into text, the information can reveal sentiment around topics relevant for sustainable development.
Risks and challenges: Advances in robotics and AI could increase inequality within societies, further entrenching the divide between rich and poor.

SDG 11: Sustainable cities and communities
Value of artificial intelligence: AI can measure traffic in real time, monitor commuting statistics, or improve transportation services.
Case study: Inferring commuting statistics in Indonesia with Twitter
Some estimates for the Greater Jakarta area put the population at more than 30 million. In response to the needs of the authorities, UN Global Pulse (Pulse Lab Jakarta) initiated a project to test whether location information from social media on mobile devices could reveal commuting patterns in the area. The results of the research confirmed that geo-located tweets have the potential to fill current information gaps in official commuting statistics.
Risks and challenges: AI may lead to cascading failures of interconnected systems in smart cities. Failures in machine learning algorithms need to be accommodated in urban emergency planning.

SDG 12: Responsible consumption and production
Value of artificial intelligence: AI can improve efficiency of recycling processes, which can eliminate waste and improve yields.
Case study: Supporting smart recycling in the United States with dumpster diving robots
Spider-like robotic arms, guided by cameras and artificial intelligence, are helping to make municipal recycling facilities run more efficiently in the United States. Through deep learning technology, robotic sorters use a vision system to see the material, AI to think and identify each item, and a robotic arm to pick up specific items. The technology could help make recycling systems more effective and profitable.
Risks and challenges: AI can also be used to increase the scale of extractive or manufacturing industries, creating a larger environmental footprint over time.

SDG 13: Climate action
Value of artificial intelligence: AI and climate science can help researchers identify previously unknown atmospheric processes and rank climate models.
Case study: Predicting road flooding for climate mitigation in Senegal
Using data from mobile operator Orange, a team from the Georgia Institute of Technology developed a framework to improve the resilience of road networks in Senegal to flooding, including recommendations on how to prioritize road improvements given a limited budget. The results showed how roads are being used, how they are damaged, and how policy makers can allocate budget in the most efficient way to repair them.
Risks and challenges: Heavy computation required to power AI may lead to increased energy costs.

SDG 14: Life below water
Value of artificial intelligence: AI can help detect, track, and predict the movement patterns of vessels engaged in illegal fishing.
Case study: Supporting sustainable legal fishing in Indonesia
Indonesia and Global Fishing Watch (a partnership between Google, Oceana, and SkyTruth) are cooperating to deliver a vessel monitoring system for all Indonesian-flagged fishing vessels and generate data that is publicly available. The project aims to promote transparency in the fishing industry.
Risks and challenges: The data collected might be incomplete, as some vessels may be undetectable when they switch off their transmitters.

SDG 15: Life on land
Value of artificial intelligence: AI can be used to map and protect wildlife on land using computer vision systems.
Case study: Identifying, counting, and describing wild animals in camera-trap images in Tanzania
The University of Minnesota Lion Project deployed 225 camera traps, across 1,125 square kilometers, in Serengeti National Park to evaluate spatial and temporal dynamics. The cameras accumulated some 99,241 camera-trap days, producing 1.2 million pictures between 2010 and 2013. Members of the general public classified these images via a citizen-science website. The project then applied an algorithm to aggregate the classifications to investigate multi-species dynamics in the local ecosystem.
Risks and challenges: Monitoring technologies can be used by poachers just as easily as conservationists.

SDG 16: Peace, justice and strong institutions
Value of artificial intelligence: AI can reduce discrimination and corruption and drive broad access to e-government.
Case study: Turning information into knowledge and action in Estonia
In Estonia, government services (legislation, voting, education, justice, health care, banking, taxes, policing, and so on) have been digitally linked across one platform, "wiring up" the nation. Estonia is also exploring ways to leverage AI to improve e-government and other public services.
Risks and challenges: Citizen monitoring could be misused to repress political practices (such as voting and demonstrations).

SDG 17: Partnerships for the Goals
Value of artificial intelligence: AI should be a public good.
Case study: Leveraging partnerships to improve AI for global good
Multisectoral collaboration is essential for the safe, ethical, and beneficial development of AI. The Partnership on AI represents a collection of companies and nonprofits that have committed to sharing best practices and communicating openly about the benefits and risks of AI research. Another example is the annual "AI for Good Global Summit" organized by the International Telecommunication Union, the UN's specialized agency for information and communication technologies.
Risks and challenges: Collaboration must also result in action.

Speech and audio processing

Arguably, one major achievement of big data and AI has been to facilitate real-time translation of a growing number of the world's languages. Although language translation is not an SDG per se, greater language and cultural understanding could help increase the efficiency and effectiveness of development efforts across all SDGs, for instance by helping to map public opinion (see box 3.3). Google and Microsoft systems, for example, are now able to translate over a hundred languages. New systems have also been developed that translate speech as it is spoken, such as a Skype system that can translate voice calls into 10 different languages in real time.

Early models of machine translation used statistical methods that translated words based on a short sequence, that is, within the context of several words before and after the target word, which did not always work for long and complex sentences. New neural network architectures, such as long short-term memory, have drastically improved efficiency. Such systems can now learn from millions of examples and are able to translate whole sentences at a time, rather than word by word.
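
The limitation of a short context window can be illustrated with a minimal Python sketch. The function name, the window size, and the example sentence are assumptions chosen for illustration; they are not drawn from any particular translation system.

```python
def context_window(tokens, index, size=2):
    """Return the words within `size` positions before and after tokens[index]."""
    start = max(0, index - size)
    before = tokens[start:index]
    after = tokens[index + 1:index + 1 + size]
    return before, after


# Hypothetical example: the verb "was" agrees with the distant subject "report".
sentence = "the report that the ministers approved yesterday was published".split()
target = sentence.index("was")

before, after = context_window(sentence, target, size=2)

# With a two-word window, the subject "report" falls outside the visible context.
subject_visible = "report" in before + after
```

A model restricted to this window never sees the subject that governs the target word, which is one reason whole-sentence approaches tend to handle long and complex sentences better.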

Box 3.3 Using machine learning to analyze radio broadcasts in Uganda

Radio remains a primary source of information for communities in many parts of the world, particularly in remote rural areas where coverage and access to other forms of connectivity are limited. Radio is also an accessible medium for the millions who remain illiterate.

In Uganda, where a majority of the population lives in rural areas, radio is a vibrant platform for community discussion, information sharing, and news broadcasting. Radio talk shows and dial-in discussions are popular forums for voicing local needs, concerns, and opinions.

UN Global Pulse collaborated with Stellenbosch University in South Africa to develop speech-recognition technology to automatically convert these radio broadcasts into text for several of the languages spoken in Uganda, including English, Luganda, Acholi, Lugbara, and Rutooro. "Radio mining" consisted of two automated software stages and two human analysis stages. This semi-automated approach allowed a relatively small team of analysts to process many audio recordings quickly and affordably.

Several projects were piloted with UN partners to understand the value of talk radio in providing information on topics relevant to the Sustainable Development Goals, such as health care service delivery, response to disease outbreaks, and the efficiency of public awareness-raising radio campaigns, among others.


Computer vision, image analysis, and geospatial data

Accurate population information is critical for authorities to plan and deliver quality public services and coordinate crisis-relief efforts. However, collecting such data has long been a challenge for development practitioners and policy makers. For example, gathering national household survey data on poverty is typically time-consuming and expensive, requiring elaborate data collection and analysis techniques. This exercise is particularly challenging in fragile states, where limited capacity and security concerns typically hinder data collection and processing. In such settings, satellite imagery has been used to gain an overview of population density and to assess poverty and access to energy, which are covered by SDG 1 and SDG 7 (see boxes 3.4 and 3.5).

In the health sector, covered by SDG 3, current advances in medical imaging and computer analysis of tumors can complement and refine radiologists' analysis. Mobile phone call records have also been combined with satellite data to build dynamic population maps and estimate cross-border flows of migrants to enable development actors to track the spread of disease. This technique was leveraged in southern Africa to map the movements of cross-border communities to better understand malaria infection patterns.

In the environmental field (SDGs 12, 13, 14, and 15), AI-assisted analysis of satellite imagery can be used to monitor flood or typhoon damage to coastal areas, drought-affected areas, the retreat of wetlands, or encroaching land use in deltas or river basins. Combined with meteorological models and large data sets on changes in ocean temperature and currents, such mapping can help improve forecasting and early warning systems for future major weather events. Moreover, GPS data has been used to analyze traffic patterns to reduce pollution (see box 3.6). Another AI application getting considerable attention is automated or self-driving cars, a potential solution for optimizing transportation in ways that can minimize car accidents. Debate is ongoing about what a fully automated car really is, but considerable progress has been made toward solving problems of visual recognition, object identification, and reaction processing, which are critical to this endeavor.

Building on humble beginnings and minor innovations (including cruise control, assisted steering, lane assist, automatic braking, and "Traffic Jam Assist"), the race toward a fully automated car is now underway (box 3.7).

Box 3.4 Estimating population counts and poverty in Afghanistan and Sudan

In Afghanistan, the United Nations Population Fund and the UN Country Team collaborated with Flowminder, an organization that collects, aggregates, and analyzes anonymous mobile, satellite, and household survey data to generate population maps. The project used survey data, geographic information systems, and satellite imagery data to estimate populations in areas with no such data.

In Sudan, the United Nations Development Programme used satellite data to estimate poverty by studying changing nighttime energy consumption. The team used data pulled from nighttime satellite imagery, analyzing illumination values over two years, in conjunction with electric power consumption data from the national electricity authority. The study was also informed by desk research, including similar World Bank work in Kenya and Rwanda. Electricity consumption was used as a proxy indicator for income, as poorer households were assumed to be lower energy consumers. The exercise demonstrated how satellite imagery can help measure poverty.
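
The idea of pairing night-light values with electricity consumption data can be sketched as a simple calibration exercise. The Python example below is a minimal illustration only, assuming a linear relationship; the district values, variable names, and the choice of ordinary least squares are hypothetical and do not represent the method or data used in the Sudan study.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical values, invented for illustration only.
illumination = [2.0, 4.0, 6.0, 8.0]              # mean night-light value per district
consumption_kwh = [110.0, 190.0, 310.0, 390.0]   # metered consumption per household

slope, intercept = fit_line(illumination, consumption_kwh)


def estimate_consumption(light_value):
    """Estimate consumption for a district where only night-light data exist."""
    return slope * light_value + intercept
```

Under these assumptions, a district with no metering but an observed night-light value can be assigned an estimated consumption level, which in turn serves as the proxy indicator for income.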

Box 3.5 Mapping energy access in India

Satellite night-light data has also been leveraged in India. A team from the University of Michigan, the U.S. National Oceanic and Atmospheric Administration, and the World Bank Group's Energy and Extractives Global Practice analyzed the daily light signatures of more than 600,000 villages from 1993 to 2013 (see map B3.5.1).

Electrification trends were visualized on NightLights.io, an open-source platform for processing big data in a scalable and systematic way. The platform features an application programming interface that enables technical partners to query light output, and its interactive maps allow users to explore light output trends. Through the project, the research team gained a high-level overview of rural electrification, compared villages, plotted trends, and shared data, which can help inform government policy.

Map B3.5.1 Night lights in India



Box 3.6 Cleaning Mexico City's air with big data and climate policy

Mexico City's congestion, among the world's worst, worsens local air quality. According to 2016 data, city dwellers are exposed to twice the level of ozone and fine particulate matter (PM2.5) recommended by national standards, resulting in some 10,850 annual deaths. A team of researchers from the University of California, Berkeley, and the Instituto Nacional de Ecología y Cambio Climático in Mexico used data from Waze, a GPS navigation application, to evaluate various transport electrification options based on their ability to reduce urban air pollution and emissions, including (a) the electrification of the entire city taxi fleet, (b) the electrification of public transit buses, and (c) the electrification of all light-duty vehicles.

The team first measured the number, location, and duration of traffic jams throughout the city, estimating related emissions using the MOVES-Mexico model. The team then used data from Google's "popular times" function to map urban population movement.

Using this information, the team was able to identify the best policy options and optimal locations for electric vehicle charging stations.

Box 3.7 Self-driving cars

Human error causes about 90 percent of all car accidents. Artificial intelligence (AI) and autonomous driving might therefore help reduce accidents and save lives. Self-driving cars have to identify, assess, evaluate, and respond to fast-changing circumstances, and predict likely events in real time. A fully automated car has to master vehicle dynamics, control systems, and sensor optimization. For example, detecting pedestrians from images or video is a very specific image-classification problem.

Driverless cars require robust data capacity for image processing and recognition. Navigation and mapping data is also essential, with GPS coordinates used extensively. Mercedes, BMW, and Audi purchased the mapping business Here from Nokia for US$2 billion; Here combines "static" mapping data captured by cars equipped with 3D cameras with live information supplied by a network of connected devices, including cars (Bell 2015). In January 2016, Volkswagen partnered with Mobileye, a technology company that develops vision-based advanced driver-assistance systems, to produce its real-time image-processing cameras and mapping service for driverless cars. In March 2016, Ford became the first manufacturer to road test a fully autonomous car in snow on public roads, after working with researchers from the University of Michigan to create an algorithm recognizing snow and rain (Ford 2016). Ford has already tested autonomous Fusion cars on public roads in the U.S. states of Arizona, California, and Michigan.

Despite these groundbreaking developments, the move toward autonomous driving is not without its problems. Many worry that a car-centric vision detracts from more sustainable solutions related to public transportation and urban design (covered by Sustainable Development Goal 11). Driverless vehicles are also likely to wipe out millions of jobs, including those of taxi drivers, couriers, and truck drivers, something new policies must address urgently. Moreover, legal frameworks will need to be redesigned to keep pace. Although a few countries are moving to issue new legal frameworks for autonomous driving, significant legal gaps remain.

Text mining and text analysis

Also known as text mining, text analytics is the science of turning unstructured text into structured data. Text analytics is focused on extracting key pieces of information from conversations. By understanding the language, the context, and how language is used in everyday conversations, text analytics uncovers the "who" of the conversation, the "what" or the "buzz" of the conversation, "how" people are feeling, and "why" the conversation is happening. Conversations are categorized and discussion topics identified.

The technology is being leveraged, among other things, to support agricultural development and build food security, covered by SDG 2. Kudu, a mobile auction market application, is using text analysis algorithms to match farmers looking to sell their produce with suitable market traders. The system allows any farmer or trader to send a message by phone. Once matched, compatible buyers and sellers are notified. Kudu not only limits unnecessary travel and dependency on intermediaries, but also encourages competition by overcoming critical information gaps. The application was developed by the AI Research Group, which specializes in the application of AI to problems in the developing world and operates out of the College of Computing and Information Sciences at Makerere University in Kampala, Uganda.

Analysis of text from Twitter feeds has also been used to track food prices in real time in Indonesia. UN Global Pulse worked with the Ministry of National Development Planning and the World Food Programme to "nowcast" food prices based on Twitter data. The outcome was a statistical model of daily price indicators for four commodities: beef, chicken, onion, and chili. When the modeled prices were compared with official food prices, the forecast and actual prices were closely correlated, demonstrating that near real-time social media signals can serve as a proxy for daily food prices.
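
One common way to check how closely a modeled series tracks an official one is the Pearson correlation coefficient. The Python sketch below is illustrative only: the price figures are invented, and the source does not state which statistic the project used to compare the two series.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily prices for one commodity, invented for illustration only.
modeled_prices = [100.0, 102.0, 105.0, 103.0, 108.0]   # derived from tweets
official_prices = [101.0, 103.0, 104.0, 104.0, 109.0]  # published statistics

r = pearson(modeled_prices, official_prices)
```

A coefficient near 1 indicates that the two series rise and fall together; it does not by itself show that the price levels match, so a comparison of absolute errors would normally accompany it.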

Similar techniques are being used to analyze a host of other development issues. For example, the ability to monitor public sentiment toward policy measures in real time, via social media, can provide critical information on the impact of policy and how it is playing out in practice, especially for vulnerable groups or households (box 3.8). Data from social media can also help estimate the number of expats around the world (box 3.9).

As mentioned earlier, conducting household surveys is often expensive. New approaches such as monitoring social media could help address data gaps in developing economies. Moreover, these approaches may capture marginalized or migrating communities not always accounted for by traditional means such as national censuses.

Box 3.8 Monitoring public sentiment about policy reforms using social media in El Salvador

In April 2011, the government of El Salvador removed a countrywide subsidy on liquid petroleum gas, the most common domestic cooking fuel. Instead of subsidizing prices at point of sale, eligible households were given an income transfer. The reform triggered considerable public debate and controversy.

UN Global Pulse and the World Bank teamed up to investigate whether social media signals from Twitter could be used to understand public perceptions and social dynamics surrounding the fuel subsidy reform, specifically reactions and concerns about political partisanship, the level of information reaching communities about the reform, and trust in government commitment to deliver the subsidy. A taxonomy of keywords was developed to filter Twitter for relevant content. Regional experts were consulted to ensure slang words and synonyms were included in the taxonomy. Tweets were then filtered to assess relevance and isolate content originating from El Salvador.

The study suggests that social media analysis, using big data and AI, can help inform policy implementation, as the sentiment observed was similar to public opinion measured by household surveys.
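
The keyword filtering step described in this box can be sketched in a few lines of Python. The taxonomy entries, the tweet records, and the `country` field are hypothetical placeholders; the actual taxonomy and the geolocation method used in the study are not described in this box.

```python
import re

# Hypothetical taxonomy: topic -> keywords (a real one would include
# the slang words and synonyms supplied by regional experts).
TAXONOMY = {
    "subsidy": {"subsidio", "subsidy"},
    "gas": {"gas", "cilindro"},
}


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def matched_topics(text, taxonomy=TAXONOMY):
    """Return the sorted topics whose keywords appear in the text."""
    tokens = tokenize(text)
    return sorted(topic for topic, keywords in taxonomy.items() if tokens & keywords)


def filter_tweets(tweets, country="SV"):
    """Keep tweets that match at least one topic and come from the target country."""
    kept = []
    for tweet in tweets:
        topics = matched_topics(tweet["text"])
        if topics and tweet.get("country") == country:
            kept.append({**tweet, "topics": topics})
    return kept


# Invented example records; "country" is an assumed, precomputed field.
tweets = [
    {"text": "El subsidio al gas no llega a mi colonia", "country": "SV"},
    {"text": "Buen partido de futbol hoy", "country": "SV"},
    {"text": "Subsidio de gas en debate", "country": "MX"},
]
relevant = filter_tweets(tweets)
```

Exact keyword matching of this kind is transparent and easy for regional experts to audit, but it misses misspellings and context, which is why a relevance check after the initial filter remains useful.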

Box 3.9 Shedding light on migration patterns using social media information

Data from social media can be used to help estimate migrant populations. For example, studies based on Facebook data yield estimates of approximately 214 million "expats" in the world (people stating that they live in a country other than their self-reported "home country"), roughly 83 percent of the 2017 estimated total of 258 million international migrants globally.

Among the issues surrounding the use of social media data to estimate migrant populations are the difficulty of defining who an international migrant is, selection bias, and the reliability of self-reported information. However, scholars are working to reduce selection bias through model fitting, and results are promising.