DDDM not only benefits businesses but also enables governments to make better policy decisions. For instance, DDDM can be utilized to uncover hidden patterns, unexpected relationships, and market trends or reveal preferences that may have been difficult to discover previously. Armed with this information, government entities can make better decisions about healthcare, infrastructure, and finances than they could before. Read this article from the Executive Summary through Chapter 2 to explore data-driven decision models, how data is changing development, and how data can fill the holes in policymaking.
Data: The Fuel of the Future
A Data Typology
Data-driven development is an emerging and rapidly changing field. So it may be useful, at the outset, to define terms recurring across the chapters. These are not fixed or official definitions, but rather working usage for this report:
- Big data, a commonly used term, describes data sets so large or so complex that traditional data processing techniques are inadequate. The field of big data analytics uses advanced computational techniques to extract meaningful information (such as patterns, trends, repetition) from data. For the moment, big data is largely the domain of large private companies. But as tools to mine it become cheaper and more readily available, smaller companies and governments will also use big data. It can be useful to further distinguish between big data produced intentionally or unintentionally and that produced by humans and by machines, as in table 1.2.
Table 1.2

| Data agent | Intentional generation | Unintentional generation |
| --- | --- | --- |
| Human | Primary content | Data exhaust |
| Machine | Secondary content | Internet of Things data |

- Data exhaust, which is unintentionally created by humans. This can include metadata (data about data), such as call-data records derived from mobile phones, or the trail of data left by users engaged in other activities, such as keystrokes. Data exhaust generally has low value, but the trail left by millions of users can be mined or combined to extract value or to hack into an otherwise secure system.
- Internet of Things data, which is intentionally created, but from sensors and other internet-connected devices, rather than from humans. This mainly has value in the aggregate – and over time – but can also be used to provide alerts for impending events, such as extreme weather conditions.
- Primary content is intentionally created by humans, typically users. An example here might be a social media profile or a browser search history. When thousands or more of these are combined and anonymized they can be used, for instance, for analyzing popular or emerging trends. Humans also create primary content in the form of videos, academic papers, blogs, and the like that can be mined, for instance, for sentiment analysis.
- Secondary content is intentionally created, but through artificial intelligence rather than directly by humans. A benign example would be a chatbot that helps a user fill in a form online by giving suggestions. A malign example would be a fake social media profile that seeks to influence buying habits or political opinion.
- Personal data relates to an individual and is generally concerned with private information. Personal data can form large, complex data sets (such as multiple health indicators including weight, blood pressure, or heart rate measured over a lifetime) but more normally constitutes small data, which can be easily monitored. Personal data may be willingly exchanged in return for convenience (such as a phone number or email address), but it can also be given away unwittingly (such as date of birth provided to enter an online competition) or unwillingly (such as data hacked from a personal email account). The consequences of loss of personal data, explored more in chapter 4, might include loss of privacy, loss of control (over the future use of personal data), and loss of agency (such as being exposed to a more limited range of news sources or opinions as a result of previously expressed preferences). What is relatively new is how persistence, repurposing, and spillovers from big data increase the risk and uncertainty about how private data can be used in the future.
- Open data refers to data that is made freely available and deliberately stored in a format that is easily read, particularly by other computers, so that it can be repurposed. For instance, data on airline schedules could be easily read by travel companies to generate customized itineraries for travel websites. Governments may use open data to promote transparency and accountability in their operations and allow voters to measure the performance of different government functions. As a philosophy, therefore, open data is intended to encourage the juxtaposition of data from different sources to create value and new applications. It is estimated, for instance, that some 500 different applications use London transport data, and the savings to the UK economy from its open data policy amount to some £6.8 billion (about US$9.5 billion) a year. Open data tools are particularly useful in the transport sector (see box 1.1).
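The airline-schedule example can be made concrete with a minimal sketch. It assumes a schedule published as CSV. The column names, flight numbers, and times are invented for illustration and do not come from any real open data set. The point is only that a machine-readable format lets a third party repurpose the data, here by assembling direct and one-stop itineraries.

```python
import csv
import io

# Hypothetical open data: a tiny airline schedule published as CSV.
# Column names and rows are invented for illustration only.
SCHEDULE_CSV = """flight,origin,destination,departs,arrives
XX101,NBO,ADD,08:00,10:05
XX202,ADD,CAI,12:30,16:00
XX303,NBO,CAI,09:15,14:45
XX404,ADD,CAI,09:00,12:30
"""


def load_schedule(text):
    """Parse the CSV text into a list of dicts, one per flight."""
    return list(csv.DictReader(io.StringIO(text)))


def find_itineraries(flights, origin, destination):
    """Return direct flights and one-stop connections as lists of flight numbers.

    A connection is kept only if the second flight departs at or after the
    first one arrives. Times are same-day HH:MM strings, which compare
    correctly as plain text.
    """
    itineraries = []
    for first in flights:
        if first["origin"] != origin:
            continue
        if first["destination"] == destination:
            itineraries.append([first["flight"]])
            continue
        for second in flights:
            if (second["origin"] == first["destination"]
                    and second["destination"] == destination
                    and second["departs"] >= first["arrives"]):
                itineraries.append([first["flight"], second["flight"]])
    return itineraries


flights = load_schedule(SCHEDULE_CSV)
print(find_itineraries(flights, "NBO", "CAI"))
# prints [['XX101', 'XX202'], ['XX303']]
```

A real travel site would work from a published schedule feed and handle time zones, overnight connections, and minimum transfer times, all of which this sketch ignores.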
Box 1.1 Open data tools for improving transport through big data
Rise in digital data. Digital data has proliferated with the rapid increase in smartphone ownership in advanced and emerging market economies, alongside advances in global positioning systems and digital sensors. This data has the potential to transform transport systems worldwide. The location-tracking data provided by smartphones, for instance, can reveal how and why people travel, information critical for optimizing transport networks. Accordingly, open-source tools and cloud-based platforms have been developed to help collect, manage, and analyze the ever-increasing volume of digital data. These easily accessible tools provide individuals, governments, and private entities with sophisticated analysis capabilities, empowering them to improve all aspects of transport. Such tools will be particularly beneficial in developing countries that have limited resources.
Open-source tools. The World Bank has developed a variety of free-of-charge tools that capitalize on big data to facilitate transport-related development projects across the globe (see figure B1.1.1). These tools provide numerous capabilities, including transit system analysis, route planning, and road condition and incident reporting. Open Transit Indicators allows public transit administrators to assess existing services and identify improvements through the collection and analysis of standardized transit data. This approach has been used to address transit problems in China, Kenya, Mexico, and Vietnam, among others. The Rural Accessibility Platform uses freely available OpenStreetMap data to evaluate the accessibility of rural population centers to points of interest. Indices of rural accessibility have been used to identify needed transportation improvements in countries including Bangladesh, the Lao People's Democratic Republic, and Zambia. These open-source data and tools make transport analysis accessible for a broad range of users.
Citizen engagement. The increasing ubiquity of smartphones and internet connectivity is allowing individuals to provide valuable data and contribute to development efforts. Citizen engagement is prioritized in many of the World Bank's transport-focused open-source tools. For example, the smartphone application RoadLab uses a crowdsourcing approach to obtain route information and roadway infrastructure conditions from users. The related tool RoadLab Pro was used to assess the conditions of unclassified road networks in Mozambique, demonstrating the potential of citizen-provided smartphone data in transport planning. These tools provide an easy-to-implement way for traffic engineers to obtain roadway information, particularly when professional pavement testing equipment and base geographic information system maps are not available. Similarly, DRIVER capitalizes on crowdsourced data to collect road incident information, which can then be visualized and analyzed to improve enforcement and resource allocation. In the Philippines, DRIVER has been applied to identify and prioritize problematic road areas for interventions.
Figure B1.1.1 How open data tools can assist transport
- Metadata, or "data about data," is used to classify, categorize, and retrieve data files. For instance, the metadata for a file might include the date on which the file was created, its number of pages or data size, and keywords that can be used to search for it. Further attributes may be added to data according to the way it is typically used, for instance, how popular it is as a function of how frequently it is downloaded. Metadata helps with data analysis and can also be applied to data users, such as by giving them attributes, sometimes based on inferred data, that equate to a "reputation."
- Data platforms offer a convenient and cost-effective way to link customers and suppliers. Some platforms may connect only peers (such as a dating website) and others might be internal to an organization (such as an intranet). But most of the popular platforms using the internet are multisided platforms. Uber, for instance, connects drivers with riders; Airbnb connects property owners with guests; and Jumia connects sellers with potential buyers. But the biggest platforms are those that connect advertisers with consumers, usually in return for some kind of free service, such as social media or web browsing. As explored in chapter 5, multisided platforms, driven by advertising, are now among the most powerful firms in the world.
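To illustrate how metadata, as defined in the list above, supports classification and retrieval, the following minimal sketch models a small catalog of data files. The `FileMetadata` class, the `search` function, and the file names and figures are all hypothetical, chosen only to mirror the attributes mentioned in the text: creation date, size, keywords, and a usage-derived popularity count.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FileMetadata:
    """Descriptive attributes about a data file, not the file's contents."""
    name: str
    created: date
    size_bytes: int
    keywords: set = field(default_factory=set)
    downloads: int = 0  # usage-derived attribute: a rough proxy for popularity


def search(catalog, keyword):
    """Return names of files tagged with the keyword, most downloaded first."""
    hits = [m for m in catalog if keyword.lower() in m.keywords]
    ranked = sorted(hits, key=lambda m: m.downloads, reverse=True)
    return [m.name for m in ranked]


# Hypothetical catalog entries, invented for illustration only.
catalog = [
    FileMetadata("roads_2019.csv", date(2019, 3, 1), 48_000,
                 {"transport", "roads"}, 120),
    FileMetadata("clinics.csv", date(2020, 6, 15), 12_500,
                 {"health"}, 45),
    FileMetadata("bus_routes.json", date(2021, 1, 10), 230_000,
                 {"transport", "transit"}, 310),
]

print(search(catalog, "transport"))
# prints ['bus_routes.json', 'roads_2019.csv']
```

The search never opens the files themselves. It relies entirely on the descriptive record, which is what makes metadata useful for retrieval.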