Prediction and Inference in Data Science

This article is a bit heavy on jargon for data scientists. Still, it makes an interesting case: much of what we call prediction is really inference, that is, identifying and interpreting trends in data, rather than using those trends effectively to predict what is likely to happen next. The article also argues that prediction may not be the endpoint of machine learning, and that prescribing what to do about likely future outcomes will soon become the standard. Read carefully through the box office, marketing, and industry trend examples to see how to apply the concepts in the article.

Abstract

The strategic role of data science teams in industry is fundamentally to help businesses to make smarter decisions. This includes decisions on minuscule scales, such as what fraction of a cent to bid on an ad placement displayed in a web browser, whose importance is only manifest when scaled by orders of magnitude through machine automation. But it also extends to singular, monumental decisions made by businesses, such as how to position a new entrant within a competitive market. In both regimes, the potential impact of data science is only realized when both humans and machine actors are learning from data and when data scientists communicate effectively to decision makers throughout the business. I examine this dynamic through the instructive lens of the duality between inference and prediction. I define these concepts, which have varied use across many fields, in practical terms for the industrial data scientist. Through a series of descriptions, illustrations, contrasting concepts, and examples from the entertainment industry (box office prediction and advertising attribution), I offer perspectives on how the concepts of inference and prediction manifest in the business setting. From a balanced perspective, prediction and inference are integral components of the process by which models are compared to data. However, through a textual analysis of research abstracts from the literature, I demonstrate that an imbalanced, prediction-oriented perspective prevails in industry and has likewise become increasingly dominant among quantitative academic disciplines. I argue that, despite these trends, data scientists in industry must not overlook the valuable, generalizable insights that can be extracted through statistical inference. I conclude by exploring the implications of this strategic choice for how data science teams are integrated in businesses.


Source: Nathan Sanders, https://hdsr.mitpress.mit.edu/pub/a7gxkn0a/release/6
This work is licensed under a Creative Commons Attribution 4.0 License.