The Language of Brands in Social Media

Review of Prior Research on Text Mining and Language Models

Research in computer science has long focused on text mining, that is, extracting meaningful information from unstructured text. Methods range from closed-vocabulary approaches, such as Linguistic Inquiry and Word Count (LIWC), which count words from predefined dictionaries, to open-vocabulary approaches, which derive features directly from the data.
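The contrast between the two families can be sketched in a few lines. In the closed-vocabulary case, a text is scored only against categories fixed in advance. In the open-vocabulary case, every observed n-gram becomes a candidate feature. The two-category lexicon below is a hypothetical toy stand-in (LIWC's actual dictionary is far larger and licensed separately), and the function names are illustrative only.

```python
import re
from collections import Counter

# Hypothetical two-category lexicon standing in for a closed-vocabulary
# dictionary; it is NOT LIWC's real dictionary.
TOY_LEXICON = {
    "positive_emotion": {"love", "great", "happy"},
    "negative_emotion": {"hate", "awful", "sad"},
}


def tokenize(text):
    """Lowercase and split into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def closed_vocabulary_scores(text, lexicon=TOY_LEXICON):
    """Share of tokens that fall into each predefined category."""
    tokens = tokenize(text)
    n_tokens = len(tokens) or 1  # avoid division by zero on empty text
    return {cat: sum(t in words for t in tokens) / n_tokens
            for cat, words in lexicon.items()}


def open_vocabulary_counts(texts, ngram=1):
    """Count every n-gram in the corpus; no word list is fixed in advance."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - ngram + 1):
            counts[" ".join(tokens[i:i + ngram])] += 1
    return counts


print(closed_vocabulary_scores("I love this great phone"))
# {'positive_emotion': 0.4, 'negative_emotion': 0.0}
print(open_vocabulary_counts(["love this phone", "love the camera"]).most_common(1))
# [('love', 2)]
```

The closed-vocabulary scorer ignores any word outside its lexicon (here, "phone"), whereas the open-vocabulary counter keeps it. This is the property that lets open-vocabulary methods surface topics no one specified beforehand.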

The increased availability of large quantities of user-generated content (UGC) from digital and online channels has spurred considerable interest in applying text-mining methods to generate insights relevant to marketing. In recent years, this research stream has expanded from uncovering product attributes to predicting stock market performance. For example, Tirunillai and Tellis investigate whether UGC, aggregated from multiple websites, can predict stock market performance. They find that the volume of chatter is a good predictor of abnormal returns and that its influence is asymmetric, with negative UGC having a stronger effect than positive UGC. Rust et al. propose that social media data can be used for reputation tracking.

Subsequently, marketing research shifted toward extracting meaning from co-occurrences of words or social tags to infer market structure and gain insights into brand positioning. Table 1 provides an overview of marketing research that uses text mining of social media and highlights key findings. For example, Netzer et al. apply text-mining algorithms to data from an online discussion forum to build associative interbrand networks, and they show how firms can use this method to infer market structure. Tirunillai and Tellis analyze product reviews for 15 firms over a four-year period to draw key inferences about consumer satisfaction and quality and their underlying dimensions. Nam and Kannan use user-generated social tags as proxies for the diverse concepts and associations connected with a given brand and show that these tags help predict unanticipated stock returns. Culotta and Cutler infer attributes critical to a brand's positioning by examining the brand's social connections (i.e., cofollowers of the brand's followers on Twitter).
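To make the co-occurrence idea concrete, the following sketch computes lift, one common measure of how much more often two brands are mentioned together than independence would predict. It then keeps pairs with lift above 1 as edges of an associative brand network. This is a minimal illustration under simplifying assumptions: whitespace tokenization, exact term matching, a hypothetical toy corpus, and the hypothetical function names `cooccurrence_lift` and `lift_edges`. It is not a reproduction of Netzer et al.'s pipeline.

```python
from itertools import combinations


def cooccurrence_lift(messages, terms):
    """Lift for every pair of terms across a set of messages.

    lift(a, b) = P(a and b) / (P(a) * P(b)), where P(.) is the share of
    messages mentioning the term(s). Values above 1 mean the pair is
    mentioned together more often than if mentions were independent.
    """
    n = len(messages)
    # Which of the target terms each message mentions (naive whitespace tokens).
    mentions = [set(msg.lower().split()) & set(terms) for msg in messages]
    share = {t: sum(t in m for m in mentions) / n for t in terms}
    lift = {}
    for a, b in combinations(sorted(terms), 2):
        if share[a] == 0 or share[b] == 0:
            continue  # lift is undefined for a term that never appears
        joint = sum(a in m and b in m for m in mentions) / n
        lift[(a, b)] = joint / (share[a] * share[b])
    return lift


def lift_edges(lift, threshold=1.0):
    """Keep pairs whose lift exceeds the threshold as network edges."""
    return sorted(pair for pair, value in lift.items() if value > threshold)


# Hypothetical toy forum messages.
messages = [
    "honda toyota reliable",
    "honda toyota mileage",
    "bmw audi luxury",
    "bmw luxury",
    "toyota reliable",
]
brands = ["honda", "toyota", "bmw", "audi"]
lift = cooccurrence_lift(messages, brands)
print(lift_edges(lift))  # [('audi', 'bmw'), ('honda', 'toyota')]
```

In the toy data, the two Japanese brands and the two German brands each form a linked pair, while cross-group pairs never co-occur (lift 0). This is the kind of clustering from which a market-structure map can be read.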

Table 1. Studies Using Text and Visual Mining of UGC to Infer Brand Perceptions and Positioning.

Lee and Bradlow
Key research questions: Deriving market structure by eliciting product attributes and consumers' preferences for attributes.
Empirical context and sample: Reviews for digital cameras available between 2004 and 2008.
Methodology and algorithm: Clustering of phrases discussing a common attribute, followed by decomposition of attributes into their constituent dimensions.
Findings and contributions: A system for automatically processing text such as customer reviews to identify salient attributes and how brands compare with each other.

Archak, Ghose, and Ipeirotis
Key research questions: Using text analysis of reviews to infer product attributes and examine their impact on sales.
Empirical context and sample: Amazon reviews of digital cameras and camcorders over a 15-month period.
Methodology and algorithm: Attributes extracted with a semisupervised extraction technique and an automated extraction technique; customer opinions about attributes parsed with a syntactic dependency parser.
Findings and contributions: Text mining of reviews can reveal which of the described product features matter more and how opinions in reviews can be quantitatively linked to sales. Textual content is significantly predictive of sales.

Netzer et al.
Key research questions: Using co-occurrence of words and semantic network analysis to infer market structure.
Empirical context and sample: Product mentions extracted from an online forum in the sedan car and diabetes drug categories.
Methodology and algorithm: Co-occurrence of terms within a message used to construct a semantic network, which forms the basis for market structure.
Findings and contributions: Semantic network analysis using text mining of user-generated data can create maps that show how similar or dissimilar brands are.

Tirunillai and Tellis
Key research questions: Extracting key latent dimensions of customer satisfaction.
Empirical context and sample: UGC of product reviews across 15 firms in 5 markets.
Methodology and algorithm: LDA-based framework; unsupervised Bayesian learning algorithm.
Findings and contributions: Key latent dimensions of customer satisfaction vary across markets. Presents a unified framework to extract latent dimensions from UGC.

Nam and Kannan
Key research questions: Examining how the associative structure of concepts and attributes linked to a brand can affect firm sales and stock market valuation.
Empirical context and sample: Social tagging data for 44 firms across 14 markets.
Methodology and algorithm: Bipartite networks of firms/brands and social tags.
Findings and contributions: Social tags provide insights into associative networks of perceptions, which can reveal competitive market structure and help measure customer-based brand equity. These, in turn, affect stock returns.

Büschken and Allenby
Key research questions: Introducing a new approach to text analysis based on sentence rather than word structure.
Empirical context and sample: Reviews of Italian restaurants (N = 696) and of upscale (N = 3,212) and midscale (N = 1,255) hotels.
Methodology and algorithm: LDA, sentence-constrained LDA (SC-LDA), and SC-LDA with sticky topics.
Findings and contributions: Identifying latent topics in customer reviews from the sentence structure of the reviews can improve performance relative to existing models based on individual words.

Liu, Dzyabura, and Mizik
Key research questions: Extracting dimensions of brand image from visuals (images) in social media.
Empirical context and sample: 114,367 photos of 56 brands.
Methodology and algorithm: Developed image classifiers to predict whether a particular attribute is expressed in an image, then used machine learning (SVMs and ConvNets) to relate image attributes to perceptual features.
Findings and contributions: There are key differences between how consumers and brands portray brands on visual social media.

Toubia et al.
Key research questions: Using a new guided LDA approach to extract themes that describe entertainment products.
Empirical context and sample: Descriptions of 429 movies.
Methodology and algorithm: Guided LDA automatically extracts a set of features of entertainment products from their descriptions.
Findings and contributions: The guided LDA approach helps predict movie-watching behavior at the individual level.

Rust et al.
Key research questions: Tracking brand reputation with a new tracker based on Twitter comments.
Empirical context and sample: Twitter data for the top 100 brands.
Methodology and algorithm: Created a reputation tracker from social media data and linked it to financial performance.
Findings and contributions: Introduces a social media–based brand reputation tracker that mines Twitter comments about the world's top 100 brands on a weekly, monthly, and quarterly basis, using Rust et al.'s value–brand–relationship framework.

Current research
Key research questions: Testing how text analysis of social media conversations can help identify latent topics, and how the interrelationships between topics and brands can be used to assess a brand's positioning and shifts in positioning over time.
Empirical context and sample: Random sample of tweets (N = 134,953) across 196 brands, drawn from a larger corpus of 1.2 million tweets.
Methodology and algorithm: Differential language analysis comprising (1) LDA, in which linguistic features (e.g., topics) are automatically determined from text; (2) discrimination, identifying which topics differ across brands on the basis of their language; and (3) mapping brands on their similarities.
Findings and contributions: Brands show significant similarities and differences in image, based on topics derived from open-vocabulary analysis of Twitter data. Validation highlights meaningful differences between the proposed approach and standard methods used to assess brand image. The approach enables mapping of brand positioning.

Furthermore, text-mining research has examined novel approaches derived from topic models and has addressed important substantive questions. Büschken and Allenby propose a model in which topics are constrained not to change within a sentence and show that this restriction leads to more interpretable topic-word probabilities in customer review data. Borah and Tellis (2016) examine whether online chatter about one brand in social media spills over to affect other brands, particularly when the brands share similar associations (e.g., country of origin). Cho, Fu, and Wu use topic modeling to identify patterns of topics in the research literature and link them to coauthorship communities. Hu et al. examine how to infer a brand's personality from social media data on the basis of users' imagery, official announcements, and the internal organizational culture of the brand's workplace.
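The sentence-level constraint can be illustrated with a single inference step. Assume topic proportions `theta` and topic-word probabilities `phi` are already known. If every word in a sentence must share one topic, that topic's posterior is proportional to theta[k] times the product of phi[k][w] over the sentence's words. A word-level model instead scores each word on its own. The two-topic restaurant model, the vocabulary, and the function names are hypothetical. The sketch shows only this constraint, not Büschken and Allenby's estimation procedure.

```python
import math


def sentence_topic_posterior(words, theta, phi):
    """Posterior over ONE topic shared by all words in a sentence.

    p(z = k | words) is proportional to theta[k] * prod_w phi[k][w]:
    every word is scored under the same topic. Computed in log space
    for numerical stability.
    """
    log_scores = [
        math.log(theta_k) + sum(math.log(phi[k][w]) for w in words)
        for k, theta_k in enumerate(theta)
    ]
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def word_topic_posteriors(words, theta, phi):
    """Unconstrained word-level alternative: each word gets its own posterior."""
    posteriors = []
    for w in words:
        scores = [theta_k * phi[k][w] for k, theta_k in enumerate(theta)]
        total = sum(scores)
        posteriors.append([s / total for s in scores])
    return posteriors


# Hypothetical two-topic model for restaurant reviews: 0 = food, 1 = service.
theta = [0.5, 0.5]
phi = [
    {"pasta": 0.5, "tasty": 0.4, "waiter": 0.05, "rude": 0.05},
    {"pasta": 0.05, "tasty": 0.05, "waiter": 0.5, "rude": 0.4},
]
print(sentence_topic_posterior(["tasty", "pasta"], theta, phi))   # food topic, about 0.99
print(word_topic_posteriors(["tasty", "waiter"], theta, phi))     # words split across topics
print(sentence_topic_posterior(["tasty", "waiter"], theta, phi))  # one topic forced on both
```

For the mixed sentence "tasty waiter", the word-level version assigns each word to a different topic. The sentence-constrained version must choose a single topic for both words. This is the restriction that, per the text above, made topics more interpretable in review data.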

Building on these approaches to mining unstructured data, we use an open vocabulary to derive topics (i.e., groups of similar words) directly from social media messages. Our framework proposes ways of linking brands and topics and identifies applications of the proposed methodology to key issues facing marketers. We introduce the concept of TTV to offer guidance to brand managers on how to predict future shifts in brand preferences and usage. These applications highlight novel uses of language analysis in brand management decision making, and we illustrate them with case studies drawn from our data analysis. In the next section, we provide a detailed overview of the approach and then outline the various applications.
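One simple way to link brands and topics, and then map brands by similarity, can be sketched as follows. Average the per-message topic proportions for each brand into a topic profile, then compare profiles with cosine similarity. The sketch assumes the topic proportions come from an already fitted topic model such as LDA. The averaging step, the cosine measure, the toy data, and the function names are illustrative assumptions, not the exact procedure of the differential language analysis described in Table 1.

```python
import math


def brand_topic_profiles(tweets):
    """Average topic proportions over the tweets associated with each brand.

    `tweets` is a list of (brand, topic_proportions) pairs; the proportions
    are assumed to come from an already-fitted topic model such as LDA.
    """
    totals, counts = {}, {}
    for brand, props in tweets:
        if brand not in totals:
            totals[brand] = [0.0] * len(props)
            counts[brand] = 0
        totals[brand] = [acc + p for acc, p in zip(totals[brand], props)]
        counts[brand] += 1
    return {b: [x / counts[b] for x in totals[b]] for b in totals}


def cosine(u, v):
    """Cosine similarity between two topic profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def brand_similarities(profiles):
    """Cosine similarity for every unordered pair of brands."""
    brands = sorted(profiles)
    return {
        (a, b): cosine(profiles[a], profiles[b])
        for i, a in enumerate(brands)
        for b in brands[i + 1:]
    }


# Hypothetical per-tweet topic proportions over three topics.
tweets = [
    ("BrandA", [0.8, 0.1, 0.1]),
    ("BrandA", [0.6, 0.3, 0.1]),
    ("BrandB", [0.7, 0.2, 0.1]),
    ("BrandC", [0.1, 0.1, 0.8]),
]
sims = brand_similarities(brand_topic_profiles(tweets))
for pair, value in sorted(sims.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

In this toy example, BrandA and BrandB end up with nearly identical topic profiles, while BrandC sits apart. The resulting similarity matrix could then be passed to any standard embedding method (e.g., multidimensional scaling) to draw a positioning map.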