Developing Insights from Social Media

While formal market research has historically been the main source of consumer-centric information, social listening is now considered the most insightful. Review the model in Figure 1, which shows the schematics of social listening and its application to the social media platform Twitter. List the advantages and disadvantages of social listening for strategic insight, paying particular attention to the error factors identified in this study.

Twitter is a social media sharing site considered a snapshot of consumer and industry sentiment. Review Figure 1 in this article, then identify how the collective opinion fits into the signal of opportunity. Pay attention to the secondary retweets that measure the interest level of those who have chosen to follow the tweet trails.


A rich multidisciplinary literature has shown that Twitter data can be adapted to develop useful indicators of social trends. Few studies, however, propose a unified scheme that provides researchers with detailed and practical guidance for discovering social insights. The goal of this paper is to fill this gap. This paper proposes a comprehensive approach that uses Twitter data for text-based, customized user profiling, which can serve as an alternative to existing user profiling methods, and that discovers social trends from the collective voice of target users.

With the new opportunities brought by the emergence of Big Data into traditional survey research, social media have been considered a good source for public opinion research and social trend analysis. Popular social media services such as Twitter and Facebook are known for their open nature that allows people to freely share their opinions, attitudes, and behaviors.

One of the remarkable features of Big Data created from social media is that they provide "organic data", as opposed to "designed data", as stated by Groves. Traditional surveys analyze designed data, or "made data", which are created through the intervention of researchers and thus carefully designed to help answer the research question. In contrast, organic data, also known as "found data", are not originally made to answer research questions. They were intended for another primary use and were simply found by researchers, regardless of the original purpose of the data. Social media data are a good example of this; most of the social media services we use every day were never designed for research. Because they were not originally made for research, there is no guarantee that the found data can help to answer a research question.

Because social media data occur naturally, the question of which research areas could benefit from such organic data has been extensively addressed over the last decade in many different sectors, including academia, industry, and government. Social trend analysis is one such area. As with any type of Big Data, social media data tend to become more significant when aggregated on a large scale, and the collective voice from social media can serve as a powerful indicator that signals social trends in a market or a society. What many people say on social media can be taken to reflect their interests, which can translate into a social trend.
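As a minimal illustration of how individual posts become a collective signal once aggregated, the sketch below counts how many tweets mention each hashtag. The sample tweets and the function name `hashtag_counts` are hypothetical and chosen only for illustration; this is not the procedure used in this paper, and a real analysis would operate on a much larger collection.

```python
import re
from collections import Counter

# Hypothetical sample tweets; real input would come from a collected dataset.
tweets = [
    "Loving the new #sneakers drop #fashion",
    "Best #fashion week looks so far",
    "Are #sneakers the new heels? #Fashion",
    "Rainy day, staying in",
]


def hashtag_counts(texts):
    """Count how many tweets mention each hashtag (case-insensitive)."""
    counts = Counter()
    for text in texts:
        # A set ensures one tweet contributes at most once per hashtag.
        tags = {tag.lower() for tag in re.findall(r"#(\w+)", text)}
        counts.update(tags)
    return counts


counts = hashtag_counts(tweets)
top_tags = counts.most_common(2)
print(top_tags)
```

Counting each hashtag once per tweet keeps a single repetitive post from dominating the aggregate, which is one simple way to let the count reflect how many posts, rather than how many words, carry a topic.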

A traditional survey begins by establishing study objectives, defining a target population of interest, and then selecting a sampling frame, or a survey population to interview. This sample is expected to represent the entire target population substantially, if not completely. These initial steps apply equally to social trend analysis that leverages social media. The selection of users from social media depends on who should be targeted in the study. For example, suppose that a market research project aims to discover new social trends among young women who are interested in fashion. To that end, a team of researchers opts to look at Twitter and collects a large amount of Twitter data to create a pool of random Twitter users and tweets. To select the right users for this study from the pool, the researchers need to know the age, gender, and interests of each user in the pool, so that they can identify young female users interested in fashion. This process is called user profiling, or user modeling. User profiling aims to identify a set of user attributes that are essential to the study, such as demographic attributes (e.g., age and gender) and any other personal attributes that are helpful to know for the study (e.g., interests and personal traits). The more we know about users, the more effective user targeting becomes. It is only when we can identify the right users on social media that we are able to discover social trends from the target users. In other words, detecting social trends would not make sense if we failed to identify the target users who are believed to represent the target population for the study. Previous literature has addressed this user profiling task from many different perspectives, which will be presented in detail in the next section.
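The selection step in the fashion example can be sketched as a simple filter over already-profiled users. This is only an illustrative sketch: the `UserProfile` fields, the `select_target_users` helper, and the age cutoff of 29 for "young" are assumptions introduced here, and in practice each attribute would first have to be inferred by a profiling method.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class UserProfile:
    """Hypothetical profile; None marks an attribute that could not be inferred."""
    user_id: str
    age: Optional[int]
    gender: Optional[str]
    interests: Set[str] = field(default_factory=set)


def select_target_users(pool: List[UserProfile], gender: str,
                        max_age: int, interest: str) -> List[UserProfile]:
    """Keep users whose known attributes match the target audience."""
    selected = []
    for user in pool:
        # Users with unknown age or gender cannot be verified, so skip them.
        if user.age is None or user.gender is None:
            continue
        if (user.gender == gender and user.age <= max_age
                and interest in user.interests):
            selected.append(user)
    return selected


pool = [
    UserProfile("u1", 22, "female", {"fashion", "music"}),
    UserProfile("u2", 45, "female", {"fashion"}),
    UserProfile("u3", 24, "male", {"fashion"}),
    UserProfile("u4", None, "female", {"fashion"}),
    UserProfile("u5", 27, "female", {"sports"}),
    UserProfile("u6", 19, "female", {"fashion"}),
]

# The cutoff of 29 for "young" is an assumed threshold, not one from the study.
target = select_target_users(pool, gender="female", max_age=29, interest="fashion")
target_ids = [u.user_id for u in target]
print(target_ids)
```

Dropping users with unknown attributes is a conservative choice: it shrinks the target audience but avoids including users who may not belong to the target population.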

Choosing the right social media platform is another essential aspect of user targeting, as it determines the pool of candidate users. Of the many existing social network platforms, which can be characterized in different ways as listed by Musial et al., Twitter has been gaining the most attention from researchers, primarily due to its topological characteristics in the form of follower-followee relationships and its power as a new medium of information sharing. Its open nature allows people to talk about anything and everything on Twitter, except in some unusual cases where the content harms the public. This open nature offers researchers unprecedented opportunities to better understand people from what they share online with the world. In addition, Twitter opens part of its user-created data to the public through an Application Programming Interface (API), called the Twitter API. For example, the Twitter Streaming API, which allows users to retrieve real-time tweets from Twitter, is known to provide up to a 1% sample of all the tweets created on Twitter at a given time. While this 1% sample may appear too small to be used in a study, it can be sufficient in many cases, considering the enormous size of the entire data. On the other hand, it is known that the random samples from Twitter may be biased.

Hsieh and Murphy present a novel error framework for Twitter opinion research called Total Twitter Error, which is a variation on the traditional Total Survey Error that was originally designed to conceptualize the procedural and statistical errors of survey estimates. Specifically, the Total Twitter Error framework comprises three broad error sources: coverage error (over- and under-coverage of Twitter users and tweets), query error (inaccurate search queries leading to failure to extract proper data for analysis), and interpretation error (discrepancy between the true value or meaning and the one inferred from the interpretation). These three types of errors will be mentioned wherever possible and necessary in this paper.
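One way to make query error concrete is to compare what a keyword query retrieves against a small set of hand-labeled tweets. The sketch below is an assumption-laden illustration rather than part of the Total Twitter Error framework itself: the labeled tweets, the keywords, and the helper names are hypothetical, and precision and recall are used here as a convenient proxy for how well a query extracts the intended data.

```python
def matches_query(text, keywords):
    """Return True if any keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def query_precision_recall(labeled_tweets, keywords):
    """Compute precision and recall of a keyword query on labeled tweets.

    labeled_tweets is a list of (text, is_relevant) pairs labeled by hand.
    """
    tp = fp = fn = 0
    for text, relevant in labeled_tweets:
        hit = matches_query(text, keywords)
        if hit and relevant:
            tp += 1          # retrieved and relevant
        elif hit and not relevant:
            fp += 1          # retrieved but off-topic
        elif not hit and relevant:
            fn += 1          # relevant but missed by the query
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical hand-labeled examples for a query about one movement.
labeled = [
    ("I support the #MeToo movement", True),
    ("me too, I love pizza", False),
    ("Survivors are speaking out and being heard", True),
    ("New #metoo stories published today", True),
    ("Traffic is terrible this morning", False),
]

precision, recall = query_precision_recall(labeled, ["metoo", "me too"])
print(round(precision, 2), round(recall, 2))
```

In this toy data, the phrase "me too" pulls in an off-topic tweet (lower precision), while a relevant tweet that uses neither keyword is missed (lower recall), which mirrors the two directions in which a search query can fail.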

A wide range of research has attempted to identify social trends represented on social media, and each study has its own ways of collecting and processing data to detect trends. Few studies, however, provide a generic procedure that guides researchers who want to leverage social media data, and more specifically Twitter data, for social trend analysis. This study has two main objectives: (1) to effectively identify the target audience of users in Twitter data by user profiling and (2) to develop topical and social insights from the collective voice of the target users. For the user profiling task, specifically, we present text-based customized user profiling, which can serve as an alternative when no existing user profiling solution is available or works for the user attribute or the data of interest. We believe that this study is novel in that it presents a pragmatic scheme for Twitter user profiling and social trend discovery, with comprehensive and detailed guidance on how to use raw Twitter data to identify the target audience for a study and mine social trends from what the target users say on Twitter.

Two case studies show that our approach facilitates the discovery of social trends among a group of people on Twitter in a particular domain. The first case study identifies a target audience of young female users who are interested in fashion and successfully discovers the popular topics and influential actors among them, which are believed to provide insights into marketing strategies. For user profiling, we apply heuristics for the interest attribute of users, as well as some of the available user profiling solutions that have proved to perform well for the account type, gender, and age attributes. The second case study demonstrates that political orientation, i.e., conservative vs. liberal, does affect reactions to the Me Too movement. Leveraging customized user profiling to identify the political orientation of each user, we develop our own high-performing political orientation classifier based on the Random Forest algorithm, which is fitted to our Twitter data.

Recent research papers have extended the application of sentiment analysis to many practical fields, from medicine to economics. For instance, Roccetti M, Marfia G, Salomoni P, Prandi C, Zagari R, and Kengni FG show how data posted on Facebook by Crohn's disease patients can be used to understand the patient's perspective on a given medical prescription. Shapiro AH, Sudhof M, and Wilson D show that an economic sentiment measure derived from economic and financial newspaper articles is predictive of movements in survey-based measures of consumer sentiment. Similarly, Seki K, Ikuta Y, and Matsubayashi Y use a self-attention-based model to measure business sentiment based on textual data from daily newspaper articles. They show that the proposed index is strongly correlated with an established survey-based index and a variety of economic indices. Even though the current study primarily focuses on Twitter data, the proposed text-based approach has the potential to be extended to the analysis of other text data in order to develop sentiment indexes for many disciplines.

The rest of this paper is organized as follows. The "Related literature" section outlines related literature on user profiling. The "Discovering social trends in a target audience" section describes the steps for Twitter user profiling and social trend discovery. The "Case studies" section discusses two in-depth case studies: one on women's fashion market research and the other on reactions to the Me Too movement. The "Discussion and conclusions" section concludes and offers some directions for future research.