Developing Insights from Social Media

While formal market research has historically been the main source of consumer-centric information, social listening is now considered the most insightful source. Review the model in Figure 1, which shows the schematics of social listening and its application to the social media platform Twitter. List the advantages and disadvantages of social listening for strategic insight, paying particular attention to the error factors discovered in this study.

Twitter is a social media sharing site considered a snapshot of consumer and industry sentiment. Review Figure 1 in this article, then identify how the collective opinion fits into the signal of opportunity. Pay attention to the secondary retweets that measure the interest level of those who have chosen to follow the tweet trails.

Related literature

User profiling is known as an effective way to gain a better understanding of the users on a platform, and this enhanced understanding can facilitate many different applications such as target marketing and personalized recommendation. It is worth noting that the majority of studies on user profiling choose Twitter among many other social media platforms, primarily due to its open and data-friendly nature, as discussed in the "Introduction" section.

User profiling focuses on which attributes of users need to be identified. User attributes fall into two broad categories: demographic attributes and other personal attributes. Demographic attributes have been extensively addressed as the primary information about users because they reveal much about a person. Demographic attributes include age, education, gender, location, marital status or spouse, language, occupation, and race or ethnicity. Other personal attributes include account type (personal vs. organizational or human vs. bot), expertise, hobbies, interests, personality traits, political orientation, and influence. Table 1 lists the user attributes that can be inferred from Twitter data and the data used by the proposed methodologies for each user attribute. Note that the list of methodologies in the table is not exhaustive due to the vast amount of literature.

Table 1 Summary of the derivable user attributes, necessary data, and existing methodologies

| Type | User attribute | Data (Methodology) |
| --- | --- | --- |
| Demographic attributes | Age | Tweet text; Tweet text, follow; Profile image, the name, screen name, and description fields in User object |
| Demographic attributes | Education | Tweet text, follow |
| Demographic attributes | Gender | The name field in User object; Profile image, the name, screen name, and description fields in User object; Tweet text; Tweet text, follow |
| Demographic attributes | Location | The location field in User object; Tweet text; Tweet text, follow; Tweet text, tweet context |
| Demographic attributes | Marital status/spouse | Tweet text, follow |
| Demographic attributes | Language variety | Tweet text |
| Demographic attributes | Occupation | Tweet text; Tweet text, follow |
| Demographic attributes | Race/ethnicity | The name field in User object; Tweet text, User object fields, follow |
| Other personal attributes | Account type | Tweet text; Tweet text, follow; Tweet text, User object fields; User object fields, tweet context; Profile image, the name, screen name, and description fields in User object |
| Other personal attributes | Expertise | Tweet text, the description field in User object, user lists; User lists |
| Other personal attributes | Hobbies | Tweet text, follow |
| Other personal attributes | Interests | Tweet text; Tweet text, follow; Posted URLs; User lists |
| Other personal attributes | Personality traits - Big Five | Tweet text; User object fields |
| Other personal attributes | Personality traits - Dark Triad | Tweet text and User object fields |
| Other personal attributes | Personality traits - MBTI | Tweet text |
| Other personal attributes | Political orientation | Tweet text; Tweet text, User object fields, follow |
| Other personal attributes | Influence | Follow; Follow, tweet text; Tweet text |


Regarding the age attribute, since it is challenging to identify the exact age of a user, previous work has focused on identifying predefined age ranges, e.g., below 30 vs. above 30, or 10s or younger vs. 20s vs. 30s vs. 40s or older. Rao et al. consider only the tweet text of users for age identification, whereas other work considers both the follow relationship of users and tweet text. More recently, Wang et al. utilize the profile image and the name, screen name, and description fields in a User object to identify the age as well as the gender and account type with a single multi-modal model. This technique will be used in our first case study in the "Case studies" section.
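To make the age-range framing concrete, the following minimal sketch maps an exact age to the two example label schemes mentioned above. The function names and the handling of the boundary age 30 are assumptions made for illustration; the cited studies may define their classes differently.

```python
def binary_age_label(age: int) -> str:
    """Two-class scheme: below 30 vs. 30 or above.

    Assumption: age 30 itself falls into the upper class.
    """
    return "below 30" if age < 30 else "30 or above"


def four_class_age_label(age: int) -> str:
    """Four-class scheme: 10s or younger, 20s, 30s, 40s or older."""
    if age < 20:
        return "10s or younger"
    if age < 30:
        return "20s"
    if age < 40:
        return "30s"
    return "40s or older"


if __name__ == "__main__":
    for age in (17, 25, 30, 52):
        print(age, binary_age_label(age), four_class_age_label(age))
```

Framing age as a small set of ranges turns a hard regression problem into a classification problem, which is the setting the studies above work in.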

Identifying the education level and spouse of users has not been extensively addressed, mainly due to the lack of available training data, as stated by Li, Ritter, and Hovy. Their study employs a technique called distant supervision, which learns to extract relations from text using ground truth from an existing database such as Freebase, to detect school and spouse entities mentioned in tweet text.
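The idea behind distant supervision can be sketched as follows: instead of labeling tweets by hand, a tweet is weakly labeled as positive when it mentions an entity that an existing database already associates with its author. This is only an illustrative sketch, not the method of the cited study; the knowledge base entries, the function name `label_tweets`, and the simple substring matching are all assumptions.

```python
import re

# Hypothetical knowledge base of (person, school) pairs standing in for a
# resource such as Freebase. All entries are made up for illustration.
KNOWN_EDUCATION = {
    ("alice example", "example state university"),
    ("bob sample", "sample college"),
}


def label_tweets(tweets, knowledge_base):
    """Attach a weak 'education' label to each (user_name, text) tweet.

    A tweet is labeled 1 when its text mentions a school that the
    knowledge base pairs with the tweet's author, and 0 otherwise.
    """
    labeled = []
    for user_name, text in tweets:
        # Lowercase and collapse whitespace so matching is less brittle.
        normalized = re.sub(r"\s+", " ", text.lower())
        schools = {s for (p, s) in knowledge_base if p == user_name.lower()}
        mentioned = sorted(s for s in schools if s in normalized)
        labeled.append({
            "user": user_name,
            "text": text,
            "label": 1 if mentioned else 0,
            "entities": mentioned,
        })
    return labeled


if __name__ == "__main__":
    sample = [
        ("Alice Example", "Back on campus at Example State University today"),
        ("Alice Example", "Coffee time"),
    ]
    for row in label_tweets(sample, KNOWN_EDUCATION):
        print(row["label"], row["entities"])
```

The weakly labeled tweets would then serve as training data for a relation extractor, which is what makes the approach attractive when hand-labeled data is scarce.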

For gender classification, most studies consider tweet text, based on the idea that a user's gender, treated as one of only two classes (female and male), can be distinguished from what they say and how they say it on Twitter. Mislove et al. simply consider the description field of a User object, while other work considers the follow relationship as well as tweet text.

User location is one of the most extensively investigated attributes, studied for many different purposes. Here, locations refer to users' home locations (their residences), tweet locations (their current locations at the time of posting), and mentioned locations (their places of interest). Zheng et al. provide a comprehensive survey of the existing approaches to location prediction on Twitter. Most of the studies are motivated by the fact that only a small portion of tweets are geo-tagged or geo-referenced, which means that few tweets contain the exact geo-information needed for accurate location identification. Refs. (Cheng Z, Rao D, Kanta M, Han B, Ajao O, Li P, Singh J) consider only tweet text for location prediction, whereas (Li R, Ikeda K) add the follow relationship and (Ahmed A, Yuan Q) add the tweet context as additional features of their models. (Mislove A) simply uses the location field of a User object.

Marital status, i.e., whether a user is single or married, is another demographic attribute that tells much about an individual and their family. Both (Ikeda K, Oentaryo RJ) consider tweet text and follow relationship for marital status identification.

For language variety identification, which can also be related to the race or ethnicity of a user, (Basile A) identify four languages (English, Spanish, Arabic, and Portuguese), while (López-Monroy AP) distinguish two languages (English and Spanish), both considering tweet text.

Identification of occupation is motivated by the fact that a person's life is deeply connected with and explained by their occupation. Hu et al. consider eight job categories: Marketing, Administrator, Start-up, Editor, Software Engineer, Public Relation, Office Clerk, and Designer. Ikeda et al. consider seven job categories: Employee, Part-time, Self Employed, Civil Servant, Homemaker, Student, and Without occupation, while (Li J) identify specific job entities in tweets. Hu et al. use tweet text, whereas (Ikeda K, Li J) use the follow relationship as well as tweet text.

Race or ethnicity has not been addressed as much as other demographic attributes. Mislove et al. use the name field of a User object to extract the last names of users and compare them with the U.S. 2000 Census data. Pennacchiotti et al. consider tweet text, some fields in the User object, and the follow relationship of users to identify whether or not a user is African-American.
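The last-name comparison can be illustrated with a small lookup sketch. The surname table below is entirely hypothetical (placeholder surnames, group labels, and shares); it only mimics the general shape of a table that gives, for each surname, a distribution over groups. The helper names and the `min_share` threshold are likewise assumptions, not details of the cited study.

```python
# Hypothetical surname table: surname -> share of each group.
# Values are placeholders, not real census figures.
SURNAME_TABLE = {
    "exampleson": {"group_1": 0.80, "group_2": 0.15, "group_3": 0.05},
    "samplewood": {"group_1": 0.30, "group_2": 0.60, "group_3": 0.10},
}


def extract_last_name(name_field):
    """Return the last purely alphabetic token of the name field, lowercased."""
    tokens = [t for t in name_field.strip().split() if t.isalpha()]
    return tokens[-1].lower() if tokens else None


def infer_group(name_field, table, min_share=0.5):
    """Return the most likely group for the surname, or None.

    None is returned when no surname can be extracted, the surname is not
    in the table, or no group reaches the (assumed) min_share threshold.
    """
    last_name = extract_last_name(name_field)
    distribution = table.get(last_name) if last_name else None
    if not distribution:
        return None
    group, share = max(distribution.items(), key=lambda kv: kv[1])
    return group if share >= min_share else None


if __name__ == "__main__":
    for name in ("Jane Exampleson", "J. Samplewood", "xx_123"):
        print(name, "->", infer_group(name, SURNAME_TABLE))
```

Because the name field on Twitter is free text, many accounts yield no usable surname, which is one reason such a lookup covers only a fraction of users.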

In addition to demographic attributes, there are other personal attributes that can be identified by user profiling. Account type identification aims to tell whether a user account on Twitter is a personal account or, alternatively, an organizational account or a bot account. Fagni et al. consider tweet text to first identify whether an account is human or bot and, in the case of a human, further identify the gender (female vs. male). Oentaryo et al. use tweets and the follow relationship to identify whether an account is personal or organizational. McCorriston et al. address the same problem using some fields in the User object. Alzahrani et al. focus on detecting only organizational accounts using some fields in the User object and the tweet context.

Expertise is another interesting attribute in that it can be used for applications such as personalized recommendation, expertise matching, and community detection. Refs. (Wagner C, Niu W) both use user lists, which are curated groups of Twitter accounts created and managed by users, while the former additionally uses tweet text and the description field of a User object to extract expertise.

Regarding hobbies, we have found only one study, which identifies the hobbies of Twitter users from twelve hobby categories (Reading, Gourmet, Vehicle, IT & Electronics, Games, Pets & Plants, Sports, Travel, Fashion, Music, TV & Movie, and Arts) by considering tweet text and the follow relationship.

Interests are among the most extensively investigated user attributes along with the location attribute, as users' interests can be directly used for applications such as personalization and customized marketing. The literature ranges from the studies considering only tweet text to those considering follow relationship as well as tweet text, the one considering only the posted URL in tweets, and the one considering user lists.

Identification of personality traits attempts to classify users' personalities according to one of the well-known personality models such as the Big Five (Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism), the Dark Triad (Narcissism, Machiavellianism, and Psychopathy), or the Myers-Briggs Type Indicator (MBTI). The Big Five model has been adopted by most of the studies, such as (Golbeck J, Qiu L, Gou, Chen J, Liu F, Quercia D), while there is one study focusing on the anti-social traits called the Dark Triad, and there are studies adopting MBTI.

Identification of political orientation, affiliation, or preference has been addressed as a binary classification problem with only two classes: Republican/conservative/right vs. Democratic/liberal/left. Refs. (Rao D, Volkova S) consider only tweet text, whereas (Pennacchiotti M) consider some fields in User object, follow relationship, and tweet text.

Last, user influence refers to the influence of a user on other users in a social network. This measure can be leveraged to identify influencers or opinion leaders in a domain. Measuring how influential someone is can be very subjective, which has led researchers in many different disciplines such as social science and economics to propose a variety of approaches to measuring user influence. Refs. (Riquelme F, Tabassum S) provide thorough overviews of the existing influence measures for Twitter users in the literature. Refs. (Hajian B, Weng J, Jin X) rely only on the follow relationship to apply traditional centrality measures such as closeness, betweenness, and PageRank to Twitter users, whereas (Cha M, Aleahmad A) add tweet text as an additional source, and (Pal A, Counts S) utilize tweet text alone to measure user influence.
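As an illustration of a follow-based centrality measure, the sketch below computes PageRank over a small follow graph by power iteration. It is a generic textbook formulation, not the exact procedure of any cited study; the function name, the damping factor of 0.85, the fixed iteration count, and the toy graph are assumptions.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Compute PageRank scores for users in a follow graph.

    follows maps each user to the list of users they follow, so a follow
    edge u -> v passes part of u's score to v.
    """
    users = set(follows)
    for followed in follows.values():
        users.update(followed)
    users = sorted(users)
    n = len(users)
    if n == 0:
        return {}

    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Every user receives the same baseline (teleportation) share.
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # A user who follows nobody spreads their score evenly.
                share = damping * rank[u] / n
                for v in users:
                    new_rank[v] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy graph: a and b follow c, and c follows a.
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for user, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(user, round(score, 3))
```

In this toy graph the user followed by the most accounts receives the highest score, and a user with no followers keeps only the baseline share.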