While formal market research has historically represented consumer-centric information, social listening is now considered the most insightful. Review the model, Figure 1, the schematics of social listening, and its application to the social media platform Twitter. List the advantages and disadvantages of social listening for strategic insight, paying particular attention to the error factors discovered from this study.
Twitter is a social media sharing site considered a snapshot of consumer and industry sentiment. Review Figure 1 in this article, then identify how the collective opinion fits into the signal of opportunity. Pay attention to the secondary retweets that measure the interest level of those who have chosen to follow the tweet trails.
Case studies
Having established a procedure to identify a target audience in Twitter and discover social trends from their collective voice, we now move on to two in-depth case studies that demonstrate how a research study can benefit from our approach. The first case study will provide details on the market research project example we have mentioned throughout this paper, while the second case study performs a comparative analysis on the effects of political orientation on a gender issue.
Table 3 Monthly data statistics of the pools of random Twitter users and tweets
Month | User count | Tweet count |
---|---|---|
12/2021 | 22,569,110 | 133,387,546 |
11/2021 | 21,876,935 | 129,462,997 |
10/2021 | 22,175,272 | 133,334,050 |
09/2021 | 21,446,941 | 127,009,377 |
08/2021 | 21,708,191 | 133,447,209 |
07/2021 | 21,979,242 | 133,358,039 |
06/2021 | 21,611,226 | 128,414,906 |
05/2021 | 22,651,068 | 133,741,215 |
04/2021 | 22,138,958 | 129,235,713 |
03/2021 | 22,441,309 | 133,544,952 |
02/2021 | 21,529,017 | 120,703,913 |
01/2021 | 22,317,570 | 133,754,300 |
12/2020 | 21,107,115 | 120,627,976 |
11/2020 | 21,950,691 | 129,635,445 |
10/2020 | 21,889,317 | 133,221,211 |
09/2020 | 22,344,474 | 128,867,950 |
08/2020 | 22,643,060 | 133,302,754 |
07/2020 | 22,930,209 | 133,609,303 |
06/2020 | 22,419,694 | 128,885,150 |
05/2020 | 23,554,291 | 133,389,857 |
04/2020 | 23,420,878 | 129,330,138 |
03/2020 | 23,400,803 | 133,474,640 |
02/2020 | 21,260,800 | 125,029,995 |
01/2020 | 22,275,681 | 133,666,464 |
Total (Unique) | 138,845,242 | 3,132,435,100 |
Young women's fashion market research
Marketers want to know what their customers are currently interested in and who the influencers among them are, so that they can gain insight into new business opportunities and focus their marketing effort and resources on the specific people who could influence others. In this in-depth case study, we aim to first identify young, female users on Twitter who are interested in fashion and then discover popular topics and influential users among them.
To find the target audience of female users interested in fashion, we begin by searching our random pool of tweets for tweets that have the hashtags #fashion and #style. As mentioned earlier, each Tweet object has a User object that indicates the user who created the tweet, which allows us to identify all users in our pool who have ever used the two hashtags. Here, use of the hashtags is taken as a proxy for interest in the topic. This step can be understood as a simplified implementation of the interests attribute in Table 1. Note that one can consider adding more search terms that are similar to #fashion and #style, such as #beauty and #clothing. The search finds 111,913 users in total. Using the Twitter API, we further check whether each of these users still has a valid, public account, which leaves 89,437 users. Next, we remove users who have posted fewer than 100 tweets in total, based on the idea that we need at least 100 tweets to understand a user by their tweets. This results in 51,276 users in total, i.e., |U|=51276. We then collect up to 3200 of the most recent tweets from each user using the Twitter API, which totals 107,002,581 tweets, i.e., |T|=107002581.
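The filtering steps described above can be illustrated with a minimal sketch. It is not the implementation used in the study: the `User` and `Tweet` classes, the `is_public` flag (standing in for the account validity check made through the Twitter API), and the function name are hypothetical. The sketch also assumes that using either seed hashtag is enough to qualify a user, which the text does not state explicitly.

```python
from dataclasses import dataclass

SEED_HASHTAGS = frozenset({"fashion", "style"})  # seed hashtags named in the text
MIN_TWEETS = 100                                 # activity threshold named in the text


@dataclass(frozen=True)
class User:
    # Hypothetical stand-in for the Twitter User object.
    user_id: int
    tweet_count: int        # total number of tweets the user has posted
    is_public: bool = True  # assumed outcome of the account validity check


@dataclass(frozen=True)
class Tweet:
    # Hypothetical stand-in for the Twitter Tweet object.
    user: User
    hashtags: tuple = ()


def find_target_users(tweets, seed_hashtags=SEED_HASHTAGS, min_tweets=MIN_TWEETS):
    """Return users who used a seed hashtag, are public, and posted enough tweets."""
    candidates = {}
    for tweet in tweets:
        # Normalize case and drop a leading '#' so "#Style" matches "style".
        tags = {tag.lower().lstrip("#") for tag in tweet.hashtags}
        if tags & seed_hashtags:
            candidates[tweet.user.user_id] = tweet.user
    return [
        user for user in candidates.values()
        if user.is_public and user.tweet_count >= min_tweets
    ]
```

Adding related hashtags such as #beauty or #clothing would only require extending `seed_hashtags`.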
After finding users interested in fashion and collecting their recent tweets, the next step is to identify each user's gender and age, which will allow us to select young female users. Before applying a gender classification solution, we first remove organization accounts, based on the belief that organizations do not represent our target customers. Researchers may want to keep organization accounts if they believe organizations are relevant to their study; in this case study, we are only interested in individuals, especially young female users. This step can be considered an implementation of the account type attribute in Table 1. To identify organization accounts, we leverage two open source solutions: Humanizr and M3-Inference. Humanizr examines a user's tweets along with the user information in those tweets to determine whether the account belongs to an individual or represents an organization, while M3-Inference uses the profile image, name, screen name, and bio of a user, as described in the "Related literature" section. When the two solutions disagree on the same account, that is, one classifies it as an organization and the other as an individual, we treat the account as an organization: an account is considered an organization account when at least one of the two solutions says so, and an individual account otherwise. In our data, approximately 22% (11,195 out of 51,276) of the accounts turn out to be organization accounts, which is higher than 9.4%. We remove those organization accounts, which leaves 40,081 users who are believed to be individual accounts.
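The rule for combining the two account type classifiers amounts to a logical OR on the "organization" outcome. The sketch below illustrates only that rule; it does not call Humanizr or M3-Inference, and the label strings and function names are hypothetical.

```python
ORGANIZATION = "organization"  # hypothetical label names, not the tools' actual outputs
INDIVIDUAL = "individual"


def combine_account_type(humanizr_label, m3_label):
    """Flag an account as an organization if at least one classifier says so."""
    if humanizr_label == ORGANIZATION or m3_label == ORGANIZATION:
        return ORGANIZATION
    return INDIVIDUAL


def keep_individuals(predictions):
    """predictions maps user_id -> (humanizr_label, m3_label); return individual ids."""
    return [
        user_id
        for user_id, (humanizr_label, m3_label) in predictions.items()
        if combine_account_type(humanizr_label, m3_label) == INDIVIDUAL
    ]
```

This rule is deliberately conservative: it accepts losing some individuals who are misclassified by one tool in exchange for fewer organizations remaining in the sample. The reported counts are consistent with it, since 51,276 − 11,195 = 40,081.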
For gender identification, we use two solutions: a Python library called gender-guesser, which takes a statistical approach to gender classification based on a person's first name, and the M3-Inference solution already used for the account type. The gender-guesser solution returns one of six classes: "unknown", "androgynous", "male", "female", "mostly_male", or "mostly_female". For simplicity, we merge "mostly_male" into "male" and "mostly_female" into "female". As mentioned in the "User profiling" section, a User object has a name field in which users can specify their name. As not all users provide their exact full name, the field may contain no first name. Furthermore, even if the user specifies a first name, there is no guarantee that the solution recognizes it, which is especially true for non-English names. The M3-Inference solution returns either "female" or "male" for a user. To merge the outcomes of the two solutions, we (1) label a user as "conflict" when one solution returns "female" and the other "male" and (2) adopt the label predicted by the second solution (M3-Inference) when the first solution (gender-guesser) returns "unknown" or "androgynous". This results in 24,886 females, 13,910 males, and 1,285 conflicts. We disregard the conflicts in our data.
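The merging rule for the two gender predictions can be sketched as a small pure function. This is an illustration under assumptions rather than the study's code: the labels are passed in as plain strings, neither library is called, and the function names are hypothetical. To our knowledge the gender-guesser library reports its androgynous class as "andy", so the sketch accepts both spellings.

```python
# Labels for which the name-based prediction is treated as undecided.
# "andy" is included on the assumption that it is the library's androgynous label.
UNDECIDED = {"unknown", "androgynous", "andy"}

# Collapse the "mostly_*" classes into their base class, as described in the text.
COLLAPSE = {"mostly_male": "male", "mostly_female": "female"}


def merge_gender(name_based_label, m3_label):
    """Merge a name-based label with an M3-Inference label ('male' or 'female')."""
    first = COLLAPSE.get(name_based_label, name_based_label)
    if first in UNDECIDED:
        # Rule (2): fall back to the second solution.
        return m3_label
    if first != m3_label:
        # Rule (1): the two solutions disagree.
        return "conflict"
    return first
```

For example, `merge_gender("mostly_female", "female")` yields "female", while `merge_gender("male", "female")` yields "conflict" and the user is dropped.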
For the age attribute, we continue to rely on the M3-Inference solution, which returns one of four age levels for each user: ≤18, (18, 30), [30, 40), [40, 99). In our data, the solution assigns 6,011 users to the 18-or-under level, 12,994 to 19 to 29, 10,641 to 30 to 39, and 10,435 to 40 or above.
Rank | Hashtag | Frequency | Rank | Hashtag | Frequency |
---|---|---|---|---|---|
1 | #poshmark | 4,993,200 | 26 | #fitness | 19,223 |
2 | #shopmycloset | 3,748,873 | 27 | #nature | 19,137 |
3 | #fashion | 2,351,297 | 28 | #model | 18,601 |
4 | #style | 1,569,501 | 29 | #nyc | 18,409 |
5 | #giveaway | 79,435 | 30 | #summer | 18,025 |
6 | #love | 73,497 | 31 | #quote | 17,928 |
7 | #etsy | 71,360 | 32 | #tbt | 17,591 |
8 | #win | 67,961 | 33 | #blog | 17,575 |
9 | #shehnaazgill | 57,871 | 34 | #shopping | 17,510 |
10 | #beauty | 54,519 | 35 | #sidharthshukla | 17,205 |
11 | #handmade | 48,495 | 36 | #design | 16,366 |
12 | #art | 39,531 | 37 | #life | 16,261 |
13 | #vintage | 36,432 | 38 | #gifts | 16,178 |
14 | #jewelry | 31,311 | 39 | #sale | 16,084 |
15 | #ad | 31,195 | 40 | #covid19 | 16,066 |
16 | #ootd | 29,427 | 41 | #sweepstakes | 16,019 |
17 | #beautiful | 28,299 | 42 | #android | 15,903 |
18 | #photography | 27,432 | 43 | #food | 15,695 |
19 | #travel | 25,743 | 44 | #mayward | 15,661 |
20 | #christmas | 24,971 | 45 | #androidgames | 15,294 |
21 | #makeup | 24,902 | 46 | #cute | 15,289 |
22 | #music | 22,309 | 47 | #health | 15,187 |
23 | #ebay | 21,732 | 48 | #sexy | 14,926 |
24 | #gameinsight | 21,027 | 49 | #tiktok | 14,921 |
25 | #repost | 20,442 | 50 | #contest | 14,897 |
Rank | User | Frequency | Rank | User | Frequency |
---|---|---|---|---|---|
1 | @poshmarkapp | 4,917,306 | 26 | @jeffreestar | 12,816 |
2 | @ebay | 194,975 | 27 | @sidharth_shukla | 12,591 |
3 | @youtube | 141,356 | 28 | @rubidilaik | 12,017 |
4 | @etsy | 89,344 | 29 | @potus | 12,014 |
5 | @realdonaldtrump | 54,010 | 30 | @hwanniepromotes | 11,672 |
6 | @ishehnaaz_gill | 48,226 | 31 | @ladyincrypto | 10,052 |
7 | @missufe | 33,847 | 32 | @weareoneexo | 9932 |
8 | @chitaglorya__ | 29,150 | 33 | @barackobama | 9855 |
9 | @bts_twt | 28,034 | 34 | @originalfunko | 9700 |
10 | @maymayentrata07 | 27,304 | 35 | @gemhostofficial | 9549 |
11 | @bloglovin | 20,669 | 36 | @colorstv | 9385 |
12 | @zazzle | 18,945 | 37 | @nytimes | 8983 |
13 | @pledis_17 | 18,372 | 38 | @taylorswift13 | 8809 |
14 | @joebiden | 17,717 | 39 | @cashapp | 8526 |
15 | @pulte | 17,515 | 40 | @shill_ronin | 8336 |
16 | @blackpink | 16,611 | 41 | @bang_garr | 8062 |
17 | @eyehinakhan | 16,395 | 42 | @prctiu | 7762 |
18 | @sof1azara03 | 16,147 | 43 | @influenster | 7589 |
19 | @davelackie | 14,343 | 44 | @elonmusk | 7452 |
20 | @fineartamerica | 14,292 | 45 | @perduechicken | 7404 |
21 | @etsysocial | 14,251 | 46 | @netflix | 7366 |
22 | @barber_edward_ | 14,115 | 47 | @colourpopco | 7242 |
23 | @cnn | 13,872 | 48 | @thesecret | 7191 |
24 | @amazon | 13,285 | 49 | @kamalaharris | 7187 |
25 | @giveawayhost | 13,275 | 50 | @taegiveaway | 7171 |
Regarding the influential actors, we take two approaches. The first is simply to identify which user accounts are mentioned the most in the tweets; these can be considered the popular users in this virtual community. Table 5 presents the top-50 ranking of user mentions in the tweets posted by the same young female users interested in fashion. The user @poshmarkapp is the most mentioned account, which confirms that shopping on Poshmark is very popular. Note that not all the user accounts in this ranking belong to the young female users in our target audience. They are simply the accounts mentioned most frequently by those users, and some of them lie outside the target audience.
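A mention ranking of this kind can be produced by counting @-mentions across the collected tweets. The following is a minimal sketch under assumptions, not the study's code: it extracts mentions from raw tweet text with a simple regular expression (a real pipeline would more likely read the mention entities attached to each Tweet object), and the function name is hypothetical.

```python
import re
from collections import Counter

# Simplified pattern: '@' followed by letters, digits, or underscores.
MENTION_PATTERN = re.compile(r"@(\w+)")


def top_mentions(tweet_texts, k=50):
    """Return the k most frequently mentioned screen names with their counts."""
    counts = Counter()
    for text in tweet_texts:
        # Screen names are case-insensitive, so count them in lowercase.
        counts.update(name.lower() for name in MENTION_PATTERN.findall(text))
    return counts.most_common(k)
```

The same counting pattern applies to a hashtag ranking if the pattern matches `#` instead of `@`.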
Rank | User (by centrality) | Centrality | User (by h-index) | H-Index |
---|---|---|---|---|
1 | @jacquelinerline | 0.124 | @makeupbyshaniah | 191 |
2 | @ofresell | 0.105 | @nikkitamboli | 177 |
3 | @captaincouture1 | 0.099 | @c**********s | 174 |
4 | @heliapichardo | 0.098 | @m********x | 171 |
5 | @bethpaintings | 0.098 | @josinaanderson | 161 |
6 | @katewinstyle | 0.097 | @alissawahid | 156 |
7 | @trixie8181 | 0.095 | @janeyellene | 140 |
8 | @pinkpretty16 | 0.094 | @salmahayek | 140 |
9 | @lashea_hudnall | 0.094 | @g*************1 | 137 |
10 | @amyposhboutique | 0.091 | @rubidilaikofc | 135 |
11 | @msmaverick2 | 0.09 | @megastyleph | 133 |
12 | @micely6391 | 0.088 | @maliibumiitch | 123 |
13 | @peanutandjojos | 0.088 | @ari_maj1 | 118 |
14 | @chelleztreasure | 0.088 | @nikkisamonas | 116 |
15 | @emmasattic98 | 0.088 | @rubiholiccs | 114 |
16 | @suzcat12 | 0.087 | @emilykschrader | 112 |
17 | @jazziesposhmark | 0.087 | @famnikki | 111 |
18 | @poshmarkrebekah | 0.086 | @ivy_ferguson | 108 |
19 | @lifesshortbuyit | 0.085 | @s*************s | 107 |
20 | @shadowdogdesign | 0.08 | @sayyess2thejess | 105 |
21 | @rendon_patsy | 0.077 | @aquiboni | 102 |
22 | @krista47005550 | 0.076 | @life_breakdown | 102 |
23 | @boondockfinds | 0.075 | @shivandi | 98 |
24 | @voudaux | 0.075 | @hinakhanstan | 96 |
25 | @michelleroseg33 | 0.073 | @a************o | 93 |
Table 6 presents the top-25 influential users ranked by centrality (left side) and by h-index (right side), each in descending order. The number one user in the centrality ranking is Jacqueline Line (screen name @JacquelineRLine), who has 367K followers at the time of writing and is a popular user on Poshmark; her timeline is filled with tweets on various fashion items. The number one user in the retweet h-index ranking is Shaniah (screen name @makeupbyshaniah), who has 115.4K followers at the time of writing and is a popular makeup artist and YouTuber. As shown in the table, the two influencer rankings contain completely different users, which implies that the two measures capture different perspectives on influence.
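To make the retweet h-index concrete, the sketch below computes it under the usual h-index definition transferred to retweets: a user has index h if h of their tweets have each been retweeted at least h times. This definition is our assumption for illustration, as is the function name; the study's exact formulation may differ.

```python
def retweet_h_index(retweet_counts):
    """Largest h such that at least h tweets have at least h retweets each."""
    ranked = sorted(retweet_counts, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            # Counts only decrease from here on, so no larger h is possible.
            break
    return h
```

Under this definition, a score such as 191 would mean that the user has 191 tweets that were each retweeted at least 191 times, which rewards sustained engagement rather than a single viral tweet.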
It is worth further analyzing this case study from the perspective of the Total Twitter Error framework mentioned in the "Introduction" section, which helps us evaluate potential errors in the study. As the study relies entirely on the pool of random Twitter users and tweets to identify people interested in fashion, it is not free from the under-coverage error. In other words, the Twitter users found clearly do not represent all people in the world who are interested in fashion. Here, we make the strong assumption that we are only interested in Twitter users and that our study targets only those people in the social media world. We do not believe this assumption is unreasonable, as many people interested in fashion use Twitter and hold conversations online. Again, this depends entirely on the objectives of the study. On the other hand, the 16,011 young female users found are by no means a small sample, as it would be challenging to gather this number of human subjects or respondents in a traditional survey. In addition, we identified and removed organization accounts, which helped to reduce the over-coverage error in our data. In terms of the query error, while we could have added hashtags other than #fashion and #style when identifying users interested in fashion, we believe that the two hashtags alone are representative of the interest in fashion. Lastly, there is room for the interpretation error, given that the user profiling solutions used are imperfect. To minimize the potential interpretation error, we (1) chose solutions that demonstrate good performance in their papers and (2) used more than one solution for the same attribute whenever possible.
Me Too movement reaction: conservatives vs. liberals
The second case study aims to answer the question of whether political orientation, i.e., conservative vs. liberal, affects people's reaction to a gender-related issue. We choose the recent Me Too movement as a prominent gender-related topic and compare how differently conservatives and liberals react to the same issue. To define the target audience for this case study, we take the same approach as in the previous case study on young women interested in fashion: identifying the Twitter users in our pool who have ever used the #metoo hashtag in their tweets. Again, use of the hashtag is taken as a proxy for interest in the topic. From our pool, 68,116 users are identified as those who (1) have ever used the #metoo hashtag, (2) still have valid and public accounts on Twitter, and (3) have posted at least 100 tweets. Formally, |U|=68116. We then collect up to 3200 recent tweets for each of the users, which totals 188,806,239 tweets, or formally |T|=188806239.
Rank | Hashtag | Frequency | Rank | Hashtag | Frequency |
---|---|---|---|---|---|
1 | #covid19 | 12,753 | 26 | #imwithher | 2541 |
2 | #trump | 10,706 | 27 | #strongertogether | 2524 |
3 | #resist | 6515 | 28 | #biden2020 | 2502 |
4 | #maga | 6223 | 29 | #trumpvirus | 2382 |
5 | #fbrparty | 5979 | 30 | #tiktok | 2292 |
6 | #bidenharris2020 | 5941 | 31 | #trump2020 | 2266 |
7 | #potus | 5796 | 32 | #resisters | 2262 |
8 | #fbr | 5274 | 33 | #buildbackbetter | 2248 |
9 | #backfiretrump | 4943 | 34 | #votebluetosaveamerica | 2200 |
10 | #vote | 4915 | 35 | #florida | 2165 |
11 | #breaking | 4684 | 36 | #traitortrump | 2161 |
12 | #fbi | 4564 | 37 | #lockhimup | 2157 |
13 | #theresistance | 4061 | 38 | #trumpcrimefamily | 2153 |
14 | #moscowmitch | 3801 | 39 | #poshmark | 2133 |
15 | #coronavirus | 3752 | 40 | #biden | 2083 |
16 | #mitchplease | 3429 | 41 | #trumprussia | 2067 |
17 | #gop | 3157 | 42 | #auschwitz | 1954 |
18 | #blacklivesmatter | 3073 | 43 | #scotus | 1904 |
19 | #smartnews | 2826 | 44 | #demdebate | 1895 |
20 | #voteblue | 2770 | 45 | #giveaway | 1854 |
21 | #newprofilepic | 2706 | 46 | #resistance | 1840 |
22 | #demvoice1 | 2631 | 47 | #georgia | 1834 |
23 | #covid | 2591 | 48 | #texas | 1826 |
24 | #gh | 2570 | 49 | #txlege | 1815 |
25 | #impeachtrump | 2546 | 50 | #sotu | 1777 |


Rank | Feature | Importance | Rank | Feature | Importance |
---|---|---|---|---|---|
1 | #trump2020 | 0.042 | 26 | #fbrparty | 0.008 |
2 | #fjb | 0.038 | 27 | #trumpshutdown | 0.008 |
3 | #moscowmitch | 0.034 | 28 | #impeachtrump | 0.008 |
4 | #traitortrump | 0.029 | 29 | #neverforgetjanuary6th | 0.008 |
5 | #oann | 0.026 | 30 | #deathsantis | 0.007 |
6 | #resist | 0.021 | 31 | #expeljoshhawley | 0.007 |
7 | #bidenharris2020 | 0.020 | 32 | #daytona500 | 0.007 |
8 | #americafirst | 0.019 | 33 | #fbi | 0.006 |
9 | #voteblue | 0.017 | 34 | #prolife | 0.006 |
10 | #2a | 0.015 | 35 | #wearamask | 0.006 |
11 | #bidenharris | 0.015 | 36 | #trump2024 | 0.006 |
12 | #istandwithbiden | 0.015 | 37 | #covid19 | 0.006 |
13 | #demvoice1 | 0.014 | 38 | #proudboys | 0.006 |
14 | #mitchplease | 0.014 | 39 | #laurenboebertissodumb | 0.005 |
15 | #getvaccinated | 0.012 | 40 | #resisters | 0.005 |
16 | #buildbackbetter | 0.012 | 41 | #trumpvirus | 0.005 |
17 | #forthepeople | 0.011 | 42 | #votebluetosaveamerica | 0.005 |
18 | #theresistance | 0.011 | 43 | #morningjoe | 0.005 |
19 | #godblessamerica | 0.011 | 44 | #strongertogether | 0.005 |
20 | #walkaway | 0.011 | 45 | #lockhimup | 0.005 |
21 | #trumpisnotwell | 0.010 | 46 | #americasgreatestmistake | 0.005 |
22 | #antifa | 0.010 | 47 | #trumpcare | 0.005 |
23 | #maddow | 0.010 | 48 | #holocaustremembranceday | 0.005 |
24 | #arresttrumpnow | 0.010 | 49 | #trumprussia | 0.005 |
25 | #backtheblue | 0.009 | 50 | #maga2020 | 0.005 |
For model evaluation and selection, we compare f1-scores, where the f1-score is the harmonic mean of precision and recall. As shown in Fig. 2, the Random Forest model yields the best performance with an f1-score of 0.91, which can be considered very high predictive performance. Figure 3 presents the Average Precision (AP) curve (left) and the Receiver Operating Characteristic (ROC) curve (right) for the best-performing Random Forest model. The Average Precision and the Area Under the Curve (AUC) are 0.96 and 0.96, respectively, which confirms the excellent performance of the model. In addition, to identify which features (i.e., hashtags) contribute the most to prediction, we list the feature importance scores provided by the Random Forest algorithm. Table 8 presents the top-50 important features and their importance scores. The ranking shows that the #trump2020 hashtag contributes the most to political orientation prediction, followed by #fjb, #moscowmitch, #traitortrump, #oann (meaning One America News Network), #resist, #bidenharris2020, and so on, all of which make sense.
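As a reminder of how the f1-score relates to precision and recall, the following sketch computes all three for one class from lists of true and predicted labels. It is a generic illustration with hypothetical names, not the evaluation code of the study, and it does not assume whether the reported scores are per-class or averaged across classes.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and f1 for the class given as `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because the harmonic mean is dominated by the smaller of its two inputs, a high f1-score requires both precision and recall to be high.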
As the training data used for political orientation classification are biased toward users who clearly describe themselves as proud liberals or conservatives, we further conduct an out-of-sample performance evaluation. To create a new data set for this evaluation, we randomly select 200 users whose bio contains "democrat" or "liberal" but not "proud" and, likewise, 200 users whose bio contains "republican" or "conservative" but not "proud". Next, for each group of 200 users, we manually check whether each user is actually liberal or conservative by reading their bio, which results in 179 liberal users and 116 conservative users. We then collect up to 3200 of the most recent tweets from their timelines and extract hashtag frequency features from their tweets. We apply our political orientation classifier to those users to predict their political orientations and, finally, compare the predicted orientations with the actual ones. This results in an f1-score of 0.76. While this is lower than the within-sample performance of 0.91, as expected, we consider it still high enough for use in real-world Big Data analysis.
To show that hashtag features outperform full-text features in political orientation classification, we use BERT, which is known to perform well in text classification, as the baseline for comparison. To clarify, our approach uses the frequencies of the top-1000 popular hashtags as features, whereas BERT uses the full text of each user's aggregated tweets as input for transfer learning. The f1-score we achieve with BERT is 0.61, which is far lower than the 0.91 of the best-performing hashtag-based model. Our guess is that the full text of a user's tweets contains too much noise that does not help in identifying their political orientation, whereas hashtags serve as surprisingly good indicators.
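The hashtag frequency features can be pictured as one count vector per user over a fixed hashtag vocabulary. The sketch below builds such vectors under assumptions: the vocabulary is taken to be the `top_k` most frequent hashtags across all users, raw counts are used without normalization, and all names are hypothetical. The study's actual feature construction may differ in these details.

```python
from collections import Counter


def build_vocabulary(user_hashtags, top_k=1000):
    """Select the top_k most frequent hashtags across all users."""
    totals = Counter()
    for tags in user_hashtags.values():
        totals.update(tag.lower() for tag in tags)
    return [tag for tag, _ in totals.most_common(top_k)]


def to_feature_vector(tags, vocabulary):
    """Count how often each vocabulary hashtag appears in one user's tweets."""
    counts = Counter(tag.lower() for tag in tags)
    return [counts[tag] for tag in vocabulary]
```

Each resulting vector can then serve as one row of the feature matrix passed to a classifier such as a Random Forest.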
Rank | Hashtag (liberals) | Frequency (liberals) | Hashtag (conservatives) | Frequency (conservatives) |
---|---|---|---|---|
1 | #metooindia | 2497 | #timesup | 2151 |
2 | #timesup | 1909 | #metooindia | 1579 |
3 | #metoogr | 1314 | #blm | 865 |
4 | #ge | 1042 | #occupy | 741 |
5 | #firstthem | 787 | #metoogr | 712 |
6 | #metooincest | 729 | #believewomen | 706 |
7 | #metooinceste | 666 | #ibelievetarareade | 679 |
8 | #india | 529 | #daca | 662 |
9 | #veterans | 498 | #demexit | 652 |
10 | #rape | 455 | #union | 650 |
11 | #believewomen | 432 | #oligarchs | 650 |
12 | #metoounlessitsbiden | 419 | #megabanks | 650 |
13 | #domesticviolence | 368 | #corpmedia | 650 |
14 | #rapeculture | 358 | #nodapl | 650 |
15 | #tarareade | 342 | #sdf | 650 |
16 | #saraheverard | 322 | #humanity | 649 |
17 | #sexualassault | 313 | #idiocracy | 638 |
18 | #doctorsaredickheads | 291 | #ibelievetara | 605 |
19 | #weasourselves | 286 | #timesupbiden | 478 |
20 | #blacklivesmatter | 281 | #maketellingsafe | 473 |
21 | #mentoo | 278 | #csa | 469 |
22 | #silenceisviolence | 275 | #dropoutbiden | 469 |
23 | #doctorsabusetoo | 270 | #metoounlessitsbiden | 445 |
24 | #blm | 265 | #firstthem | 437 |
25 | #patientchoice | 262 | #mentoo | 407 |
26 | #nursesabusetoo | 262 | #dropbiden | 373 |
27 | #metoocy | 259 | #feminism | 366 |
28 | #anopensecret | 252 | #tarnishedbadge | 363 |
29 | #believeallwomen | 246 | #auspol | 334 |
30 | #justiceforjohnnydepp | 242 | #whyididntreport | 318 |
31 | #ibelievetarareade | 232 | #blacklivesmatter | 282 |
32 | #violenceagainstwomen | 229 | #women | 280 |
33 | #churchtoo | 219 | #bjp | 274 |
34 | #joebiden | 214 | #kobebryant | 266 |
35 | #h1news | 193 | #koberip | 264 |
36 | #women | 188 | #feminist | 259 |
37 | #sexualharassment | 187 | #believesurvivors | 258 |
38 | #feminism | 185 | #joebidenisarapist | 244 |
39 | #ibelievetara | 182 | #biden | 241 |
40 | #metoomovement | 173 | #feminismiscancer | 240 |
41 | #patientdignity | 173 | #bringbernieback | 238 |
42 | #notallmen | 165 | #endviolenceagainstwomen | 235 |
43 | #covid19 | 164 | #justice | 233 |
44 | #unstucklife | 164 | #survivors | 227 |
45 | #china | 163 | #covid19 | 226 |
46 | #hr | 154 | #neverbiden | 222 |
47 | #awareness | 151 | #book | 205 |
48 | #survivor | 149 | #survivor | 204 |
49 | #biden | 144 | #london | 200 |
50 | #anuragkashyap | 143 | #brexit | 199 |
We now proceed with the final step: comparing the views on the Me Too movement by political orientation. We compare the most popular hashtags that co-occur with the #metoo hashtag in the same tweet, based on the idea that liberals and conservatives would show different interests in the same Me Too context. Table 9 presents the top-50 popular hashtag rankings from the tweets posted by liberals and by conservatives, respectively. Note that, while this table only shows the 50 most popular hashtags, many more hashtags follow those top-50 hashtags in each ranking.
To measure how different the two entire rankings are, we employ two measures: the cosine similarity and the rank correlation. For the cosine similarity, we transform each entire ranking into a vector of hashtag frequencies and then calculate the cosine similarity between the two vectors, which is the cosine of the angle between them. The smaller the angle, the more similar the two vectors are. For non-negative frequency vectors, the cosine similarity ranges between 0 and 1, where a value close to 1 means very similar and a value close to 0 means dissimilar. From the two hashtag ranking vectors, we get a cosine similarity of 0.65. For the second measure, the rank correlation, we calculate both the Spearman correlation coefficient and the Kendall correlation coefficient on the two entire rankings. A rank correlation coefficient ranges from −1 to 1, where a value close to 1 indicates a positive correlation, a value close to −1 a negative correlation, and a value close to 0 no correlation. We obtain −0.24 and −0.23, respectively, which are both closer to 0 than to 1 or −1. Taken together, the cosine similarity and the rank correlation coefficients indicate that the two rankings are dissimilar, which implies that the two groups' interests are not the same.
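Both kinds of measure can be computed directly from two hashtag-to-frequency mappings. The sketch below is a plain-Python illustration under assumptions, not the study's code: a hashtag that appears in only one ranking is given frequency 0 in the other, ties receive average ranks, and the function names are hypothetical. Only the Spearman coefficient is shown; the Kendall coefficient is omitted for brevity.

```python
import math


def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two hashtag frequency vectors."""
    tags = set(freq_a) | set(freq_b)
    dot = sum(freq_a.get(t, 0) * freq_b.get(t, 0) for t in tags)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_ranks(values):
    """Rank values in descending order (1 = largest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(freq_a, freq_b):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    tags = sorted(set(freq_a) | set(freq_b))
    ra = average_ranks([freq_a.get(t, 0) for t in tags])
    rb = average_ranks([freq_b.get(t, 0) for t in tags])
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for constant ranks; 0.0 by convention here
    return cov / math.sqrt(var_a * var_b)
```

The two measures answer different questions: cosine similarity is driven by the magnitudes of the most frequent hashtags, whereas a rank correlation weighs every hashtag's position equally, so the two need not agree.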


We now evaluate potential errors in this case study from a Total Twitter Error perspective. As with the first case study, this study relies on the pool of random Twitter users and tweets to identify people interested in the Me Too movement, and thus the same argument holds: we assume that the set of 68,116 Twitter users found is sufficient for the study. In terms of the query error, we believe that the #metoo hashtag is the only hashtag we can think of that is representative of the interest in the Me Too movement, although it is possible that some interested users did not use the #metoo hashtag in their tweets. To cover such users, one may consider searching the tweet text for expressions other than hashtags that refer to Me Too. Lastly, given the very high accuracy of our political orientation classifier, we believe that there is not much room for the interpretation error caused by customized profiling.