According to Davidson et al. (2017, p. 1), hate speech is “language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group.” However, “[w]hat is considered a hate speech message might be influenced by aspects such as the domain of an utterance, its discourse context, as well as context consisting of co-occurring media objects (e.g. images, videos, audio), the exact time of posting and world events at this moment, identity of author and targeted recipient” (Schmidt and Wiegand, 2017).
Hate speech has risen so sharply on social media that major social media companies have had to come together to crack down on it (Fioretti, 2018). In 2017, politicians in many countries deployed social media to spread hate-filled agendas. Amnesty International’s latest annual report on the state of the world’s human rights documents a global rise in state-sponsored hate and chronicles the variety of ways governments and leaders are increasingly peddling hateful rhetoric and policies that seek to demonize already marginalized groups (Amnesty International, 2017).
President Trump’s transparently hateful travel ban on citizens of half a dozen Muslim-majority countries was one of the most prominent examples. More recently, Trump retweeted three inflammatory videos from a British far-right account rife with anti-Islamic content; the videos purported to show Muslims assaulting people and smashing holy structures. The retweets caused pandemonium among Trump’s followers and amplified his anti-Islamic propaganda (Landers, n.d.). One study suggests that this form of anti-Islamic (or anti-religious) hate speech has generated more hate crimes than terrorism (Maza, 2017). It is therefore important to identify hate speech on social media before it spreads beyond a trending topic.
Let’s dive into some exploratory analysis to learn more about the data that is readily available:
CrowdFlower Dataset: The CrowdFlower dataset we used was published as part of a research project by Davidson et al. (2017) on Twitter hate speech analysis. From a corpus of 85 million tweets, they took a random sample of approximately 24,000 tweets and manually labelled each one as either Hate Speech (labelled 0), Offensive (labelled 1), or Neither (labelled 2). Each tweet was labelled by 3 to 10 CrowdFlower workers and assigned its final class by a ‘majority wins’ rule. A preliminary analysis of this dataset showed that about 5.7% of its tweets were classified as Hate Speech, 77.4% as Offensive, and about 16.8% as Neither. This dataset was split into the training and test sets for our NLP model.
The workers did not always agree, but usually produced clear majorities. The following figure shows, for every tweet classified as hate speech under the majority-wins rule, the distribution of the ratio of annotators who labelled it Hate Speech to the total number of annotators who reviewed it. The summary statistics of this distribution were as follows:
Minimum = 0.4444 | Maximum = 1 | Mean = 0.7280 | Median = 0.667 | Standard Deviation = 0.1305
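These statistics are straightforward to reproduce from the released data. Below is a minimal sketch, assuming the publicly available Davidson et al. (2017) file `labeled_data.csv`, where `count` is the number of annotators per tweet, `hate_speech` the number who voted Hate Speech, and `class` the majority label (0 = hate speech):

```python
import pandas as pd

# Load the labelled CrowdFlower data (column names follow the
# publicly released Davidson et al. dataset).
df = pd.read_csv("labeled_data.csv")

# Share of each majority label (roughly 5.7% / 77.4% / 16.8%).
print(df["class"].value_counts(normalize=True))

# Keep only tweets whose majority label was hate speech (class 0).
hate = df[df["class"] == 0]

# Per-tweet ratio of hate-speech votes to total annotators.
agreement = hate["hate_speech"] / hate["count"]

print(agreement.describe())           # min, max, mean, std
print("median:", agreement.median())
```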
Tweets from the Trump Twitter Archive: We believed that to see the bigger picture of how hate speech spreads on Twitter, it was essential to analyze tweets from both the public and an influential personality. President Donald Trump tweets regularly, often with vulgar and offensive language and racist and sexist content. Given his position of power and visibility, his tweets are a critical case study for research on the spread of hate speech. We extracted a repository of Trump’s tweets from the Trump Twitter Archive (https://www.trumptwitterarchive.com), which provided most of Donald Trump’s tweets from 2009 to 2018, approximately 31,000 in total, all of them unlabeled.
This dataset also included features such as date-time, retweet count, and follower count that were crucial for answering our question about hate speech and mean retweet counts.
On analyzing the distribution of Trump’s retweet counts, we found that most of his tweets had either very few retweets or none at all. The latter case seemed suspicious; after probing into the matter, we found that around 1,000 of his tweets from 2017 were missing, and that there was no tweet data at all from 2011.
Trump’s retweet counts were heavily right-skewed, ranging up to almost 370,000, with a mean of 2,793, a median of 115, and a maximum of 369,530 retweets. Below is the log distribution.
The logged distribution is much closer to a normal distribution, with a median of 5.03, a mean of 5.3, and a standard deviation of about 2.62 log-retweets.
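A minimal sketch of how such a log histogram can be produced with matplotlib is below; `retweets` is a hypothetical array of Trump’s retweet counts extracted from the archive, and log1p keeps the zero-retweet tweets in the plot:

```python
import numpy as np
import matplotlib.pyplot as plt

# `retweets` is a hypothetical NumPy array of raw retweet counts.
log_rt = np.log1p(retweets)  # log(1 + x) handles zero-retweet tweets

plt.hist(log_rt, bins=50)
plt.xlabel("log(1 + retweet count)")
plt.ylabel("number of tweets")
plt.title("Distribution of Trump's log-retweets")
plt.show()

print("median:", np.median(log_rt), "mean:", log_rt.mean(),
      "std:", log_rt.std())
```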
We also analyzed his number of followers over the years. The following figure shows the rise in followers over time.
The flat stretches and exponential jumps in this visualization suggest either that Trump’s follower count rose around strategic events (such as declaring Mike Pence as his running mate or winning the presidential election) or that the pattern is an artifact of the gaps in the data set.
Recent Tweets: We also collected recent Twitter data by writing our own scraping code against the Standard Twitter Search API. The restriction of the standard (free) API is that only tweets from the past seven days can be mined. We collected 4,000 tweets posted between March 1st and 5th, 2018. Several features could have been extracted using the Twitter API, such as date-time, geotag, retweet count, follower count, Twitter handle, and retweet handle, but since we were only interested in date-time and retweet count, we extracted just those fields alongside the tweet texts. For reasons unknown, the extracted tweets always came from exactly the same time of day, 15:59 PST, so this dataset consisted of almost a thousand tweets per day between March 1st and 5th at 15:59 PST. This drastically reduced the scope of our study, and with the data having incomplete fidelity, the results were bound to be restricted as well. One way to overcome this major restriction would have been to purchase the Premium Search API, but without any funding this was not possible for our team.
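For readers who want to reproduce such a scrape, here is a minimal sketch using the tweepy library (v3-era API). The credentials are placeholders, and the query string is a stand-in assumption, since the Standard Search API requires some query term:

```python
import tweepy

# Placeholder credentials from a Twitter developer app.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# The Standard Search API only reaches back about seven days; `until`
# bounds results to tweets created before the given date. The broad
# query term below is a stand-in for whatever the scrape targeted.
tweets = []
for status in tweepy.Cursor(api.search, q="the -filter:retweets",
                            lang="en", until="2018-03-06",
                            tweet_mode="extended").items(4000):
    tweets.append({
        "created_at": status.created_at,        # date-time feature
        "retweet_count": status.retweet_count,  # retweet count feature
        "text": status.full_text,               # tweet text
    })
```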
Method
We built on the work of Davidson et al. (2017) to classify the tweets we gathered into three categories: hate speech, offensive language, or neither. Their approach draws on hatebase.org’s hate speech glossary, a lexicon of terms compiled by online users. They shortlisted close to twenty-five thousand tweets and had CrowdFlower (CF) workers label them into these three categories under strict guidelines. To find the best possible method, we compared logistic regression with L2 regularization, a random forest classifier, and support vector machines using 5-fold cross-validation with a 10% hold-out set. The logistic regression model with L2 regularization had the highest performance in terms of precision, recall, and F1-score.
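A minimal sketch of this model comparison with scikit-learn follows. It assumes the Davidson et al. `labeled_data.csv` file, and the TF-IDF setup and hyperparameters (`max_features`, `n_estimators`) are illustrative choices, not the exact ones used:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score, train_test_split

# Labels: 0 = hate speech, 1 = offensive, 2 = neither.
df = pd.read_csv("labeled_data.csv")
X = TfidfVectorizer(max_features=10000).fit_transform(df["tweet"])
y = df["class"]

# Reserve a 10% hold-out set; model selection uses 5-fold CV on the rest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=42)

models = {
    "logistic regression (L2)": LogisticRegression(penalty="l2", max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "linear SVM": LinearSVC(),
}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=5,
                             scoring="f1_weighted")
    print(f"{name}: mean weighted F1 = {scores.mean():.3f}")
```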
For data cleaning, we ran all the tweets through several standard NLP techniques to extract features (a code sketch of the full pipeline follows these steps).
First, we removed URLs and tags, since they are not relevant to classifying a tweet as hate speech.
We then removed all the non-letter characters and numbers.
The next step was tokenization, which broke each tweet down into individual words and tokens. Alongside tokenization we used lemmatization to reduce inflected forms. Lemmatization, like stemming, normalizes words with similar roots and semantic meanings, but it uses vocabulary and morphology to map each word to a valid base form.
We used a part-of-speech tagger, which assigned a part of speech to each word.
We then removed stop words such as “we”, “our”, and “won’t”.
Next, we created a TF-IDF (term frequency-inverse document frequency) matrix, which weighted each word by its relevance to the corpus of all our tweets.
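The sketch below strings these steps together using NLTK and scikit-learn. It is a minimal rendering of the pipeline described above, with the regular expressions and the two sample tweets as illustrative assumptions:

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

for pkg in ("punkt", "wordnet", "stopwords", "averaged_perceptron_tagger"):
    nltk.download(pkg, quiet=True)

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(tweet: str) -> str:
    text = re.sub(r"http\S+|@\w+", " ", tweet)          # 1) strip URLs and tags
    text = re.sub(r"[^a-zA-Z\s]", " ", text).lower()    # 2) letters only
    tokens = nltk.word_tokenize(text)                   # 3) tokenize...
    lemmas = [lemmatizer.lemmatize(t) for t in tokens]  # ...and lemmatize
    tagged = nltk.pos_tag(lemmas)                       # 4) POS tags (usable as extra features)
    return " ".join(w for w, _ in tagged if w not in stop_words)  # 5) drop stop words

# 6) TF-IDF weights each remaining term by its relevance to the corpus.
raw_tweets = ["@user check this out http://t.co/x", "We won't stand for this!"]
X = TfidfVectorizer().fit_transform(preprocess(t) for t in raw_tweets)
```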
We recognize that employing this classifier comes with its own set of subjective biases in hate speech classification: people tend to identify racist and foul language as hate speech but to see sexist language as merely offensive. Our study is limited by the fact that the classifier misclassified 40% of hate speech, which suggests that “the model is biased towards classifying tweets as less hateful or offensive than the human coders” (Davidson et al., 2017). We chose this classifier because the work behind it is relatively current and we could verify its authenticity, as its creators come from reputable institutions. The model’s accuracy is summarized in the confusion matrix below:
The best performing model has an overall precision of 0.91, a recall of 0.90, and an F1 score of 0.90.
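Continuing the earlier model-comparison sketch, these scores and the confusion matrix can be computed on the hold-out set as follows (the `normalize="true"` option assumes scikit-learn 0.22 or later):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Fit the winning model on the training split and score the 10%
# hold-out set (X_train, X_test, y_train, y_test from the earlier sketch).
best = LogisticRegression(penalty="l2", max_iter=1000).fit(X_train, y_train)
y_pred = best.predict(X_test)

print(classification_report(
    y_test, y_pred, target_names=["hate speech", "offensive", "neither"]))

# Row-normalized confusion matrix: entry [i, j] is the fraction of
# true class i that was predicted as class j.
print(confusion_matrix(y_test, y_pred, normalize="true").round(2))
```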
To test our hypothesis that hate speech tweets attract more retweets, we ran multiple t-tests. The first test examined whether the mean number of retweets for hate speech equals that of non-hate speech among Trump’s tweets. The second looked at the public tweets and tested whether the number of retweets differed between hate speech and neutral content. Of Trump’s 32,794 tweets, only 124 were classified as hate speech by our model, while of the 4,000 recent tweets, 246 were classified as hate speech and 3,752 as merely offensive or neutral, a substantial difference in proportion. Below we see the distribution of Trump’s retweets, comparing tweets predicted to be hate speech with those that were not.
Many outlying retweet counts far above this range have been removed for ease of viewing. While the distribution of retweets for offensive or neutral tweets appears lower than that for hate speech, the average is actually greater: 2,722 retweets for non-hate speech versus 2,267 for hate speech. We see the same trends more clearly in the log-retweets.
Next, we examine the distribution of recent retweets (both logged and non-transformed) for hate speech and non-hate speech:
In contrast to Trump’s tweets, the retweet counts of recent tweets follow very similar spreads whether the content is hate speech or offensive/neutral. Indeed, the t-tests, explained in the following section, showed that these data were insufficient for identifying a difference in retweet counts between hate speech and non-hate speech recent tweets.
Results and Discussion
The final multinomial logistic regression model, which we used to predict the label (hate speech, offensive, or neither) for Trump’s and recent general tweets, had a precision of 0.91, a recall of 0.90, and an F1-score of 0.90. The model misclassified 40% of hate speech, labelling 30% of it as offensive and 10% as neither. Among offensive and neutral tweets, it classified only 5% of offensive tweets and only 2% of neutral tweets as hate speech. The model was therefore conservative relative to the human labels.
We found no significant difference between the mean retweet counts of Trump’s neutral and hate speech tweets. With a p-value of 0.848, we fail to reject the null hypothesis that there is no difference between the means, and so cannot accept the alternative hypothesis that the mean retweet count is greater for hate speech than for non-hate speech. We also found no significant difference in retweet counts between recent tweets labeled hate speech and those labeled neutral: the p-value was 0.4847, and again we fail to reject the null hypothesis. We then restated our hypotheses in terms of the log number of retweets for both Trump’s and the recent tweets. Running the one-tailed t-test on Trump’s tweets gave a p-value of 9.055e-05; in this case we reject the null hypothesis and accept the alternative that the mean log-retweets of hate speech are statistically greater than those of non-hate speech. We still found no significant difference between the log-retweets of recent tweets labeled hate speech and neutral: the p-value was 0.367, with 190.38 degrees of freedom and a sample size of 3,998. With a larger sample, we might be able to detect a statistically significant difference between the log-retweets of hate speech and neutral recent tweets.
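The non-integer degrees of freedom (190.38) indicate a Welch-style unequal-variance t-test. Below is a minimal sketch of the one-tailed test on log-retweets using SciPy; `hate_retweets` and `other_retweets` are hypothetical arrays of retweet counts split by predicted label, and log1p is one way to handle tweets with zero retweets:

```python
import numpy as np
from scipy import stats

# Hypothetical arrays of raw retweet counts, split by predicted label.
log_hate = np.log1p(np.asarray(hate_retweets))
log_other = np.log1p(np.asarray(other_retweets))

# Welch's t-test (unequal variances). SciPy returns a two-sided
# p-value; halving it gives the one-tailed test that hate speech
# draws *more* retweets, valid when the t statistic is positive.
t, p_two_sided = stats.ttest_ind(log_hate, log_other, equal_var=False)
p_one_tailed = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
print(f"t = {t:.3f}, one-tailed p = {p_one_tailed:.3g}")
```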
We log-transformed the retweet counts because they were not normally distributed. Figure 4 shows that the retweet counts are heavily right-skewed, likely because of the many low retweet counts Trump received during his early years on the platform (2009–2012); after he gained popularity, his retweet counts ran into the tens of thousands. Log-transforming the counts makes the distribution nearly normal, which lets us run a t-test comparing the mean log-retweets of hate speech and non-hate speech.
Conclusion
Based on our analysis of the logged retweets, we conclude that hate speech may in fact receive more retweets than non-hate speech. Given this exploratory and predictive analysis of Trump’s tweets and those from March 1–5, 2018, we conclude that hate speech likely receives a greater number of retweets than merely offensive or neutral language.
These findings are limited by the classifier we reproduced from Davidson et al. (2017), who concluded that lexical methods are inaccurate at identifying contextual hate speech because they rely on identifying offensive terms that may not be directly related to hate speech. While distinguishing between hate speech and offensive language is challenging, and was largely unsuccessful in this case, similar studies have had more success: Watanabe et al. (2018) built models with sentiment, semantic, unigram, and contextual pattern features and achieved 78.4% accuracy in distinguishing “hateful, offensive, and clean” tweets. With further analysis and a reduced error rate, we should be able to train a better model that successfully distinguishes hate speech from offensive text.
It would also be advisable to account for spelling variations through word generalization using Brown clustering, as used for hate speech detection by Xiang et al. (2012) and Zhong et al. (2016); this would reduce the sparseness of the document-term matrices (DTMs) and increase term frequencies. Our work could inform a larger study aimed at identifying the impact of hate speech from Twitter accounts with large followings and whether they influence the broader Twitter trend. Such a study would involve mining all tweets in a given interval and analyzing the overall trend relative to tweets from a high-follower account. If hate speech rises following tweets from Trump (or any popular figure), a cyclical test could tell us whether the rise is an outlier or not. We could also use our study to build a Twitter hate speech dashboard that automates these methods, mining Twitter data on a regular basis and displaying live, in-depth analysis.
Future work could analyze the propagation of fake news: intentionally false or inaccurate information, presented in the form of traditional news media, that aims to mislead its readers. This assumes that the people who retweet hate speech may be the same people who spread fake news with hateful intent. It may be interesting to study the correlation between hate speech and fake news, since in the United States it is perhaps easier to censor fake news than hate speech. Future work could also examine the geopolitical implications of hate speech. By analyzing how online social networks are used to spread hateful content in different geographic regions, we could better understand its influence on election outcomes or on instigating violence against ethnic minorities.
References:
Ali, A., Tripathi, P., Mitra, P., McGovern, R., & Kulkarni, V. (2018). Twitter Hate Speech Identification. GitHub repository. https://github.com/Vinieechu/twitter-hate-speech-identification.git
Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated Hate Speech Detection and the Problem of Offensive Language. In Proceedings of the 11th International AAAI Conference on Web and Social Media (ICWSM ’17).
Fioretti, J. (2018). Social media companies accelerate removals of online hate speech: EU. Brussels: Reuters.
Landers, E. (n.d.). Retrieved from CNN: https://www.cnn.com/2017/11/29/politics/donald-trump-retweet-jayda-fransen/index.html
Maza, C. (2017). Trump’s Speech Causes more Anti-Muslim Hate Crimes than Terrorism, Study Shows. Retrieved from http://www.newsweek.com/trump-speech-anti-muslim-hate-crime-terrorism-study-713905
Petulla, S., Kupperman, T., & Schneider, J. (2017, November 13). The number of hate crimes rose in 2016. Retrieved from CNN: https://www.cnn.com/2017/11/13/politics/hate-crimes-fbi-2016-rise/index.html
Amnesty International. (2017). The State of the World’s Human Rights.
Schmidt, A., & Wiegand, M. (2017). A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media (pp. 1–10).
Watanabe, H., Bouazizi, M., & Ohtsuki, T. (2018). Hate Speech on Twitter: A Pragmatic Approach to Collect Hateful and Offensive Expressions and Perform Hate Speech Detection. IEEE Access, PP(99), 1–1. https://doi.org/10.1109/ACCESS.2018.2806394
Xiang, G., Fan, B., Wang, L., Hong, J., & Rose, C. (2012, October). Detecting offensive tweets via topical feature discovery over a large scale twitter corpus. In Proceedings of the 21st ACM international conference on Information and knowledge management (pp. 1980–1984). ACM