Dissertation Project

Master’s Thesis - Offensive Language Detection
Natural Language Processing, Machine Learning, Deep Learning

Offensive and abusive content on social media has become a serious concern in recent years, and the popularity of platforms such as Facebook and Twitter has amplified the problem. Our motivation is to automate and accelerate the detection of offensive posts so that moderators can act on them promptly. We use the publicly available benchmark dataset OLID 2019 (Offensive Language Identification Dataset) for this research project. The scope of our work is binary classification: predicting whether a tweet is offensive or not. We balanced the training dataset using the Random Under-sampling technique, and we performed a thorough comparative analysis of various feature extraction mechanisms and model building algorithms. The comparison showed that the best model was Bidirectional Encoder Representations from Transformers (BERT). Our results outperform the previous work, achieving a Macro F1 score of 0.82 on the OLID dataset. Finally, a real-time system based on this model could be deployed on social media platforms to detect and analyze offensive posts and take appropriate action, helping to normalize behaviour on these sites and in society.
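For readers unfamiliar with the two techniques named above, here is a minimal sketch of random under-sampling and the Macro F1 metric in plain Python. It is an illustration only, not the code used in the thesis: the function names `random_undersample` and `macro_f1`, the seed, and the toy tweets are all assumptions made for this example. The labels `OFF` and `NOT` follow the OLID convention for the offensive / not offensive task.

```python
import random
from collections import Counter, defaultdict


def random_undersample(texts, labels, seed=0):
    """Randomly drop examples from larger classes until every class
    has as many examples as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    # Size of the smallest class: every class is cut down to this.
    target = min(len(items) for items in by_label.values())
    balanced = []
    for label, items in by_label.items():
        for text in rng.sample(items, target):
            balanced.append((text, label))
    rng.shuffle(balanced)
    return [t for t, _ in balanced], [l for _, l in balanced]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (hypothetical helper)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up data: 7 non-offensive and 3 offensive placeholder tweets.
toy_texts = [f"tweet {i}" for i in range(10)]
toy_labels = ["NOT"] * 7 + ["OFF"] * 3
balanced_texts, balanced_labels = random_undersample(toy_texts, toy_labels)
print(Counter(balanced_labels))

# Toy predictions to exercise the metric.
toy_true = ["OFF", "OFF", "NOT", "NOT"]
toy_pred = ["OFF", "NOT", "NOT", "NOT"]
print(macro_f1(toy_true, toy_pred))
```

Macro F1 weights both classes equally regardless of how many examples each has, which is why it is a common choice for imbalanced datasets such as OLID. Under-sampling should be applied to the training split only, so that evaluation still reflects the original class distribution.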

Click here to see my publication on ResearchGate

Sidharth Mehra
