Multilingual Racial Hate Speech Detection Using Transfer Learning

145
Not scheduled
20m
Von-Melle-Park 4

Poster

Description

The rise of social media makes it easier to spread hateful content, especially racist content, which can have severe consequences. In this paper, we analyze French-language tweets about the death of George Floyd in May 2020, an event that accelerated debates on racism globally. Using the Yandex Toloka platform, we annotate the tweets as hate, offensive, or normal. Tweets labeled offensive or hateful are further annotated as racial or non-racial. We build French hate speech detection models based on multilingual BERT and CamemBERT, and apply transfer learning by fine-tuning the HateXplain model. We compare different approaches to resolving annotation ties and find that the detection model based on CamemBERT yields the best results in our experiments.
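As an illustration of what resolving annotation ties can involve, the sketch below aggregates crowd votes for one tweet by majority and applies a configurable policy when the top labels are tied. The abstract does not say which tie-resolution approaches were compared, so the function name `resolve_label`, the three policies, and the severity ordering are all assumptions made for this example, not the authors' method.

```python
from collections import Counter

# Hypothetical severity ordering, assumed only for this illustration.
SEVERITY = {"normal": 0, "offensive": 1, "hate": 2}


def resolve_label(votes, tie_policy="drop"):
    """Return the majority label for one tweet, or apply a tie policy.

    votes: list of labels from individual annotators.
    tie_policy (all hypothetical):
      "drop"         -> return None so the tweet can be excluded
      "most_severe"  -> pick the tied label with the highest severity
      "least_severe" -> pick the tied label with the lowest severity
    """
    if not votes:
        raise ValueError("votes must not be empty")

    counts = Counter(votes)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]

    # A single winner means there is a clear majority and no tie.
    if len(winners) == 1:
        return winners[0]

    if tie_policy == "drop":
        return None
    if tie_policy == "most_severe":
        return max(winners, key=lambda label: SEVERITY[label])
    if tie_policy == "least_severe":
        return min(winners, key=lambda label: SEVERITY[label])
    raise ValueError(f"unknown tie_policy: {tie_policy}")
```

With three annotators, a tweet voted `["hate", "offensive", "normal"]` has no majority, so the chosen policy decides whether it is dropped or assigned one of the tied labels. Different policies yield different training sets, which is why the choice can affect model results.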

Keywords

Racial hate speech, offensive speech, transfer learning, Toloka

Find me @ my poster 2: Monday afternoon

Primary authors

Abinew Ali Ayele (Language Technology Group, University of Hamburg, Hamburg, Germany); Prof. Chris Biemann (Language Technology Group, University of Hamburg, Hamburg, Germany); Dr Seid Muhie Yimam (House of Computing and Data Science); Ms Skadi Dinter (Language Technology Group, University of Hamburg, Hamburg, Germany)