Doc2vec to Measure Semantic Similarity between Verses of the Qur’an

Menwa Alshammeri, Jouf University and School of Computing, University of Leeds, UK
IQSA International Conference 2021, “Giorgio La Pira” Library, Palermo, Italy
Panel 7. Carriers of the Text and Readings. 2. The Qur’an in Light of Digital Humanities

Natural language processing (NLP) supports a wide range of tasks for analyzing text and extracting knowledge from it. Semantic similarity underlies many NLP applications, and its analysis in natural language texts has recently attracted considerable attention. The task is computationally challenging because textual relatedness cannot be determined by lexical matching alone. We therefore use Doc2vec, a recent advance in feature embedding that gives machine learning (ML) models an informative numerical representation of the input text. We exploit this distributed representation to capture the semantic properties of the verses of the Qur’an: each verse is transformed into a numerical vector that can serve as input to ML methods for studying the semantic similarity between verses.
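
To make the last step concrete, the following is a minimal sketch of how verse vectors might be compared once they exist. It assumes cosine similarity as the comparison measure, which the abstract does not specify, and it uses made-up three-dimensional placeholder vectors rather than real Doc2vec output. The names `cosine_similarity`, `most_similar`, `toy_vectors`, and the `verse_a`/`verse_b`/`verse_c` labels are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_id, vectors):
    """Rank all other verses by cosine similarity to the query verse."""
    query = vectors[query_id]
    scores = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in vectors.items()
        if other_id != query_id
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors for illustration only (assumption): a real model
# would produce one much longer vector per verse.
toy_vectors = {
    "verse_a": [0.9, 0.1, 0.0],
    "verse_b": [0.8, 0.2, 0.1],
    "verse_c": [0.0, 0.1, 0.9],
}

ranking = most_similar("verse_a", toy_vectors)
for verse_id, score in ranking:
    print(f"{verse_id}: {score:.3f}")
```

In this toy setup, `verse_b` ranks above `verse_c` because its vector points in nearly the same direction as that of `verse_a`. With a trained model, the same ranking step would be applied to the learned verse embeddings.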