
Efficient Word Embedding for Nepali

Proceedings of 12th IOE Graduate Conference, 12, pages 621--627. Institute of Engineering, Tribhuvan University, Nepal (October 2022)

Abstract

Word embeddings are a vital part of most modern Natural Language Processing (NLP) tasks. It is, however, difficult to determine whether a given word embedding model works well. Especially with larger models, training takes a very long time, and when the time required to train the eventual model for the NLP task is added, a very large share of the effort can be spent just training different word embedding models to identify which one works well. Because of this, selecting between different word embedding models can be very difficult. For this reason, intrinsic evaluation is used to assess the performance of word embedding systems instead of directly using each model in the eventual NLP task. For Nepali, however, this is difficult due to the lack of resources in the Nepali language. We show that, using intrinsic evaluation adapted with small modifications from a similar language such as Hindi, we can gain insight into the effectiveness of word embeddings. This is justified by the results of extrinsic evaluation, which are in agreement with the results from intrinsic evaluation. Using this, we find that, among the three models considered, the fastText model performs best when out-of-vocabulary words are taken into account.
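
The sketch below illustrates the kind of workflow the abstract describes: training a fastText model and evaluating it intrinsically against a human-judged word-similarity list, with fastText's subword n-grams handling out-of-vocabulary words. It is a minimal illustration using gensim, not the paper's actual setup; the toy corpus and the file name "nepali_wordsim.tsv" (a similarity list adapted from a Hindi dataset) are assumptions.

```python
# Minimal sketch: fastText training and intrinsic evaluation for Nepali.
# Corpus, hyperparameters, and the similarity file are illustrative assumptions,
# not taken from the paper.
from gensim.models import FastText

# Toy corpus of tokenised Nepali sentences (a real corpus is needed in practice).
sentences = [
    ["नेपाल", "सुन्दर", "देश", "हो"],
    ["काठमाडौं", "नेपालको", "राजधानी", "हो"],
]

# Train a small fastText model; character n-grams (min_n..max_n) let it
# compose vectors for words never seen during training.
model = FastText(vector_size=100, window=5, min_count=1, min_n=3, max_n=6)
model.build_vocab(corpus_iterable=sentences)
model.train(corpus_iterable=sentences, total_examples=len(sentences), epochs=10)

# Intrinsic evaluation: Spearman correlation between model similarities and
# human judgements over word pairs (requires a TSV of word1, word2, score).
# print(model.wv.evaluate_word_pairs("nepali_wordsim.tsv"))  # hypothetical file

# Out-of-vocabulary handling: the second word may be absent from the training
# vocabulary, but fastText still returns a similarity from shared n-grams.
print(model.wv.similarity("नेपाल", "नेपालमा"))
```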
