Building a Sinhala-English Parallel Corpus for Neural Machine Translation Based on Exam Questions
View/ Open
Date
2021Author
Rilfi, MRM
Gunawansha, UGYM
Prasandika, KAC
Chandrani, KGA
Metadata
Show full item recordAbstract
In any neural machine translation
between two natural languages, parallel corpus is
a compulsory part of the training process. The
most crucial step in an MT system is to develop
an effective method for gathering parallel corpus.
The construction of a parallel corpus, on the
other hand, necessitates substantial knowledge
of both languages and is a time-consuming
procedure. Due to these limits, digitizing
documents becomes extremely challenging,
lowering the quality of machine translation
systems. This research offers a method for
producing an English to Sinhala parallel corpus
that is both faster and more efficient, while
requiring less human intervention. This system
generates a parallel corpus for language pair
using the following steps: scanning the exam
question papers using a special type of scanner,
Image optimization for Optical Character
Recognition (OCR), text extraction from images
and converting unstructured text into structured
form as parallel corpus.
Collections
- Computing [62]