Building a Sinhala-English Parallel Corpus for Neural Machine Translation Based on Exam Questions

Rilfi, MRM; Gunawansha, UGYM; Prasandika, KAC; Chandrani, KGA

dc.contributor.author	Rilfi, MRM
dc.contributor.author	Gunawansha, UGYM
dc.contributor.author	Prasandika, KAC
dc.contributor.author	Chandrani, KGA
dc.date.accessioned	2021-12-27T06:07:57Z
dc.date.available	2021-12-27T06:07:57Z
dc.date.issued	2021
dc.identifier.uri	http://ir.kdu.ac.lk/handle/345/5248
dc.description.abstract	Neural machine translation (NMT) between two natural languages requires a parallel corpus for training, so an effective method for gathering parallel text is one of the most crucial steps in building an MT system. Constructing a parallel corpus, however, demands substantial knowledge of both languages and is time-consuming. These constraints make digitizing source documents extremely challenging, which in turn lowers the quality of machine translation systems. This research presents a faster and more efficient method for producing an English-to-Sinhala parallel corpus with less human intervention. The system generates a parallel corpus for the language pair in four steps: scanning the exam question papers with a special type of scanner, optimizing the images for Optical Character Recognition (OCR), extracting text from the images, and converting the unstructured text into a structured parallel corpus.	en_US
dc.language.iso	en	en_US
dc.subject	parallel corpus	en_US
dc.subject	image optimization	en_US
dc.subject	text extraction	en_US
dc.subject	neural machine translation	en_US
dc.title	Building a Sinhala-English Parallel Corpus for Neural Machine Translation Based on Exam Questions	en_US
dc.type	Article Full Text	en_US
dc.identifier.journal	KDU IRC, 2021	en_US
dc.identifier.issue	Faculty of Computing	en_US
dc.identifier.pgnos	349-356	en_US
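
The last step named in the abstract, converting unstructured OCR text into a structured parallel corpus, can be illustrated with a minimal sketch. This is not the authors' implementation, which this record does not describe. It assumes that the Sinhala and English versions of a paper number their questions identically and that each question starts with a number followed by "." or ")". The names `QUESTION_START`, `split_questions`, and `build_parallel_corpus` are hypothetical.

```python
import re

# Assumption: a question begins with its number, then "." or ")".
QUESTION_START = re.compile(r"^\s*(\d+)\s*[.)]\s*(.*)$")


def split_questions(ocr_text):
    """Map question number -> question text for one OCR'd paper."""
    questions = {}
    current = None
    for raw_line in ocr_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = QUESTION_START.match(line)
        if match:
            # A new numbered question starts on this line.
            current = int(match.group(1))
            questions[current] = match.group(2).strip()
        elif current is not None:
            # A wrapped line: append it to the question in progress.
            questions[current] = (questions[current] + " " + line).strip()
    return questions


def build_parallel_corpus(sinhala_text, english_text):
    """Pair questions that carry the same number in both papers."""
    sinhala = split_questions(sinhala_text)
    english = split_questions(english_text)
    shared_numbers = sorted(sinhala.keys() & english.keys())
    return [(sinhala[n], english[n]) for n in shared_numbers]


if __name__ == "__main__":
    # Illustrative input only, not data from the paper.
    sinhala_paper = "1. ඇල්ගොරිතමයක් යනු කුමක්ද?\n2. විචල්‍යයක් අර්ථ දක්වන්න."
    english_paper = "1. What is an algorithm?\n2. Define a\nvariable."
    for si, en in build_parallel_corpus(sinhala_paper, english_paper):
        print(si, "\t", en)
```

Aligning on question numbers sidesteps sentence-level alignment, but it only works when both papers share a numbering scheme. A wrapped line that happens to begin with a number and a period would be misread as a new question, so real OCR output would likely need stricter rules or manual review.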

Files in this item

Name: 40.pdf
Size: 823.8Kb
Format: PDF
