dc.description.abstract | Abstract: Document similarity is important
in many areas that deal with textual data,
such as knowledge management,
information extraction, natural language
processing, and artificial intelligence. Several
methods exist to calculate document
similarity, but the results of most approaches
are unsatisfactory because the specific domain
and contextual similarity are not taken into
consideration. This paper proposes a domain-based
method for calculating document similarity
that integrates context, the World Wide Web
(WWW), and WordNet similarity. Context is
gathered by implementing a topic modeling
algorithm and generating a domain context.
Of the many topic modeling algorithms
available, Latent Dirichlet Allocation
(LDA) is used here. The World Wide Web is used
to capture the latest knowledge. The
method makes it possible to obtain a similarity
value for words in different domains. The
quality of the resulting model is
evaluated against human judgment to
verify the accuracy of the calculation.
The results indicate that the calculation is
accurate and that the proposed model can
overcome the limitations of existing measures. | en_US |