
dc.contributor.author: De S Sirisuriya, SCM
dc.date.accessioned: 2018-05-18T15:44:12Z
dc.date.available: 2018-05-18T15:44:12Z
dc.date.issued: 2015
dc.identifier.uri: http://ir.kdu.ac.lk/handle/345/1051
dc.description: Article Full Text [en_US]
dc.description.abstract: The World Wide Web contains information of many kinds and origins, including social, financial, security and academic information. Most people access information through the internet for educational purposes. Information on the web is available in different formats and through different access interfaces, so indexing or semantic processing of data across websites can be cumbersome. Web scraping is a technique that aims to address this issue: it transforms unstructured data on the web into structured data that can be stored and analysed in a central local database or spreadsheet. Web scraping techniques include traditional copy-and-paste, text grepping and regular expression matching, HTTP programming, HTML parsing, DOM parsing, web scraping software, vertical aggregation platforms, semantic annotation recognizing and computer vision web-page analysers. Traditional copy-and-paste is the most basic technique and becomes tiresome when large datasets need to be scraped. Web scraping software is the easiest technique to use, since every other technique except traditional copy-and-paste requires some form of technical expertise. Hundreds of web scraping software packages are available today, most of them written in Java, Python and Ruby. Both open source and commercial web scraping software exist. Tools such as Yahoo Pipes, Google Web Scrapers and the Outwit Firefox extensions are the best tools for beginners in web scraping. This study gives a comparative account of web scraping techniques and well-known web scraping software. To accomplish this, we compare and contrast several web scraping techniques and some well-known web scraping software. The outcome is a review of web scraping techniques and software that can be used to extract data from educational websites. [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Web Scraping [en_US]
dc.subject: Information Extraction [en_US]
dc.title: A Comparative Study on Web Scraping [en_US]
dc.type: Article Full Text [en_US]
dcterms.bibliographicCitation: De S Sirisuriya, S. C. M. (2015) ‘A Comparative Study on Web Scraping’, in Proceedings of 8th International Research Conference of KDU. General Sir John Kotelawala Defence University, pp. 135–140. Available at: http://ir.kdu.ac.lk/handle/345/1051.
dc.identifier.journal: KDU IRC [en_US]
dc.identifier.pgnos: 135-140 [en_US]
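
The abstract describes web scraping as turning unstructured web data into structured data that can be kept in a database or spreadsheet, and names HTML parsing as one of the techniques. The following is a minimal sketch of that idea, not code from the paper. It uses only the Python standard library. The page fragment `SAMPLE_HTML`, the class `TableParser`, and the helpers `scrape_table` and `to_csv` are hypothetical names introduced here for illustration; a real scraper would fetch the page over HTTP and would need to handle messier markup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment (an assumption for this sketch);
# a real scraper would download the page instead.
SAMPLE_HTML = """
<table>
  <tr><th>Course</th><th>Credits</th></tr>
  <tr><td>Databases</td><td>3</td></tr>
  <tr><td>Networks</td><td>4</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Parse an HTML string and return its table rows as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Serialise rows as CSV text, the spreadsheet-friendly structured form."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = scrape_table(SAMPLE_HTML)
print(to_csv(rows))
```

The parser keeps only text found inside table cells, which is the step that separates HTML parsing from plain text grepping: the structure of the markup decides which text belongs to which row and column.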

