Automatic Structured Data Extraction from a Webpage
Abstract
Much effort is being put into developing fully automatic extraction methods, and many researchers have proposed different solutions for extracting structured data from the Web. In this paper, we propose a method, called Extraction of Structured Data (ESD), for automatically extracting structured data from a web page. When a structured web page is requested, the web server queries the underlying database for the information and returns it to the browser. The structured data records are therefore retrieved from a database and presented on the web page using a predetermined template. In addition, website developers present their data in such a way that humans can easily and quickly distinguish each data record. For these reasons, our method uses both structural and visual features of a web page to extract its data records.