Web Extraction Software

The goal of this project is the design and implementation of an application that allows effective data extraction from HTML pages. Emphasis is placed on making maximal use of existing XML technologies.

The resulting application is based on the XQuery language, extended with features for working with web pages and combined with other technologies for locating relevant parts of free text. It also supports the XSLT language for transforming the extracted data into the required form.
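
To illustrate the general idea of querying a web page as a tree and then searching free text inside the selected nodes, here is a minimal sketch. It is not the application's XQuery dialect, which is not shown here. It uses only the Python standard library as a stand-in. The sample page, the class names, the price pattern, and the helper names parse_html and extract are all hypothetical and chosen for illustration only.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class HtmlTreeBuilder(HTMLParser):
    """Turns properly nested HTML into an ElementTree (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        self.builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)  # close void elements immediately

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)


def parse_html(html):
    """Parse an HTML string and return the root element."""
    parser = HtmlTreeBuilder()
    parser.feed(html)
    parser.close()
    return parser.builder.close()


# Hypothetical free-text pattern: a number followed by "USD".
PRICE = re.compile(r"(\d+(?:\.\d+)?)\s*USD")


def extract(html):
    """Select rows with a path expression, then search their free text."""
    root = parse_html(html)
    results = []
    for row in root.findall(".//tr[@class='item']"):
        name = row.findtext("td[@class='name']")
        info = row.findtext("td[@class='info']") or ""
        match = PRICE.search(info)
        if name and match:
            results.append((name, float(match.group(1))))
    return results


SAMPLE = (
    "<html><body><table>"
    "<tr class='item'><td class='name'>Widget</td>"
    "<td class='info'>Price: 12.50 USD, in stock</td></tr>"
    "<tr class='item'><td class='name'>Gadget</td>"
    "<td class='info'>Price: 7 USD, sold out</td></tr>"
    "</table></body></html>"
)

if __name__ == "__main__":
    for name, price in extract(SAMPLE):
        print(name, price)
```

The path expression narrows the page down to the relevant elements, and the regular expression then picks the value out of the surrounding free text. A real page with malformed markup would need a more tolerant parser than this sketch provides.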

The application provides command-line, graphical, and server interfaces, accompanied by an extension for the Mozilla Firefox 3 web browser. The command-line interface allows batch processing of queries, whereas the graphical interface offers a user-friendly way of creating them. The server interface makes it possible to use the application as a part of other applications and solutions.



