Monday 27 May 2013

Web Page Scraping using Java

In this blog post, we are going to learn the fundamentals of web scraping and how to implement a web scraper using a Java API.

Agenda of this post

    What is web scraping
    Web scraping technique
    Useful APIs for web scraping
    Sample code using a Java API

Web scraping (also called web harvesting or web data extraction) is a technique for extracting information from websites.
The term covers any method of retrieving content from a website over HTTP and transforming it into another format suitable for use in a different context.
Using a web scraper, you can extract the useful content from a web page and convert it into whatever format your application needs.
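As a small illustration of "extract the useful content and convert it into another format", here is a sketch that turns the rows of a simple HTML table into CSV using only the JDK. It assumes the HTML is already in memory and that the table is small and well-formed; the class and method names (`TableToCsv`, `toCsv`, `cellText`, `csvField`) are hypothetical, and regular expressions are used only to keep the example short, not as a robust HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: pull the rows of a simple HTML table out of raw page
 * text and convert them to CSV. Regex parsing only suits small, well-formed
 * snippets; it does not handle nested tables or decode entities like &amp;.
 */
public class TableToCsv {

    // One <tr>...</tr> block; DOTALL lets '.' match across line breaks.
    private static final Pattern ROW =
            Pattern.compile("<tr[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // One <td> or <th> cell; group 1 is the cell's inner HTML.
    private static final Pattern CELL =
            Pattern.compile("<t[dh][^>]*>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Strips any tags left inside a cell and collapses runs of whitespace. */
    static String cellText(String html) {
        return html.replaceAll("<[^>]+>", "").replaceAll("\\s+", " ").trim();
    }

    /** Quotes a CSV field if it contains a comma, quote, or newline (RFC 4180 style). */
    static String csvField(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    /** Converts every table row found in {@code html} into one CSV line. */
    public static String toCsv(String html) {
        StringBuilder out = new StringBuilder();
        Matcher rows = ROW.matcher(html);
        while (rows.find()) {
            List<String> fields = new ArrayList<>();
            Matcher cells = CELL.matcher(rows.group(1));
            while (cells.find()) {
                fields.add(csvField(cellText(cells.group(1))));
            }
            out.append(String.join(",", fields)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<table><tr><th>Name</th><th>Price</th></tr>"
                + "<tr><td>Book</td><td>12.50</td></tr>"
                + "<tr><td>Pen, blue</td><td>1.20</td></tr></table>";
        System.out.print(toCsv(html));
    }
}
```

Running `main` prints three CSV lines, with `"Pen, blue"` quoted because it contains a comma. For real pages, a proper HTML parser is the safer choice, since regular expressions break easily on comments, nested tables, and unusual markup.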

Web Scraping technique:
Web scraping usually involves the following steps:

    Connect: connect to the remote site over HTTP or FTP.
    Extract: extract the raw information from the website.
    Process: filter the useful data from the source and put it into a usable format.
    Save: save the data in the desired format.
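The four steps above can be sketched with the standard JDK alone (Java 11 or later, for `java.net.http`), independent of Web-Harvest. This is a minimal illustration, not a production scraper: the class `SimpleScraper`, its method names, the default URL `https://example.com/`, and the output file `links.txt` are hypothetical choices, only HTTP(S) is covered (not FTP), and the regex-based link extraction is a simplification that will miss or misread unusual markup.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the Connect / Extract / Process / Save steps. */
public class SimpleScraper {

    // Step 1: Connect - fetch the page body over HTTP(S), following redirects.
    static String connect(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    // Step 2: Extract - collect the href value of every <a> tag (rough regex, not a real parser).
    private static final Pattern HREF =
            Pattern.compile("<a\\s[^>]*href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Step 3: Process - keep only absolute http(s) links and drop duplicates, preserving order.
    static List<String> process(List<String> links) {
        Set<String> kept = new LinkedHashSet<>();
        for (String link : links) {
            if (link.startsWith("http://") || link.startsWith("https://")) {
                kept.add(link);
            }
        }
        return new ArrayList<>(kept);
    }

    // Step 4: Save - write one link per line to a UTF-8 text file.
    static void save(List<String> links, Path out) throws IOException {
        Files.write(out, links, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default URL and output file; pass a different URL as the first argument.
        String url = args.length > 0 ? args[0] : "https://example.com/";
        String html = connect(url);
        List<String> links = process(extract(html));
        save(links, Path.of("links.txt"));
        System.out.println("Saved " + links.size() + " links from " + url);
    }
}
```

After compiling, run it as `java SimpleScraper https://example.com/`. Keeping each step in its own method makes it easy to swap one part, for example replacing the regex in `extract` with a real HTML parser, without touching the others.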

Many web scraping tools and APIs are available. I am going to use Web-Harvest for my web scraping example.

Web-Harvest is an open-source web data extraction tool written in Java. It offers a way to collect the desired web pages and extract useful data from them.
