Monday, 27 May 2013

Web Page Scraping using Java

In this post, we will cover the fundamentals of web scraping and the implementation of a web scraper using a Java API.

Agenda of this post

    What is Web Scraping
    Web Scraping technique
    Useful APIs for web scraping
    Sample code using a Java API



Web scraping (also called web harvesting or web data extraction) is a technique for extracting information from websites.
It covers any of various means of extracting content from a website over HTTP in order to transform that content into another format suitable for use in another context.
Using a web scraper, you can extract the useful content from a web page and convert it into whatever format you need.

Web Scraping technique:
The following steps are suggested for web scraping:

    Connect : Connect to the remote site over HTTP or FTP.
    Extract : Extract information from the website.
    Process : Filter the useful data from the source and format it as needed.
    Save : Save the data in the desired format.
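
The four steps above can be sketched in plain Java. This is only a minimal illustration using the standard library (Java 11 or later), not the Web-Harvest approach discussed later in the post. The class name ScraperSketch, the method names, the output file links.txt and the URL http://example.com/ are all assumptions made for the example. It handles HTTP only, and it uses a regular expression to find links, which is a simplification: a real scraper would normally use a proper HTML parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch of the Connect / Extract / Process / Save steps.
class ScraperSketch {

    // Captures the href value of <a ... href="..."> tags.
    // Illustrative only: a regular expression is not a robust HTML parser.
    private static final Pattern LINK =
            Pattern.compile("<a\\s[^>]*href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Connect: download the page body over HTTP.
    static String connect(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // Extract: collect every href value found in the HTML.
    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = LINK.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    // Process: keep only absolute http(s) links and drop duplicates, preserving order.
    static List<String> process(List<String> links) {
        List<String> result = new ArrayList<>();
        for (String link : links) {
            boolean absolute = link.startsWith("http://") || link.startsWith("https://");
            if (absolute && !result.contains(link)) {
                result.add(link);
            }
        }
        return result;
    }

    // Save: write one link per line as UTF-8 text.
    static void save(List<String> links, Path target) throws IOException {
        Files.write(target, links, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL (an assumption); pass a real one as the first argument.
        String url = args.length > 0 ? args[0] : "http://example.com/";
        Path out = Path.of("links.txt");
        save(process(extract(connect(url))), out);
        System.out.println("Saved links to " + out.toAbsolutePath());
    }
}
```

Keeping each step in its own method means the extract, process and save steps can be tried on a saved HTML string without touching the network.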

Various web scraping tools and APIs are available. I am going to use Web-Harvest for my web scraping example.

Web-Harvest
Web-Harvest is an open source web data extraction tool written in Java. It offers a way to collect desired web pages and extract useful data from them.

Source: http://half-wit4u.blogspot.in/2011/01/web-scraping-using-java-api.html
