Monday, 22 December 2014

Scraping table from html web with CloudStat

If you need data that is published on the internet, you don't have to retype it: as long as you know the page's URL, you can extract (scrape) it directly.

This is possible thanks to the XML package for R, which provides the very handy readHTMLTable() function.
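To see what the function does before pointing it at a live page, here is a minimal sketch that parses a small HTML snippet held in a string. The snippet, its column names, and its values are made up for illustration, and the sketch assumes the XML package is installed.

```r
library(XML)

# Hypothetical HTML snippet, invented for illustration (not real data)
html_text <- "<html><body>
<table>
<tr><th>Name</th><th>Score</th></tr>
<tr><td>Alpha</td><td>10</td></tr>
<tr><td>Beta</td><td>20</td></tr>
</table>
</body></html>"

# Parse the string as HTML instead of fetching a URL
doc <- htmlParse(html_text, asText = TRUE)

# header = TRUE uses the first row as column names;
# which = 1 asks for the first <table> in the document
demo.table <- readHTMLTable(doc, header = TRUE, which = 1,
                            stringsAsFactors = FALSE)

print(demo.table)
```

The same call works with a URL in place of doc, which is how it is used in the two cases that follow.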

As a case study, I want to scrape two tables:

    US Airline Customer Score.
    World Top Chess Players (Men).

A. Scraping the US Airline Customer Score table from

http://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Airlines

Code:

library(XML)

airline = 'http://www.theacsi.org/index.php?option=com_content&view=article&id=147&catid=&Itemid=212&i=Airlines'

airline.table = readHTMLTable(airline, header = TRUE, which = 1, stringsAsFactors = FALSE)

Result:

B. Scraping the World Top Chess Players (Men) table from http://ratings.fide.com/top.phtml?list=men

Code:

chess = 'http://ratings.fide.com/top.phtml?list=men'

chess.table = readHTMLTable(chess, header = TRUE, which = 5, stringsAsFactors = FALSE)
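The which argument selects one table by its position on the page, so the right value (1 for the airline page, 5 for the chess page) depends on how each page is laid out. One possible way to find it, sketched here on a made-up two-table snippet rather than the real pages, is to read every table and compare their sizes. The snippet and the names all.tables, n.rows, and picked.table are hypothetical.

```r
library(XML)

# Hypothetical page: a small navigation table followed by the data table
html_text <- "<html><body>
<table>
<tr><th>Link</th><th>Target</th></tr>
<tr><td>Home</td><td>index</td></tr>
</table>
<table>
<tr><th>Rank</th><th>Name</th></tr>
<tr><td>1</td><td>Player A</td></tr>
<tr><td>2</td><td>Player B</td></tr>
<tr><td>3</td><td>Player C</td></tr>
</table>
</body></html>"

doc <- htmlParse(html_text, asText = TRUE)

# Without `which`, readHTMLTable() returns a list with one entry per <table>
all.tables <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

# Count rows per table; entries that could not be read are treated as 0 rows
n.rows <- sapply(all.tables, function(tbl) if (is.null(tbl)) 0 else nrow(tbl))

# Position of the largest table, which is often the one holding the data
best <- unname(which.max(n.rows))
picked.table <- all.tables[[best]]

print(best)
print(picked.table)
```

The largest table is only a heuristic, so it is worth printing the picked table and checking it by eye before reusing the index as which.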

Result:

Done. You have successfully scraped data from a web page with CloudStat.

You can get the full version of this case study (code and result) at Scraping table from html web.

Then you can analyze the data as usual! Great! No more retyping the data. Enjoy!
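One thing to watch for before analyzing: with stringsAsFactors set to false as in the code above, scraped columns generally come back as character strings, so numeric columns usually need converting first. A minimal sketch, using a made-up data frame as a stand-in for a scraped table (the column names and values are hypothetical):

```r
# Hypothetical stand-in for a scraped table: every column is character
scraped <- data.frame(
  Name   = c("Alpha", "Beta", "Gamma"),
  Rating = c("2800", "2750", "2700"),
  stringsAsFactors = FALSE
)

# Convert the numeric-looking column so it can be summarized
scraped$Rating <- as.numeric(scraped$Rating)

# Ordinary analysis now works on the converted column
print(mean(scraped$Rating))
print(summary(scraped$Rating))
```

Values that are not clean numbers (for example with thousands separators or footnote marks) become NA with a warning, so it is worth checking for NA after the conversion.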

Source: http://www.r-bloggers.com/scraping-table-from-html-web-with-cloudstat/
