Wednesday 28 September 2016

Scraping Yelp Business Data With Python Scraping Script

Scraping Yelp Business Data With Python Scraping Script

Yelp is a great source of business contact information with details like address, postal code, contact information; website addresses etc. that other site like Google Maps just does not. Yelp also provides reviews about the particular business. The yelp business database can be useful for telemarketing, email marketing and lead generation.

Are you looking for yelp business details database? Are you looking for scraping data from yelp website/business directory? Are you looking for yelp screen scraping software? Are you looking for scraping the business contact information from the online Yelp? Then you are at the right place.

Here I am going to discuss how to scrape yelp data for lead generation and email marketing. I have made a simple and straight forward yelp data scraping script in python that can scrape data from yelp website. You can use this yelp scraper script absolutely free.

I have used urllib, BeautifulSoup packages. Urllib package to make http request and parsed the HTML using BeautifulSoup, used Threads to make the scraping faster.
Yelp Scraping Python Script

import urllib
from bs4 import BeautifulSoup
import re
from threading import Thread

#List of yelp urls to scrape
url=['http://www.yelp.com/biz/liman-fisch-restaurant-hamburg','http://www.yelp.com/biz/casa-franco-caramba-hamburg','http://www.yelp.com/biz/o-ren-ishii-hamburg','http://www.yelp.com/biz/gastwerk-hotel-hamburg-hamburg-2','http://www.yelp.com/biz/superbude-hamburg-2','http://www.yelp.com/biz/hotel-hafen-hamburg-hamburg','http://www.yelp.com/biz/hamburg-marriott-hotel-hamburg','http://www.yelp.com/biz/yoho-hamburg']

i=0
#function that will do actual scraping job
def scrape(ur):

          html = urllib.urlopen(ur).read()
          soup = BeautifulSoup(html)

      title = soup.find('h1',itemprop="name")
          saddress = soup.find('span',itemprop="streetAddress")
          postalcode = soup.find('span',itemprop="postalCode")
          print title.text
          print saddress.text
          print postalcode.text
          print "-------------------"

threadlist = []

#making threads
while i<len(url):
          t = Thread(target=scrape,args=(url[i],))
          t.start()
          threadlist.append(t)
          i=i+1

for b in threadlist:
          b.join()

Recently I had worked for one German company and did yelp scraping project for them and delivered data as per their requirement. If you looking for scraping data from business directories like yelp then send me your requirement and I will get back to you with sample.

Source: http://webdata-scraping.com/scraping-yelp-business-data-python-scraping-script/

Monday 19 September 2016

Run Code Template – New Feature Added to Fminer Web Scraping Tool

Run Code Template – New Feature Added to Fminer Web Scraping Tool

Fminer is one of the powerful web scraping software, I already given brief of all the Fminer features in previous post. In this post I am going to introduce one of the interesting feature of fminer which is Run Code Template that is recently added to Fminer, this feature is similar to “Fminer Run Code” action but it’s different in a way you can use it. The Run Code Action you can use inside the data scraping flow and python code get executed when scraper start running.

While Run Code Templates are the saved python code snippets that you can run on the data tables after scraping completes. Assume if you get white space in scraped data then you can easily trim this left and right spaces by just executing “strip_column” template, see the code of that template below.

'''Strip all data of a column in data table
Remove the blank of data in the head and the tail.
'''

tabName = '[%table1|data table%]'
colName = '[%table1.column1|table column for strip%]'

tab = tables[tabName]
for i, row in enumerate(tab):
    row[colName] = row[colName].strip()   
    tab.edit_row(i, row)

This template comes with Fminer and few other template like “merge_tables_with_same_columns”.  Below are the steps how you can execute template python code on scraped data.

Step 1: Click on second icon from right that says “Run Code” under the Data section

Step 2: One popup will appear, you need to click on “Templates” icon and choose the template you want to execute and then click on Ok.

Step 3: Now the window will appear for configuration that will ask you to choose the table and column under that table on which you want to execute the code. Now click on Ok again.

Step 4: Now you can see the code of that template, now you can click on execute icon and script will start running, based on number of records it will take time to finish execution.

In many web scraping projects I found this template code very handy for cleaning data and making life easy. Templates are stored at following path so you can create your own template with customized code.

C:\Program Files (x86)\FMiner\templates

I have created one template which I use to remove HTML code that comes while scraping badly organized HTML pages. Below is the code of template for stripping html:

'''Strip HTML will remove all html tags of a column in data table.
'''
import re
tabName = '[%table1|data table%]'
colName = '[%table1.column1|table column for substring%]'
colNew = '[%table1.column1|table column to add new data%]'
tab = tables[tabName]
for i, row in enumerate(tab):
    cleanr =re.compile('<.*?>')
    cleantext = re.sub(cleanr,'', row[colName])
    row[colNew] = cleantext 
    tab.edit_row(i, row)

Stay connected as I am going to post more code templates that will make your web scraping life easy and manipulate data on fly.

Source: http://webdata-scraping.com/run-code-template-new-feature-added-fminer-web-scraping-tool/

Tuesday 6 September 2016

Calculate your ROI on Web Scraping using our ROI Calculator

Calculate your ROI on Web Scraping using our ROI Calculator

Staying atop the competition is a vital thing for the survival and growth of businesses these days. Ever since big data came into the picture, web scraping has become something businesses from every industry has to invest in. If your company is not in a technically advanced industry, web scraping could even be a nightmare to start with. Wondering if going with in-house web scraping is right for you? In house or outsourcing, in the end it’s all about the returns on investment.

ROI Calculator

Considering the numerous factors that determine how much web scraping can cost you, it’s not easy to calculate the ROI on your in-house web scraping.

In house web scraping is certainly a challenging process. If you plan on going down this way, here is a brief list of prerequisites.

Engineers

Technically skilled labour is an essential requirement for web scraping. Since, web scraping techniques are complicated, it needs good programming skills to write, run and maintain the scraping bots. The cost of labour can be one of the drawbacks with doing in house web scraping.

Hardware Resources

Web scraping is a resource hungry process which requires high end servers and lots of bandwidth. Without the adequate resources, you might end up losing important data. The cost of quality servers could easily make you want to reconsider doing web scraping on your own. Not to mention the doubling up of these resources in order to keep the data intact, espcially if you’re looking at large scale.

Maintainability and ukeep of your tech stack

Once you have your servers and other technical components setup, the real deal only starts. You have to ensure availability of your servers, data backups, restoring previous states, failovers, among many other complications associated with managing servers and fixing them up when something goes wrong. You need to allocate resources (both people and hardware) to take care of the above.

Time

Time is something that we cannot really include in the equation when it comes to calculating the returns. But it is definitely a factor that defines if web scraping in house is worth it. Although web scraping is the fastest way to acquire data, the initial setup and maintenance are time consuming and complicated. This could easily lead to conflicts when you have to distribute your time between web scraping and other business activities that are crucial for your company.

Try the ROI Calculator

We came up with an ROI calculator to easily calculate your returns on investment with our web scraping services. Using this, you could easily compare the cost of in house web scraping with PromptCloud’s dedicated web scraping services. Find out how much you can save by going the PromptCloud way.

Source: https://www.promptcloud.com/blog/calculate-roi-on-web-scraping