Python web scraping of IPL point table and graph plotting using python libraries - Python for Data Analytics


In this example we scrape the IPL 2018 points table, store the values in a pandas DataFrame in the same format, and plot a bar graph with matplotlib that shows how many matches each team won and lost in the season.

Here is exactly how the points table looks on the Cricbuzz website.


We extract the points table, including the header row and the team names, store the values in a DataFrame, and then use Python's matplotlib library to plot a bar graph showing how many matches each team won and lost.
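To see how the extraction works without hitting the live site, here is a minimal sketch that runs the same `find`/`find_all` calls on a tiny hand-written HTML snippet. The snippet, its two hypothetical teams and their numbers are made up for illustration; only the class names are taken from the scraper in this post, and the real Cricbuzz markup may contain more cells and nesting.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup reusing the class names from the scraper
SAMPLE_HTML = """
<table class="table cb-srs-pnts">
  <tr>
    <td class="cb-srs-pnts-th">Mat</td>
    <td class="cb-srs-pnts-th">Won</td>
    <td class="cb-srs-pnts-th">Lost</td>
  </tr>
  <tr>
    <td class="cb-srs-pnts-name">Team A</td>
    <td class="cb-srs-pnts-td">4</td>
    <td class="cb-srs-pnts-td">3</td>
    <td class="cb-srs-pnts-td">1</td>
  </tr>
  <tr>
    <td class="cb-srs-pnts-name">Team B</td>
    <td class="cb-srs-pnts-td">4</td>
    <td class="cb-srs-pnts-td">1</td>
    <td class="cb-srs-pnts-td">3</td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-word class_ string matches the exact value of the class attribute
tbl = soup.find("table", class_="table cb-srs-pnts")

# A single-word class_ matches any <td> that carries that CSS class
headers = [td.get_text().strip() for td in tbl.find_all("td", class_="cb-srs-pnts-th")]
teams = [td.get_text().strip() for td in tbl.find_all("td", class_="cb-srs-pnts-name")]
cells = [td.get_text().strip() for td in tbl.find_all("td", class_="cb-srs-pnts-td")]

print(headers)  # ['Mat', 'Won', 'Lost']
print(teams)    # ['Team A', 'Team B']
print(cells)    # ['4', '3', '1', '4', '1', '3']
```

Note that the numeric cells come back as one flat list of strings, read row by row; the scraper later reshapes that list into one row per team.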

Because the code pulls the data from Cricbuzz each time it runs, it reflects the table as published at the moment you execute it, as long as you have an Internet connection and Cricbuzz has not changed the page's HTML structure. Make sure all the required packages (beautifulsoup4, numpy, pandas, matplotlib and requests) are installed before running the code.
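One quick way to confirm the packages are installed before running the scraper is to look each module up with `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found. This is a small convenience sketch, not part of the original post; the helper name `missing_packages` is hypothetical. Note that some import names differ from their pip package names (`bs4` is installed as `beautifulsoup4`).

```python
import importlib.util

# Import name -> pip package name for the libraries this post uses
REQUIRED = {
    "bs4": "beautifulsoup4",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "requests": "requests",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages whose modules cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```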


from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests

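# Download the IPL 2018 points table page from Cricbuzz
page = requests.get("http://www.cricbuzz.com/cricket-series/2676/indian-premier-league-2018/points-table")
page.raise_for_status()  # stop early if the request failed

# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(page.text, "html.parser")
#print(soup.prettify())

# The points table is the <table> whose class is "table cb-srs-pnts"
tbl = soup.find("table",class_="table cb-srs-pnts")
#print(tbl.prettify())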

# Header cells give the column names; the sixth one is renamed to 'pts'
col_names = [x.get_text().strip() for x in tbl.find_all('td',class_="cb-srs-pnts-th")]
col_names[5]='pts'
#print(col_names)

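# Team names, one per row of the table
team_names = [x.get_text().strip() for x in tbl.find_all('td',class_="cb-srs-pnts-name")]
#print(team_names)

# All numeric cells as one flat list of strings, read row by row
pnt_tbl = [x.get_text().strip() for x in tbl.find_all('td',class_="cb-srs-pnts-td")]
#print(pnt_tbl)

# Each team row has 7 numeric cells: reshape into one row per team,
# drop the 7th column, then convert the remaining cells to integers
np_pnt_tbl = np.array(pnt_tbl).reshape(len(team_names), 7)
np_pnt_tbl = np.delete(np_pnt_tbl, 6, axis=1)
np_pnt_tbl = np_pnt_tbl.astype(int)
#print(np_pnt_tbl)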

consol_tbl = pd.DataFrame(np_pnt_tbl,index=team_names,columns=col_names)
consol_tbl.columns.name = "Teams"
print(consol_tbl)

# Build short labels from the initials of each team name,
# e.g. "Chennai Super Kings" -> "CSK"
team_abr = []

for team in team_names:
    short_form = ''
    for initial in team.split():
        short_form = short_form + initial[0]
    team_abr.append(short_form)


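title = 'IPL 2018: matches won and lost by each team'

# One slot per team; each Lost bar sits 0.4 to the right of its Won bar
val_ticks = np.arange(1, len(team_names) + 1)
lost_ticks = val_ticks + 0.4

plt.bar(val_ticks,np_pnt_tbl[:,1],width=0.4,color='g',alpha=0.6,label='Won')
plt.bar(lost_ticks,np_pnt_tbl[:,2],width=0.4,color='r',alpha=0.6,label='Lost')
# y-axis ticks from 0 up to the largest won/lost count
plt.yticks(range(0, np_pnt_tbl[:, 1:3].max() + 1))
plt.ylabel("Matches")
# Center each team label between its two bars (team_abr gives shorter labels)
plt.xticks(val_ticks + 0.2, team_names, rotation='vertical')
plt.grid(True)
plt.legend()
plt.title(title)
plt.tight_layout()  # keep the vertical team names inside the figure

plt.show()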
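
The least obvious step in the script is turning the flat list of cell strings into one row per team. Here is a minimal sketch of just that step on made-up numbers for two hypothetical teams, assuming (as the script does) seven numeric cells per team row with a seventh column that is dropped. The column labels and the text in the last column are placeholders, not Cricbuzz's actual headers or values.

```python
import numpy as np
import pandas as pd

# Hypothetical flat cell list, read left to right, top to bottom (7 cells per team)
cells = ["4", "3", "1", "0", "0", "6", "x",
         "4", "1", "3", "0", "0", "2", "y"]
teams = ["Team A", "Team B"]
col_names = ["Mat", "Won", "Lost", "Tied", "NR", "pts"]  # placeholder headers

table = np.array(cells).reshape(len(teams), 7)  # one row per team
table = np.delete(table, 6, axis=1)             # drop the 7th column before converting
table = table.astype(int)                       # strings -> integers

df = pd.DataFrame(table, index=teams, columns=col_names)
df.columns.name = "Teams"
print(df)
print(df["Won"].tolist())  # [3, 1]
```

Because the placeholder seventh column holds non-numeric text here, it has to be removed before `astype(int)`, otherwise the conversion raises a `ValueError`.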


These screenshots are just for better understanding. Because the IPL standings kept changing during the season, your output may differ from them. To see the exact output, copy the code above and try it yourself; it will be fun.
bar graph



point table

Note: I hope this serves as a reference, or as motivation to extract useful information that is available throughout the Internet and put it to good use in your own projects.
