I remember when ESPN.com used to be my beloved hero. I could look up anything: how well a player was playing, sports analysis from intelligent people, or even the dumb things players did in their free time. All of that was in the palm of my hand. However, one frightful day ESPN.com finally showed its true colors. I clicked a link to learn more about a player and was suddenly sent to a page where I had to pay to become an "Insider". I was distraught and angry that the supposedly free ESPN would forsake me.
To combat this great trial, my friends and I decided that we could do advanced statistics (specifically for basketball) on our own; moreover, we could actually do it better than ESPN. First, to generate any useful statistics, there must be data to work with. My team and I decided that it would be cool (and funny) to use a web scraper to collect the data from the espn.com website. The web scraper was written in Python and uses the BeautifulSoup library. A snippet of the code is displayed below:
"""Web scraper that retrieves basic statistics
of key NBA players"""
from urllib.error import HTTPError
from urllib.request import urlopen

from bs4 import BeautifulSoup


def check_open(url):
    """Function that checks if the base webpage opened or not"""
    try:
        urlopen(url)
        print("The webpage has opened")
        return "Go"
    except HTTPError:
        # Report the failure, then signal it to the caller with None
        print("The webpage was not opened")
        return None


def steph_stats(base_url, stat_list):
    """Function that gets the stats of Steph"""
    # Opens url string and assigns to html
    html = urlopen(base_url)
    # Creates Beautiful Soup object
    bs_obj = BeautifulSoup(html, "lxml")
    # Finds every table row with the class "oddrow"
    num = bs_obj.find_all("tr", {"class": "oddrow"})
    new_list = []
    for x in num:
        # Runs through each row in the list
        new_string = x.get_text() + " "
        print(new_string)
        stat_list.append(new_string)
    # This specific string deals with the data that we
    # wanted to extract specifically.
    # It is then appended to the list and the list
    # is returned
    new_list.append(stat_list[21])
    return new_list


def steph_create(data):
    """A function that computes data for Stephen Curry"""
    # Splits strings into respective variables
    split_data = data[0]
    year = split_data[1:7]
    team = split_data[7:9]
    fg_made = split_data[9:12]
    fg_attempted = split_data[13:17]
    fg_pec = split_data[17:21]
    # Not all of the code will be shown since there are over 900 lines of code
After the program was executed, it created text files containing the basic statistics of each player we were examining. The text files were then fed into a C++ program to compute the advanced player statistics and team statistics. Here is a snippet of the C++ code that did this.
After this program was executed, the advanced player statistics and advanced team statistics were generated.
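For readers who want to follow the math without the C++ source, here is a rough Python sketch of how statistics like these are commonly defined. The formulas are the widely used textbook definitions, which is an assumption about what the C++ program computed; the function names and the season totals in `box` are made up for illustration.

```python
def true_shooting_pct(points, fga, fta):
    """Scoring efficiency: PTS / (2 * (FGA + 0.44 * FTA)), as a percentage."""
    return 100.0 * points / (2.0 * (fga + 0.44 * fta))


def effective_fg_pct(fgm, three_pm, fga):
    """Field goal percentage with each made three counted as 1.5 makes."""
    return 100.0 * (fgm + 0.5 * three_pm) / fga


def turnover_pct(turnovers, fga, fta):
    """Turnovers per 100 plays: TOV / (FGA + 0.44 * FTA + TOV)."""
    return 100.0 * turnovers / (fga + 0.44 * fta + turnovers)


# Hypothetical season totals, not real player data.
box = {"points": 1000, "fgm": 350, "fga": 700, "three_pm": 100,
       "three_pa": 250, "fta": 250, "assists": 240, "turnovers": 120}

advanced = {
    "True Shooting Percentage": true_shooting_pct(box["points"], box["fga"], box["fta"]),
    "Effective FG Percentage": effective_fg_pct(box["fgm"], box["three_pm"], box["fga"]),
    "Turnover Percent": turnover_pct(box["turnovers"], box["fga"], box["fta"]),
    "Assist to Turnover Ratio": box["assists"] / box["turnovers"],
    # Assumed here to be attempts per field goal attempt (FTA / FGA).
    "Free Throw Rate": box["fta"] / box["fga"],
    # Share of field goal attempts taken from three and from two.
    "3 Pointer Rate": box["three_pa"] / box["fga"],
    "2 Pointer Rate": 1.0 - box["three_pa"] / box["fga"],
}

for name, value in advanced.items():
    print(f"{name}: {value:.2f}")
```

Defining the 3 Pointer Rate and 2 Pointer Rate as shares of field goal attempts makes them sum to 1, which matches the pattern in our table.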
# advanced player data in visual form (originally a csv file)
adv_player_data = """
Player, True Shooting Percentage, Effective FG Percentage,Turnover Percent, Assist to Turnover Ratio, Free Throw Rate, 3 Pointer Rate, 2 Pointer Rate
Stephen Curry, 67.64, 63.25, 13.11, 1.98, 0.28, 0.55, 0.45
James Harden, 59.37, 50.1, 15.98, 1.57, 0.54, 0.42, 0.58
Kevin Durant, 63.69, 57.36, 12.57, 1.45, 0.38, 0.33, 0.67
Demarcus Cousins, 53.68, 47.39, 12.99, 0.85, 0.5, 0.17, 0.83
Lebron James, 57.19, 53.27, 12.27, 2.15, 0.36, 0.2, 0.8
"""
After this, the team ran comparative statistics. Here are some of our findings:
"""Bokeh graph to display advanced statistics comparisons of players"""
from collections import OrderedDict
from math import log, sqrt
import numpy as np
import pandas
from io import StringIO
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
stat_color = OrderedDict([
("True Shooting Percentage", "orange"),
("Effective FG Percentage", "black"),
("Turnover Percent", "#c64787"),
("Assist to Turnover Ratio", "purple"),
("Free Throw Rate", "yellow"),
("3 Pointer Rate", "green"),
("2 Pointer Rate", "red"),
])
player_color = {
"Stephen Curry" : "#aeaeb8",
"James Harden" : "#e69584",
"Kevin Durant" : "indigo",
"Demarcus Cousins" : "#0d3362",
"Lebron James" : "#c64737",
}
data = pandas.read_csv(StringIO(adv_player_data),
skiprows = 1,
skipinitialspace=True,
engine="python")
width = 800
height = 800
inner_radius = 120
outer_radius = 350 - 10
minr = sqrt(log(.001 * 1E4))
maxr = sqrt(log(1000000 * 1E4))
a = (outer_radius - inner_radius) / (minr - maxr)
b = inner_radius - a * maxr
def rad(mic):
    return a * np.sqrt(np.log(mic * 1E4)) + b
big_angle = 2.0 * np.pi / (len(data))
small_angle = big_angle / 7
p = figure(plot_width=width, plot_height=height, title="Top Scorers Comparison",
x_axis_type=None, y_axis_type=None,
x_range=(-420, 420), y_range = (-420,420),
min_border=0, outline_line_color="black",
background_fill="#f0e1d2", border_fill="#f0e1d2"
)
p.xgrid.grid_line_color= None
p.ygrid.grid_line_color= None
# annular wedges
angles = np.pi/2 - big_angle - data.index.to_series()*big_angle
colors = [player_color[player] for player in data.Player]
p.annular_wedge(
0, 0, inner_radius, outer_radius, -big_angle+angles, angles, color=colors
)
#small wedges
p.annular_wedge(0, 0, inner_radius, rad(data["True Shooting Percentage"]),
-big_angle+angles+5*small_angle, -big_angle+angles+7*small_angle,
color=stat_color['True Shooting Percentage'])
p.annular_wedge(0, 0, inner_radius, rad(data['Effective FG Percentage']),
-big_angle+angles+3*small_angle, -big_angle+angles+6*small_angle,
color=stat_color['Effective FG Percentage'])
p.annular_wedge(0, 0, inner_radius, rad(data['Turnover Percent']),
-big_angle+angles+1*small_angle, -big_angle+angles+5*small_angle,
color=stat_color['Turnover Percent'])
p.annular_wedge(0, 0, inner_radius, rad(data['Assist to Turnover Ratio']),
-big_angle+angles+1*small_angle, -big_angle+angles+4*small_angle,
color=stat_color['Assist to Turnover Ratio'])
p.annular_wedge(0, 0, inner_radius, rad(data['Free Throw Rate']),
-big_angle+angles+(2)*small_angle, -big_angle+angles+3*small_angle,
color=stat_color['Free Throw Rate'])
p.annular_wedge(0, 0, inner_radius, rad(data['3 Pointer Rate']),
-big_angle+angles+(1)*small_angle, -big_angle+angles+2*small_angle,
color=stat_color['3 Pointer Rate'])
p.annular_wedge(0, 0, inner_radius, rad(data['2 Pointer Rate']),
-big_angle+angles+(.10)*small_angle, -big_angle+angles+(1)*small_angle,
color=stat_color['2 Pointer Rate'])
# circular axes and labels
labels = np.power(10.0, np.arange(-3, 4))
radii = a * np.sqrt(np.log(labels * 1E4)) + b
p.circle(0, 0, radius=radii, fill_color=None, line_color="white")
p.text(0, radii[:-1], [str(r) for r in labels[:-1]],
text_font_size="8pt", text_align="center", text_baseline="middle")
p.circle([-40, -40, -40, -40, -40], [-370, -390, -410, -430,-450], color=list(player_color.values()), radius=5)
p.text([-30, -30, -30,-30,-30], [-370, -390,-410,-430,-450], text=["Player: " + gr for gr in player_color.keys()],
text_font_size="7pt", text_align="left", text_baseline="middle")
p.rect([-70, -70, -70,-70,-70,-70,-70], [68,48,28,8,-12,-32,-52], width=30, height=13,
color=list(stat_color.values()))
p.text([-44, -40, -40,-40,-40,-40,-40], [68,48,28,8,-12,-32,-52], text=list(stat_color),
text_font_size="9pt", text_align="left", text_baseline="middle")
output_notebook()
show(p)
What is interesting to note from this interactive graph is that these players are very similar in terms of their True Shooting Percentage and their Effective FG Percentage. What differentiates them is their Turnover Percent and Assist to Turnover Ratio. Stephen Curry has one of the best Assist to Turnover Ratios of the group (1.98, second only to LeBron James at 2.15). Let us examine Steph Curry a little bit more.
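Comparisons like these can also be checked straight from the table rather than read off the chart. The snippet below is a small sketch that uses only the standard library; the `leader` helper and the `table_text` name are invented here, and treating a lower Turnover Percent as better is an assumption.

```python
import csv
from io import StringIO

# Same values as the advanced player table shown earlier.
table_text = """\
Player, True Shooting Percentage, Effective FG Percentage, Turnover Percent, Assist to Turnover Ratio, Free Throw Rate, 3 Pointer Rate, 2 Pointer Rate
Stephen Curry, 67.64, 63.25, 13.11, 1.98, 0.28, 0.55, 0.45
James Harden, 59.37, 50.1, 15.98, 1.57, 0.54, 0.42, 0.58
Kevin Durant, 63.69, 57.36, 12.57, 1.45, 0.38, 0.33, 0.67
Demarcus Cousins, 53.68, 47.39, 12.99, 0.85, 0.5, 0.17, 0.83
Lebron James, 57.19, 53.27, 12.27, 2.15, 0.36, 0.2, 0.8
"""

# skipinitialspace drops the blank after each comma in headers and values.
rows = list(csv.DictReader(StringIO(table_text), skipinitialspace=True))


def leader(stat, lower_is_better=False):
    """Return (player, value) for the best row in the given column."""
    pick = min if lower_is_better else max
    best = pick(rows, key=lambda row: float(row[stat]))
    return best["Player"], float(best[stat])


for stat in ("True Shooting Percentage", "Effective FG Percentage",
             "Assist to Turnover Ratio"):
    print(stat, leader(stat))
print("Turnover Percent", leader("Turnover Percent", lower_is_better=True))
```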
from numpy import pi
"""Generated from web scraper. Used literal for sake of time"""
steph_points = 1520
team_points = 6109
percents = [0, (steph_points/team_points), (steph_points/team_points)+.45, 1]
starts = [p*2*pi for p in percents[:-1]]
ends = [p*2*pi for p in percents[1:]]
# a color for each pie piece
colors = ["blue", "#FFA503","#FFA517"]
p1 = figure(x_range=(-1,1), y_range=(-1,1), title="Steph versus Team")
p1.wedge(x=0, y=0, radius=1, start_angle=starts, end_angle=ends, color=colors)
p1.line(x=[0, -.135, (-.135+-.135),(-.135+-.135+-.135)], y=[0,-.41, (-.41+-.41), (-.41+-.41+-.41)], line_dash=[4,4])
p1.text([0],[-0.35], text=["AST%"],
text_font_size="56pt", text_align="left", text_baseline="middle", color="black")
p1.text([-.135+-.135+-.135],[-.41+-.41+-.21], text=["AST%"],
text_font_size="15pt", text_align="left", text_baseline="middle", color="black")
# display/save everything
output_notebook()
show(p1)
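The arithmetic behind the pie chart is short enough to spell out. This sketch reuses the literals from the chart code; reading the 0.45 that is added in `percents` as the share of team scoring credited to Steph's assists is an assumption about that literal, and the variable names are illustrative.

```python
from math import pi

steph_points = 1520
team_points = 6109
assist_share = 0.45  # literal from the chart code; assumed to be the AST% slice

scoring_share = steph_points / team_points
combined_share = scoring_share + assist_share

# Cumulative fractions of the circle, turned into (start, end) angles in radians.
cumulative = [0.0, scoring_share, combined_share, 1.0]
wedges = [(lo * 2 * pi, hi * 2 * pi) for lo, hi in zip(cumulative, cumulative[1:])]

print(f"scoring share: {scoring_share:.1%}")
print(f"with assists:  {combined_share:.1%}")
```

With these literals the scoring share comes out to roughly 24.9 percent, and adding the assist slice brings the total to roughly 69.9 percent of the circle.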
As can be seen from the chart, Steph Curry contributes about 25 percent of all points for his team (1520 of 6109). If you factor in his assist percentage (denoted by AST%), Steph Curry contributes over half of his team's total scoring, since an assist is a pass that leads directly to a score. Thank you so much for viewing our hard work! We hope to add a machine learning section below in the future.