ISU Hackathon (ESPN Sucks...)

I remember when ESPN.com used to be my beloved hero. I could look up anything about how well a player was playing, read sports analysis from intelligent people, or even learn about the dumb things players did in their free time. All of that was in the palm of my hand. However, one frightful day ESPN.com finally showed its true colors. I clicked on a link to learn more about a player, and all of a sudden it sent me to a page where I had to pay to become an "Insider". I became very distraught and angry that the supposedly free ESPN would forsake me.

In order to combat this great trial, my friends and I decided that we could compute advanced statistics (specifically for basketball) on our own; moreover, we could actually do it better than ESPN. First, generating any useful statistics requires data to work with. My team and I decided that it would be cool (and funny) to use a web scraper to collect the data from the ESPN.com website. The web scraper was written in Python and uses the BeautifulSoup library. A snippet of the code is displayed below:

Web Scraper Portion

"""Web scraper that retrieves basic statistics
of key NBA players"""
from urllib.error import URLError
from urllib.request import urlopen

from bs4 import BeautifulSoup


def check_open(url):
    """Check whether the base webpage opens: return "Go" on success, None otherwise"""
    try:
        urlopen(url)
    except URLError:  # also covers HTTPError, which subclasses URLError
        print("The webpage was not opened")
        return None
    print("The webpage has opened")
    return "Go"


def steph_stats(base_url, stat_list):
    """Function that gets the stats of Steph"""
    # Open the URL and parse the returned HTML with the lxml parser
    html = urlopen(base_url)
    bs_obj = BeautifulSoup(html, "lxml")
    # Find every <tr class="oddrow"> element
    rows = bs_obj.find_all("tr", {"class": "oddrow"})
    for row in rows:
        # Append the text of each row, followed by a space
        row_text = row.get_text() + " "
        print(row_text)
        stat_list.append(row_text)
    # The string at index 21 holds the specific data we wanted
    # to extract, so it is returned in a new list
    return [stat_list[21]]


def steph_create(data):
    """A function that computes data for Stephen Curry"""

    # Split the row string into its respective fields by position
    split_data = data[0]
    year = split_data[1:7]
    team = split_data[7:9]

    fg_made = split_data[9:12]
    fg_attempted = split_data[13:17]
    fg_pec = split_data[17:21]
    # Not all of the code is shown, since there are over 900 lines of code

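The `find_all("tr", {"class": "oddrow"})` call is what picks the stat rows out of the page. The following is a minimal sketch that shows what that selection extracts without contacting ESPN.com or installing BeautifulSoup. It uses Python's built-in `html.parser` as a stand-in. The `OddRowParser` class and the sample HTML are hypothetical, and real ESPN markup may differ.

```python
from html.parser import HTMLParser


class OddRowParser(HTMLParser):
    """Hypothetical stand-in for find_all("tr", {"class": "oddrow"}).

    Collects the concatenated text of each <tr class="oddrow">, like
    BeautifulSoup's get_text(); it does not handle nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # finished row strings
        self._current = None  # text pieces of the row being read, or None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "tr" and "oddrow" in classes:
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag == "tr" and self._current is not None:
            self.rows.append("".join(self._current))
            self._current = None


# Hypothetical markup standing in for a stats table
sample_html = """
<table>
  <tr class="oddrow"><td>Season A</td><td>30.1</td></tr>
  <tr class="evenrow"><td>Season B</td><td>29.9</td></tr>
  <tr class="oddrow"><td>Season C</td><td>28.4</td></tr>
</table>
"""

parser = OddRowParser()
parser.feed(sample_html)
print(parser.rows)  # ['Season A30.1', 'Season C28.4']
```

As with `get_text()`, the cell texts run together with no separator. This is why the scraper works with fixed character positions when it slices a row.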
When executed, the program created text files containing the basic statistics of each player we were examining. These text files were then fed into a C++ program that computes the advanced player and team statistics. Here is a snippet of that C++ code:

Advanced Statistics C++ Code

#include <cmath>     // floor()
#include <iostream>
using namespace std;

// True Shooting %: a = points, b = field goal attempts, c = free throw attempts
double calcTS(double a, double b, double c)
{
    double x = 100 * a / (2 * (b + (.44 * c)));
    x = floor(x * 100.00 + 0.5) / 100.00;  // round to two decimal places
    return x;
}

// Effective FG %: a = field goals made, b = three-pointers made, c = field goal attempts
double calcEFG(double a, double b, double c)
{
    double x = 100 * (a + (0.5 * b)) / c;
    x = floor(x * 100.00 + 0.5) / 100.00;
    return x;
}

// Turnover %: a = turnovers, b = field goal attempts, c = free throw attempts
double calcTOV(double a, double b, double c)
{
    double x = (100 * a) / (b + (.44 * c) + a);
    x = floor(x * 100.00 + 0.5) / 100.00;
    return x;
}

int main()
{
    int x;  // used for continuing the do-while loop
    // Array sizes are const so the arrays are standard C++ (no variable-length arrays)
    const int statsSize = 21;
    double stats[statsSize];  // holds the player statistics
    const int teamSize = 14;
    double team[teamSize];    // holds the team statistics
    do {
        int statsIndex = 0;  // initialized here in case we process multiple players
        int teamIndex = 0;
        /* There is more, but it is not included here */

Running this program generated the advanced player statistics and the advanced team statistics.
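For readers more at home in Python, here is a sketch of the same three formulas from the C++ functions `calcTS`, `calcEFG` and `calcTOV`. The parameter names (`pts`, `fga`, `fta` and so on) are our reading of what `a`, `b` and `c` stand for, based on the standard definitions of these stats. The input totals are made up purely to exercise the code.

```python
import math


def round2(x):
    """Round half up to two decimals, mirroring floor(x * 100 + 0.5) / 100 in the C++ code."""
    return math.floor(x * 100.0 + 0.5) / 100.0


def true_shooting(pts, fga, fta):
    """TS% = 100 * PTS / (2 * (FGA + 0.44 * FTA))"""
    return round2(100 * pts / (2 * (fga + 0.44 * fta)))


def effective_fg(fgm, three_pm, fga):
    """eFG% = 100 * (FGM + 0.5 * 3PM) / FGA"""
    return round2(100 * (fgm + 0.5 * three_pm) / fga)


def turnover_pct(tov, fga, fta):
    """TOV% = 100 * TOV / (FGA + 0.44 * FTA + TOV)"""
    return round2(100 * tov / (fga + 0.44 * fta + tov))


# Made-up box-score totals, only to exercise the formulas
print(true_shooting(100, 80, 20))  # 56.31
print(effective_fg(40, 10, 80))    # 56.25
print(turnover_pct(12, 80, 20))    # 11.9
```

`math.floor(x + 0.5)` is used instead of Python's built-in `round()` so that ties round up, as in the C++ code. The built-in `round()` rounds ties to the nearest even digit.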

CSV Data Matrix

# advanced player data in visual form (originally a csv file)
adv_player_data = """
Player, True Shooting Percentage, Effective FG Percentage, Turnover Percent, Assist to Turnover Ratio, Free Throw Rate, 3 Pointer Rate, 2 Pointer Rate
Stephen Curry, 67.64, 63.25, 13.11, 1.98, 0.28, 0.55, 0.45
James Harden, 59.37, 50.1, 15.98, 1.57, 0.54, 0.42, 0.58
Kevin Durant, 63.69, 57.36, 12.57, 1.45, 0.38, 0.33, 0.67
Demarcus Cousins, 53.68, 47.39, 12.99, 0.85, 0.5, 0.17, 0.83
Lebron James, 57.19, 53.27, 12.27, 2.15, 0.36, 0.2, 0.8
"""

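The last three columns are rates relative to field goal attempts. The code that computes them is not shown, so the helper below is a hypothetical sketch. It assumes the standard definitions: free throw rate = FTA/FGA, 3 pointer rate = 3PA/FGA and 2 pointer rate = 2PA/FGA. Under these definitions the 3 and 2 pointer rates should add up to 1, and every row of the table above satisfies this.

```python
def shot_rates(fga, fta, three_pa):
    """Hypothetical helper returning (FT rate, 3PT rate, 2PT rate).

    Assumes the standard definitions FTA/FGA, 3PA/FGA and 2PA/FGA,
    with 2PA = FGA - 3PA; the original code for these columns is not shown.
    """
    two_pa = fga - three_pa
    return (round(fta / fga, 2), round(three_pa / fga, 2), round(two_pa / fga, 2))


# Made-up totals, only to exercise the helper
print(shot_rates(fga=200, fta=60, three_pa=80))  # (0.3, 0.4, 0.6)

# 3 Pointer Rate and 2 Pointer Rate from the table above; each pair should sum to 1
table_rates = {
    "Stephen Curry": (0.55, 0.45),
    "James Harden": (0.42, 0.58),
    "Kevin Durant": (0.33, 0.67),
    "Demarcus Cousins": (0.17, 0.83),
    "Lebron James": (0.2, 0.8),
}
for player, (three_rate, two_rate) in table_rates.items():
    assert abs(three_rate + two_rate - 1.0) < 1e-9, player
```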
Next, the team ran comparative statistics. Here are some of our findings:

Code for Infograph

"""Bokeh graph to display advanced statistics comparisons of players"""
from collections import OrderedDict
from math import log, sqrt

import numpy as np
import pandas
from io import StringIO  # Python 3 replacement for six.moves.cStringIO

from bokeh.plotting import figure, show
from bokeh.io import output_notebook


stat_color = OrderedDict([
    ("True Shooting Percentage", "orange"),
    ("Effective FG Percentage", "black"),
    ("Turnover Percent", "#c64787"),
    ("Assist to Turnover Ratio", "purple"),
    ("Free Throw Rate", "yellow"),
    ("3 Pointer Rate", "green"),
    ("2 Pointer Rate", "red"),
])

player_color = {
    "Stephen Curry" : "#aeaeb8",
    "James Harden" : "#e69584",
    "Kevin Durant" : "indigo",
    "Demarcus Cousins" : "#0d3362",
    "Lebron James" : "#c64737",
}

data = pandas.read_csv(StringIO(adv_player_data),
                    skiprows = 1,
                    skipinitialspace=True,
                    engine="python")


width = 800
height = 800
inner_radius = 120 
outer_radius = 350 - 10

minr = sqrt(log(.001 * 1E4))
maxr = sqrt(log(1000000 * 1E4))
a = (outer_radius - inner_radius) / (minr - maxr)
b = inner_radius - a * maxr 

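def rad(mic):
    # Map a stat value to a radius on a sqrt-log scale. Because a < 0,
    # larger values land closer to inner_radius (shorter bars).
    return a * np.sqrt(np.log(mic * 1E4)) + b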

big_angle = 2.0 * np.pi / (len(data))
small_angle = big_angle / 7

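# Bokeh 3.x keyword names (older releases used plot_width, plot_height,
# background_fill and border_fill). The ranges are widened to +/-460 so all
# five player legend entries (y = -370 to -450) fall inside the plot.
p = figure(width=width, height=height, title="Top Scorers Comparison",
    x_axis_type=None, y_axis_type=None,
    x_range=(-460, 460), y_range=(-460, 460),
    min_border=0, outline_line_color="black",
    background_fill_color="#f0e1d2", border_fill_color="#f0e1d2"
)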

p.xgrid.grid_line_color= None
p.ygrid.grid_line_color= None 

# annular wedges
angles = np.pi/2 - big_angle - data.index.to_series()*big_angle
colors = [player_color[player] for player in data.Player]
p.annular_wedge(
    0, 0, inner_radius, outer_radius, -big_angle+angles, angles, color=colors
)

#small wedges 
p.annular_wedge(0, 0, inner_radius, rad(data["True Shooting Percentage"]),
                -big_angle+angles+5*small_angle, -big_angle+angles+7*small_angle,
                color=stat_color['True Shooting Percentage'])
p.annular_wedge(0, 0, inner_radius, rad(data['Effective FG Percentage']),
                -big_angle+angles+3*small_angle, -big_angle+angles+6*small_angle,
                color=stat_color['Effective FG Percentage'])

p.annular_wedge(0, 0, inner_radius, rad(data['Turnover Percent']),
                -big_angle+angles+1*small_angle, -big_angle+angles+5*small_angle,
                color=stat_color['Turnover Percent'])

p.annular_wedge(0, 0, inner_radius, rad(data['Assist to Turnover Ratio']),
                -big_angle+angles+1*small_angle, -big_angle+angles+4*small_angle,
                color=stat_color['Assist to Turnover Ratio'])

p.annular_wedge(0, 0, inner_radius, rad(data['Free Throw Rate']),
                -big_angle+angles+(2)*small_angle, -big_angle+angles+3*small_angle,
                color=stat_color['Free Throw Rate'])

p.annular_wedge(0, 0, inner_radius, rad(data['3 Pointer Rate']),
                -big_angle+angles+(1)*small_angle, -big_angle+angles+2*small_angle,
                color=stat_color['3 Pointer Rate'])

p.annular_wedge(0, 0, inner_radius, rad(data['2 Pointer Rate']),
                -big_angle+angles+(.10)*small_angle, -big_angle+angles+(1)*small_angle,
                color=stat_color['2 Pointer Rate'])




# circular axes and labels
labels = np.power(10.0, np.arange(-3, 4))
radii = a * np.sqrt(np.log(labels * 1E4)) + b
p.circle(0, 0, radius=radii, fill_color=None, line_color="white")
p.text(0, radii[:-1], [str(r) for r in labels[:-1]],
       text_font_size="8pt", text_align="center", text_baseline="middle")

p.circle([-40, -40, -40, -40, -40], [-370, -390, -410, -430,-450], color=list(player_color.values()), radius=5)
p.text([-30, -30, -30,-30,-30], [-370, -390,-410,-430,-450], text=["Player: " + gr for gr in player_color.keys()],
       text_font_size="7pt", text_align="left", text_baseline="middle")

p.rect([-70, -70, -70,-70,-70,-70,-70], [68,48,28,8,-12,-32,-52], width=30, height=13,
       color=list(stat_color.values()))
p.text([-44, -40, -40,-40,-40,-40,-40], [68,48,28,8,-12,-32,-52], text=list(stat_color),
       text_font_size="9pt", text_align="left", text_baseline="middle")



output_notebook()
show(p)
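The radial scale in the chart is not linear. `rad()` maps each value through the square root of a logarithm, and because the slope `a` is negative, a larger value is drawn closer to the inner circle. The stdlib-only sketch below reuses the same constants to make that direction explicit. `rad_scalar` is a hypothetical scalar version of the notebook's `rad()`.

```python
from math import log, sqrt

# Same constants as the chart cell above
inner_radius = 120
outer_radius = 350 - 10
minr = sqrt(log(.001 * 1E4))
maxr = sqrt(log(1000000 * 1E4))
a = (outer_radius - inner_radius) / (minr - maxr)  # negative, since minr < maxr
b = inner_radius - a * maxr


def rad_scalar(value):
    """Hypothetical scalar version of the notebook's rad()"""
    return a * sqrt(log(value * 1E4)) + b


# The scale's endpoints land on the outer and inner circles
print(round(rad_scalar(0.001)), round(rad_scalar(1000000)))  # 340 120

# Larger stat values are drawn closer to the inner circle
for value in (0.28, 13.11, 67.64):
    print(value, round(rad_scalar(value), 1))
```

When you read the wedges, keep in mind that a shorter bar means a larger number on this scale.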

What is interesting to note from this interactive graph is that these players are very similar in terms of their True Shooting Percentage and their Effective FG Percentage. What differentiates them more is their Turnover Percent and Assist to Turnover Ratio. In our table, LeBron James has the lowest (best) Turnover Percent (12.27) and the highest Assist to Turnover Ratio (2.15), with Stephen Curry second in that ratio (1.98). Curry also posts the highest True Shooting and Effective FG Percentages in the group (67.64 and 63.25). Let us examine Steph Curry a little bit more.
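To double-check which player leads each category, here is a small standard-library sketch. It reads the first four stat columns of the advanced player table (values copied from the CSV cell) and picks the leader of each. It assumes that a lower Turnover Percent is better and that a higher value is better for the other columns.

```python
import csv
import io

# First four stat columns of the advanced player table, copied from the CSV cell
table = """Player, True Shooting Percentage, Effective FG Percentage, Turnover Percent, Assist to Turnover Ratio
Stephen Curry, 67.64, 63.25, 13.11, 1.98
James Harden, 59.37, 50.1, 15.98, 1.57
Kevin Durant, 63.69, 57.36, 12.57, 1.45
Demarcus Cousins, 53.68, 47.39, 12.99, 0.85
Lebron James, 57.19, 53.27, 12.27, 2.15
"""

reader = csv.DictReader(io.StringIO(table), skipinitialspace=True)
rows = list(reader)

# Assumption: a lower Turnover Percent is better; higher is better elsewhere
lower_is_better = {"Turnover Percent"}

leaders = {}
for column in reader.fieldnames[1:]:  # skip the "Player" column
    pick = min if column in lower_is_better else max
    best = pick(rows, key=lambda row: float(row[column]))
    leaders[column] = best["Player"]
    print(f"{column}: {best['Player']} ({best[column]})")
```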

Stephen Curry and Team Comparison Statistics

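from numpy import pi

# Totals generated by the web scraper; hard-coded as literals for the sake of time
steph_points = 1520
team_points = 6109

# Cumulative fractions of the circle: Steph's share of team points,
# then the hard-coded .45 slice labeled AST%, then the remainder
percents = [0, (steph_points/team_points), (steph_points/team_points)+.45, 1]
starts = [p*2*pi for p in percents[:-1]]
ends = [p*2*pi for p in percents[1:]]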

# a color for each pie piece
colors = ["blue", "#FFA503","#FFA517"]

p1 = figure(x_range=(-1,1), y_range=(-1,1), title="Steph versus Team")

p1.wedge(x=0, y=0, radius=1, start_angle=starts, end_angle=ends, color=colors)
# dashed leader line from the center toward the small AST% label
p1.line(x=[0, -.135, -.27, -.405], y=[0, -.41, -.82, -1.23], line_dash=[4, 4])
p1.text([0], [-0.35], text=["AST%"],
       text_font_size="56pt", text_align="left", text_baseline="middle", color="black")

p1.text([-.405], [-1.03], text=["AST%"],
       text_font_size="15pt", text_align="left", text_baseline="middle", color="black")

# display/save everything 
output_notebook()
show(p1)
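The 25 percent figure comes straight from the two totals in the pie-chart cell. This sketch repeats that arithmetic and the wedge-angle computation, in which each slice spans its fraction times 2π radians. `ast_fraction` is our hypothetical name for the hard-coded `.45` slice that the chart labels AST%.

```python
from math import pi

# Totals from the pie-chart cell
steph_points = 1520
team_points = 6109

share = steph_points / team_points
print(f"Share of team points: {share:.1%}")  # Share of team points: 24.9%

# Hypothetical name for the hard-coded .45 slice the chart labels AST%
ast_fraction = 0.45

# Cumulative fractions -> wedge boundaries in radians (fraction * 2*pi)
percents = [0, share, share + ast_fraction, 1]
starts = [p * 2 * pi for p in percents[:-1]]
ends = [p * 2 * pi for p in percents[1:]]
spans = [end - start for start, end in zip(starts, ends)]
print([round(s, 3) for s in spans])
```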

Conclusion

As can be seen above, Steph Curry contributes about 25 percent of all points for his team (1520 of 6109). If you also factor in his assist percentage (denoted by AST%), Steph Curry contributes to over half of his team's total scoring, since an assist is a pass that leads directly to a score. Thank you so much for viewing our hard work! We hope to add a machine learning section below in the future.