I’ve played fantasy football since high school but the last 4 years, it’s gotten more serious. My friend Nick invited me to join a dynasty league, meaning we build teams and keep players over time. Prior to that I was fairly casual in my draft research, cramming a couple of articles in to inform my picks on the day of the draft. After the first year, I realized I needed to up my game. The app we use provides draft rankings based on projected scoring but I was looking for an edge. I could’ve sought an undercover expert but being an engineer, I decided to dive straight to the data. There was only one problem - finding it.
There are a lot of websites that provide fantasy football scoring and stats but I was looking for something more foundational. I wanted a database of stats that I could query directly to determine my own insights. After some digging, I found the nflverse-data GitHub repository which aggregates all of the player stats, calculates fantasy scores, tracks rosters and more. There is a python package, nfl-data-py which provides programmatic access to the data. Using the examples in their ReadMe, I was able to start fetching the data and inserting it into a SQLite database.
import sqlite3
import nfl_data_py as nfl
con = sqlite3.connect("data.db")
def to_sql(data, table_name):
data.to_sql(name=table_name, con=con, if_exists="replace")
years = ["2023", "2022", "2021", "2020", "2019", "2018", "2017", "2016"]
weekly_data = nfl.import_weekly_data(years)
to_sql(weekly_data, "stats_by_game")
# Team information
teams = nfl.import_team_desc()
to_sql(teams, "teams")
# Get player information
players = nfl.import_ids()
to_sql(players, "players")
# Get depth charts
depth_chart = nfl.import_depth_charts(years)
to_sql(depth_chart,"depth_chart")
# Get snap counts
snap_counts = nfl.import_snap_counts(years)
to_sql(snap_counts, "snap_counts")
To explore the data, I used the excellent Datasette tool, which allowed me to easily explore the data using a web interface to the SQLite database. That helped me understand the column types, see subsets of existing data, and see how the tables could be linked together. I noticed all of the tables referred to specific players by the gsis_id
field, so I updated the players table to use that column as the ID for easier querying.
Here are a couple of queries I used find the highest scoring players on a per game and per season basis:
Using this data, I was able to find players who consistently performed well, filter out keepers so I knew who would be available, and in 2022 I won the league. Alas 2023 ended with an early playoff exit, so I don’t have everything figured out yet. I’m pleased with the results but am still looking for a further edge. If you're interested in seeing the full code, it's available here.