Matt Kaye

On Doing Data Science

What does a data science project actually take?
data science
This post is part of a series called The Missing Semester of Your DS Education.
Aug 3, 2023
10 min

Unit Testing dbt Models

Letting yourself sleep at night by ensuring your SQL is correct
dbt
analytics engineering
python
Our team adopted dbt about a year ago, and it’s become an integral part of our data stack. dbt is a major component of the so-called “modern data stack” and has exploded onto the…
Jul 9, 2023
26 min

Lessons Learned From Running R in Production

And why I probably won’t be doing it again
R
data science
A couple weeks ago, I wrote a high-level post on REST APIs. One thing that I noted was that I couldn’t, in good faith…
Jun 29, 2023
31 min

How Can Someone Else Use My Model?

APIs, REST, and Production Services
data science
R
python
This post is part of a series called The Missing Semester of Your DS Education.
Jun 21, 2023
10 min

A Gentle Introduction to Docker

I need to run my code somewhere other than my machine. How do I do it?
R
data science
This post is part of a series called The Missing Semester of Your DS Education.
Jun 6, 2023
12 min

Dependency Management

You should be using poetry, or renv, or conda, or something similar
data science
R
python
This post is part of a series called The Missing Semester of Your DS Education.
May 28, 2023
10 min

Experiment Tracking and Model Versioning

Knowing what you’ve tried, and what’s in production
data science
This post is part of a series called The Missing Semester of Your DS Education.
May 14, 2023
9 min

Workflow Orchestration

A Beginner’s Guide to Shutting Down Your Machine at Night
data science
This post is part of a series called The Missing Semester of Your DS Education.
May 6, 2023
8 min

Pull Requests, Code Review, and The Art of Requesting Changes

R
data science
This post is part of a series called The Missing Semester of Your DS Education.
Apr 24, 2023
12 min

Unit Testing Analytics Code

How do you know your code actually works?
R
data science
This post is part of a series called The Missing Semester of Your DS Education.
Apr 5, 2023
14 min

Writing Internal Libraries for Analytics Work

Or packages, or modules, or whatever you wish to call them
R
data science
This post is part of a series called The Missing Semester of Your DS Education.
Apr 1, 2023
14 min

Balancing Classes in Classification Problems

And why it’s generally a bad idea
R
data science
In my last post I wrote about common classification metrics and, especially, calibration.
Apr 1, 2023
10 min

Calibration and Evaluating Classification Models

Metrics for probability predictions
data science
There’s something of a complexity trajectory in evaluating classification models that…
Mar 20, 2023
8 min

Interpreting AUC-ROC

data science
R
AUC goes by many names: AUC, AUC-ROC, ROC-AUC, the area under the curve, and so on. It’s an extremely important metric for evaluating machine…
Mar 9, 2023
10 min

Exploring the Tail Behavior of ESPN’s Win Probability Model

data science
statistics
sports analytics
It’s College Football Playoff season, which means I’ve been watching a lot of games lately. And I find myself complaining pretty often about how badly calibrated I think…
Jan 9, 2023
9 min

Sequential Testing

data science
statistics
The last post proposed a solution to the multiple testing problem that often invalidates A/B test results: test planning. The idea is to calculate the sample sizes you need…
Apr 17, 2022
5 min

Calling A/B Tests

data science
statistics
a/b testing
In the last post, I gave a bird’s-eye overview of the mechanics of running an A/B test. But at the end, we reached a problem: We had two conversion rates – 20% and 25%…
Apr 10, 2022
7 min

Running A/B Tests

data science
statistics
a/b testing
This is the second post in a series on A/B testing. In the last post, I gave a high-level…
Apr 9, 2022
5 min

A/B Testing: A Primer

data science
statistics
a/b testing
This is the first post in a series I’m planning on writing on A/B testing. In this post, I’ll lay out a top-level overview of what A/B testing is and why companies do it. In…
Mar 25, 2022
4 min

Deploying MLFlow on Heroku (with Heroku Postgres, S3, and nginx for basic auth)

mlops
Disclaimer: I followed this guide to setting up…
Dec 26, 2021
7 min

Notes on Hiring Data Analysts + Scientists

data science
hiring
In the past four months, I’ve been involved in hiring for two new roles at CollegeVine: a second data scientist, and our first data analyst. I’ve learned a lot along the way: Things that work, things that don’t, and things to ask in order to maximize the…
Dec 22, 2021
14 min

Working With Your Fitbit Data in R

R
data science
fitbitr 0.1.0 is now available on CRAN! You can install it with
Jun 8, 2021
6 min

Highlights From rstudio::global

R
data science
rstudio::global, this year’s iteration of the annual RStudio conference, was a few weeks ago. Here were some highlights:
Feb 11, 2021
2 min

What’s New in slackr 2.1.0

R
slackr 2.1.0+ is live! There are a whole bunch of exciting changes that we (mostly Andrie de Vries and I) have made to improve the package.
Feb 7, 2021
2 min

A Gentle Introduction to Markov Chains and MCMC

data science
Every other Friday at work we have a meeting called All Hands. During the first half of All Hands a member of the team gives a presentation, which is split up into two…
Jan 13, 2021
10 min

What I’ve Been Drinking in Quarantine (Plus a Cocktail Lesson)

food + bev
The past ten-or-so months of limited activity due to Covid-19 have…
Jan 10, 2021
15 min

Our 2021 Big Data Bowl Submission

data science
sports analytics
This post is my team’s 2021 NFL Big Data Bowl submission. My team was made up of me, Hugh McCreery (Baltimore Orioles), John Edwards (Seattle Mariners), and Owen McGrattan…
Jan 7, 2021
15 min
    Matt Kaye, 2023