How I Made a Machine Learning Application to Predict NCAA Tournament Basketball Games
March 23, 2025
- I paid $9.99 for Stephen Grider’s Udemy course Machine Learning in JavaScript on November 23, 2018. I completed the course over the span of a couple of months and decided I didn’t really have a future with machine learning.
- In the summer of 2024, I went over the videos again, taking note of which applications you build in the course that would be translatable to predicting the outcomes of basketball games.
- In the fall of 2024, I took a logistic regression application from the course, one that reads a CSV file of car data and predicts whether a given car would pass an emissions test, and updated it to run on the latest versions of Node and TensorFlow.js.
- I entered the March Machine Learning Mania 2025 competition on Kaggle because entrants received historical NCAA basketball data.
- The data from the Kaggle competition was in the form of a series of CSV files. I used db-migrate to add the data that I thought I’d want to a SQLite database.
- I added a table specifically to store calculated data that I thought I’d want for my model, such as season averages for each team for each year. The machine learning functions expected data in a specific form, and it seemed like the training and predicting process would be simpler if I didn’t perform all of the historical calculations in the same step.
- I wrote a series of queries and mapping functions to feed data into my logistic regression functions in the expected format.
- Once I confirmed the whole application was working, I started the process of tweaking the model.
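As a rough illustration of the calculated-data and mapping steps above, here is a minimal sketch in plain JavaScript. It is not the code from this project: the row shape (`season`, `teamId`, `points`, `pointsAllowed`), the function names `seasonAverages` and `matchupFeatures`, and the choice of features are all assumptions made for the example, and it works on an in-memory array rather than a SQLite table.

```javascript
// Hypothetical game rows: one row per team per game.
const games = [
  { season: 2024, teamId: 1, points: 80, pointsAllowed: 70 },
  { season: 2024, teamId: 1, points: 60, pointsAllowed: 66 },
  { season: 2024, teamId: 2, points: 75, pointsAllowed: 75 },
  { season: 2024, teamId: 2, points: 65, pointsAllowed: 71 },
];

// Precompute per-team, per-season averages once, so that training and
// prediction can look them up instead of recalculating them.
function seasonAverages(rows) {
  const totals = new Map();
  for (const row of rows) {
    const key = `${row.season}:${row.teamId}`;
    const t = totals.get(key) ?? { games: 0, points: 0, pointsAllowed: 0 };
    t.games += 1;
    t.points += row.points;
    t.pointsAllowed += row.pointsAllowed;
    totals.set(key, t);
  }
  const averages = new Map();
  for (const [key, t] of totals) {
    averages.set(key, {
      avgPoints: t.points / t.games,
      avgPointsAllowed: t.pointsAllowed / t.games,
    });
  }
  return averages;
}

// Map a matchup to a numeric feature row: the difference between the two
// teams' season averages (an assumed feature choice for this sketch).
function matchupFeatures(averages, season, teamA, teamB) {
  const a = averages.get(`${season}:${teamA}`);
  const b = averages.get(`${season}:${teamB}`);
  if (!a || !b) throw new Error("Missing season averages for a team");
  return [a.avgPoints - b.avgPoints, a.avgPointsAllowed - b.avgPointsAllowed];
}

const averages = seasonAverages(games);
const features = matchupFeatures(averages, 2024, 1, 2);
console.log(features);
```

With the sample rows, both teams average 70 points scored, while team 1 allows 68 and team 2 allows 73, so the feature row for team 1 versus team 2 is 0 and -5. Storing the averages separately keeps the expensive historical pass out of the training and prediction steps.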
To summarize the main steps:
- I used an online tutorial to create a machine learning application that could take a CSV file of car data and predict whether the car would pass an emissions test using logistic regression.
- I created a database of NCAA data.
- I fed the NCAA data into the machine learning application.
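For readers who have not seen logistic regression before, the sketch below shows the core idea in dependency-free JavaScript: pass a weighted sum of the features through a sigmoid to get a probability, then adjust the weights by gradient descent. It is a stand-in for illustration, not the TensorFlow.js code from the course or from this project. The function names, the learning rate, the iteration count, and the toy data (a single made-up "scoring margin" feature) are all assumptions.

```javascript
// Squash any real number into the range (0, 1).
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Probability that the label is 1 for one feature row.
function predictProbability(row, weights, bias) {
  let z = bias;
  for (let j = 0; j < row.length; j++) z += weights[j] * row[j];
  return sigmoid(z);
}

// Batch gradient descent on the logistic (cross-entropy) loss.
function train(features, labels, { learningRate = 0.1, iterations = 1000 } = {}) {
  const n = features.length;
  const d = features[0].length;
  const weights = new Array(d).fill(0);
  let bias = 0;
  for (let iter = 0; iter < iterations; iter++) {
    const gradW = new Array(d).fill(0);
    let gradB = 0;
    for (let i = 0; i < n; i++) {
      const error = predictProbability(features[i], weights, bias) - labels[i];
      for (let j = 0; j < d; j++) gradW[j] += error * features[i][j];
      gradB += error;
    }
    for (let j = 0; j < d; j++) weights[j] -= (learningRate * gradW[j]) / n;
    bias -= (learningRate * gradB) / n;
  }
  return { weights, bias };
}

// Toy data: one made-up feature per matchup, label 1 if the first team won.
const trainingFeatures = [[-2], [-1], [-0.5], [0.5], [1], [2]];
const trainingLabels = [0, 0, 0, 1, 1, 1];

const model = train(trainingFeatures, trainingLabels);
const favorite = predictProbability([1.5], model.weights, model.bias);
const underdog = predictProbability([-1.5], model.weights, model.bias);
console.log({ favorite, underdog });
```

On this toy data the learned weight is positive, so a matchup with a positive margin gets a win probability above 0.5 and a negative margin gets one below 0.5. A library such as TensorFlow.js does the same arithmetic with tensors, which matters once there are many features and many seasons of games.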
Sourcing data and getting it into a usable format was the most time-consuming part of the process, outside of maybe learning just enough about machine learning to understand how this could all be done. I didn’t have a working NCAA tournament prediction algorithm until the Monday before March Madness started, so I had little time to iterate on the model, and in retrospect, the final version seems like it will perform worse than the first would have. I’ll do an autopsy on the model when the tournament is over, but the brackets I submitted using it have been pretty bad.