Baseball + Data = Fun!
My tech utopia. I mean, really, if you are a baseball fan, you know statistics rule the game. To some it’s enough to watch the game, see who wins and loses, but to others, like myself, it’s not just about that: it’s about the numbers. And dear Lord, baseball is full of numbers!
Rodney Cline Carew
Lifetime .328 AVG - MLB Hall of Fame
So, I figured, since I want to teach you how to use Postgres, an open-source database and one of the most widely used databases in the world, if not the most, why not do it with baseball data?
In this blog, you are going to learn how to do a basic Postgres installation, how to load data into this database, how to manipulate the data, and finally how to create queries and procedures to generate the statistics used in baseball, from the simple batting average to the amazingly complicated WAR.
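To give you a taste of the "simple" end of that range before we ever touch Postgres, here is a minimal Python sketch of batting average, which is just hits divided by at-bats. The function names and the 150-for-500 season are made up for illustration; they are not from Retrosheet or from the queries we will build later.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Return hits divided by at-bats, or 0.0 when there are no at-bats."""
    if at_bats == 0:
        return 0.0
    return hits / at_bats


def format_average(avg: float) -> str:
    """Format an average the way box scores do: three decimals, no leading zero."""
    text = f"{avg:.3f}"
    return text[1:] if text.startswith("0") else text


# Hypothetical season: 150 hits in 500 at-bats.
print(format_average(batting_average(150, 500)))  # .300
```

Later on, the same arithmetic will live in a SQL query instead, but the idea does not change: count the hits, count the at-bats, divide.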
We will also, as you may have already guessed, do a deep dive into baseball data itself. You can go nuts deciding what data to accumulate, and I think that to some extent baseball has gone too far, but data is still the backbone of everything. So the key to this whole blog, the thing I am most thrilled about, is the data we are using. Although I am still working out some issues with it, we are going to be using data that takes us down not just to the game level but to the inning and, in most cases, the pitch count. The data itself comes from an organization called Retrosheet, which has made it its mission to collect, organize, and provide as much baseball data as it can, free of charge. Yes, free.
If you are not a baseball fan, don’t worry. You will still be able to learn a lot about databases and have fun doing it. And who knows, maybe you’ll learn to appreciate one of the biggest things baseball has going for it: its wealth of data, and the statistics and strategies driven by it.
For now, be ready. You can install Postgres on any laptop or desktop you have. Although I will be using a Mac on my end, once we get past the installation process, it won’t make a difference what operating system you use. We will be installing an IDE (Integrated Development Environment) that is the same on Mac and Windows, and Postgres works the same way underneath.
Alright, next time, we’ll start chatting about baseball data, the basics for now, and of course, we’ll install Postgres.
Until then, this is your friendly Cincinnati Reds fan and avid data guy,
Luis.