With only a few games left in the March Madness tournament, fans across the country are waiting with bated breath to see which team comes out on top, and not just die-hard basketball fans. The tournament now draws fanatic and casual fans alike. In fact, more than 40 million Americans will fill out more than 70 million brackets this year. This growing fan base has also increased the amount of information people are clamoring for. Millions of fans want player statistics, team histories, injury reports, and more as they strive for the perfect bracket. Data analysis has become the game behind the games, as it were. The collection, organization, and analysis of the statistics found in every play on the basketball court are the magic behind a winning bracket.
These days, every movement of a basketball player can be turned into a data point. To begin with, historical data has an important place in current and future NCAA games. Many years of raw data for each team in the tournament, including win-loss records, home vs. away statistics, and even play-by-play data, need to be gathered for easy distribution. In addition, present-day games provide even more data to collect, right down to the second-by-second movements and efficiencies of the players. Courtside video cameras track player speed, shot trajectory, dribbles, and passes, sampling 25 times per second.
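To make the 25-samples-per-second figure concrete, here is a minimal sketch of how a speed estimate could be derived from tracked positions. It assumes each sample is an (x, y) court position in meters. The function name, the units, and the sample coordinates are illustrative assumptions, not details of any real tracking system.

```python
import math

SAMPLES_PER_SECOND = 25  # sampling rate quoted in the article


def speeds_from_positions(positions, samples_per_second=SAMPLES_PER_SECOND):
    """Estimate speed (meters/second) between consecutive position samples.

    `positions` is a list of (x, y) tuples in meters. This format is an
    assumption made for illustration only.
    """
    dt = 1.0 / samples_per_second  # time between two samples, in seconds
    return [
        math.dist(previous, current) / dt
        for previous, current in zip(positions, positions[1:])
    ]


# Hypothetical samples: the player moves 0.2 m between each pair of frames.
track = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0)]
print(speeds_from_positions(track))
```

At this sampling rate, a player who covers 0.2 meters between two frames is moving at roughly 5 meters per second.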
Organization & Storage
Once all the data is collected, it needs to be stored somewhere. It lives on servers running database software built for high availability. The sheer amount of data collected for March Madness also requires systems equipped to process large datasets quickly and efficiently. Distributed data systems such as Hadoop (a storage and batch-processing framework) and Cassandra (a distributed database) are built to handle terabytes of data for millions of March Madness fans without causing frustration over latency or data loss. In addition, highly available cloud servers make it easy to scale up and spin up more servers as the amount of data grows.
Many fans choose the winners of their brackets through educated guesses based on team loyalty, gut feelings, or even mascots. For other fans, though, data analysis is key. However, mere access to the terabytes of data described above doesn’t do anyone much good if the data isn’t usable. To create smarter and more accurate brackets, we first need to extract meaningful insights from incredibly large amounts of data. Bracket-predicting services use data-driven algorithms to find patterns and predict game results. Other sources let fans pick and choose the factors that matter most to them when finalizing their brackets, from player and team stats to game outcomes and locations to league trends. Crunching this many numbers in a relatively short amount of time for millions of people requires easily scalable and highly available servers.
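As a rough illustration of letting fans weight the factors that matter most to them, the sketch below scores two teams with user-chosen weights and picks the higher score. It is not how any particular bracket service works. The `TeamStats` fields, the weight names, and all of the numbers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamStats:
    """Hypothetical per-team inputs; a real service would use many more."""
    name: str
    win_pct: float            # fraction of games won, 0.0 to 1.0
    points_per_game: float
    points_allowed: float


def weighted_score(team, weights):
    """Combine a team's stats into one number using fan-chosen weights."""
    scoring_margin = team.points_per_game - team.points_allowed
    return (weights["win_pct"] * team.win_pct
            + weights["scoring_margin"] * scoring_margin)


def pick_winner(team_a, team_b, weights):
    """Pick the team with the higher weighted score (ties go to team_a)."""
    if weighted_score(team_a, weights) >= weighted_score(team_b, weights):
        return team_a
    return team_b


# Made-up stats for two placeholder teams.
team_a = TeamStats("Team A", win_pct=0.80, points_per_game=75, points_allowed=65)
team_b = TeamStats("Team B", win_pct=0.70, points_per_game=80, points_allowed=68)

record_fan = {"win_pct": 10.0, "scoring_margin": 0.1}   # trusts the record
margin_fan = {"win_pct": 1.0, "scoring_margin": 1.0}    # trusts scoring margin

print(pick_winner(team_a, team_b, record_fan).name)
print(pick_winner(team_a, team_b, margin_fan).name)
```

With these made-up numbers, the two fans pick different winners from the same data, which is the point of letting each fan choose the weights.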
The sheer amount of data supporting the annual tournament has heightened fan involvement and appreciation, but it has also dangled the tantalizing prospect of a perfect bracket just out of reach. While the odds are slim (about 1 in 9.2 quintillion if every game is treated as a coin flip), it’s hard to resist studying the data and striving for that perfect bracket anyway.
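The 9.2 quintillion figure can be checked with a little arithmetic. This sketch assumes a 63-game bracket (the 64-team main draw, not counting the First Four play-in games) and treats every game as an independent 50/50 pick. The function name and the `pick_accuracy` parameter are illustrative.

```python
GAMES_IN_BRACKET = 63  # a 64-team single-elimination bracket has 63 games


def perfect_bracket_odds(games=GAMES_IN_BRACKET, pick_accuracy=0.5):
    """Return N such that the chance of a perfect bracket is 1 in N.

    Assumes every game is picked independently with the same accuracy,
    which is a simplification made for illustration.
    """
    return 1 / pick_accuracy ** games


# With coin-flip picks, there are 2**63 equally likely brackets.
coin_flip_brackets = 2 ** GAMES_IN_BRACKET
print(f"{coin_flip_brackets:,}")  # 9,223,372,036,854,775,808

# A better-informed fan faces shorter, though still enormous, odds.
print(f"{perfect_bracket_odds(pick_accuracy=0.7):,.0f}")
```

Raising `pick_accuracy` above 0.5 shows how much studying the data could shrink the odds in principle, even though a perfect bracket stays wildly unlikely.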
How did your bracket do?