The Texas Rangers are one of six MLB teams that have never won a World Series. If they win it all this year (and they currently lead a surprisingly competitive AL West at the season’s midway point with one of the hottest offenses in the majors), they may have an unlikely contributor to thank: generative AI.
Like most MLB teams these days, the Rangers employ a team of data engineers, data analysts, and data scientists to pore over reams of data in hopes of gaining an advantage on the field. Baseball has always been a numbers game, but the practice has been in overdrive since the Oakland A’s launched the “Moneyball” era in the early 2000s.
But the Texas Rangers may be moving ahead of other MLB teams in the sophistication of their big data operations. Alexander Booth, who is the assistant director of research and development within baseball operations for the Texas Rangers, discussed the team’s investment in data, analytics, and AI during the Databricks Data + AI Summit last week.
According to Booth, the Rangers’ big data operation has undergone a transformation over the past couple of years. That period has coincided with an explosion of new data sources, such as biomechanical player-tracking data and meteorological data, as well as a move into Databricks’ lakehouse running in the cloud.
“Even though we work in baseball, we have the same data problems as everybody else,” Booth said in an interview with the Cube last week. “So we started with an on-prem system and then that on-prem system couldn’t scale. We moved to the cloud. We had some issues with our traditional data warehousing: data replication issues, governance issues.”
Massive streams of unstructured data–including from markerless motion capture systems that track the movements of players’ joints and real-time wind, temperature, and humidity data collected from ballparks every five seconds–were proving particularly troublesome for the Rangers’ data team, Booth said.
After completing a proof of concept with Databricks, the team made the decision to standardize on the San Francisco company’s lakehouse. The move so far has worked out well for the Rangers, Booth said.
“These large, unstructured data sources coming off the biomechanics data, off the wind data, has to be transformed into an actionable insight,” Booth told the Cube. “We need to make that data available so our players and coaches are able to get meaning from it. And so we found that our other solutions weren’t working with this new data. Databricks came in…we did our POC with them. We worked on how we could scale up these processing models using Databricks for that transformation layer. That was just a natural fit.”
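Booth did not describe the transformation layer in detail, but the general idea of turning a raw five-second weather feed into something a coach can read can be sketched in a few lines of Python. This is an illustration only, not the Rangers’ code: the `WeatherReading` fields, the `summarize_by_minute` helper, and the sample values are all assumptions made up for the example, and a production pipeline on a lakehouse would more likely use a distributed engine than plain Python.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class WeatherReading:
    """One hypothetical ballpark sensor sample (field names are assumptions)."""
    epoch_seconds: int
    wind_mph: float
    temperature_f: float
    humidity_pct: float


def summarize_by_minute(readings):
    """Group raw readings into one-minute windows and average each field."""
    windows = defaultdict(list)
    for reading in readings:
        # Floor the timestamp to the start of the minute it falls in.
        window_start = reading.epoch_seconds - reading.epoch_seconds % 60
        windows[window_start].append(reading)

    summary = {}
    for window_start in sorted(windows):
        batch = windows[window_start]
        summary[window_start] = {
            "samples": len(batch),
            "avg_wind_mph": mean(r.wind_mph for r in batch),
            "avg_temperature_f": mean(r.temperature_f for r in batch),
            "avg_humidity_pct": mean(r.humidity_pct for r in batch),
        }
    return summary


# Two minutes of made-up samples, one every five seconds.
sample_readings = [
    WeatherReading(
        epoch_seconds=t,
        wind_mph=8.0 + (t // 60),
        temperature_f=90.0,
        humidity_pct=40.0,
    )
    for t in range(0, 120, 5)
]
minute_summary = summarize_by_minute(sample_readings)
```

Each one-minute window collapses twelve raw samples into a single row of averages, which is the kind of reduction that makes a high-frequency feed usable in a report or a model feature table.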
The Rangers are in the midst of the same big data, analytics, and AI transition as companies in other industries. Retailers, banks, and healthcare companies are shifting from producing backward-looking dashboards based on periodic batches of data to making decisions in real time on streaming data, and so are the Rangers.
The Rangers and other MLB teams are getting much more sophisticated with their motion tracking systems. The most sophisticated systems require a full-blown lab, like the one the San Diego Padres announced last week that they will help build at Point Loma Nazarene University. These systems help pitchers and batters fine-tune their mechanics.
But plenty of other motion data is available from MLB’s investment in the Statcast player- and ball-tracking system, which uses a dozen cameras installed at each ballpark. The Statcast system got a big upgrade for 2023 that will see more widespread use of the Hawk-Eye high-frame-rate cameras, providing more detail on things like bat speed, attack angle, and contact point.
The Rangers are doing their best to ingest data wherever they can find it. “It’s an explosion for sure. Even five years ago, none of this data really existed, so a lot of teams are playing catch up,” Booth said during a press conference at the Databricks show. “It’s a competitive advantage to be able to consume this data and make data-driven decisions faster, on this big data at scale, and to build predictive models that other clubs might not have, to find hidden value.”
One of the biggest sources of hidden value may be the Rangers’ adoption of generative AI. Made world-famous by the launch of ChatGPT in November 2022, generative AI technology has captured the imaginations of millions of people in all walks of life, including professional sports.
According to Booth, the Rangers are making use of generative AI to help process and analyze unstructured data, primarily relating to player development. As such, it’s more of a batch use case than a real-time one, but if it helps improve the Rangers’ world-class farm system, they’ll take it.
“There’s a lot of text data in baseball,” Booth said during the press conference. “There’s a lot of player evaluations. There’s a lot of written articles about teams and players. And there’s a lot to sift through. There’s thousands of players in baseball and trying to understand how the sentiment tracks from one player to the next is difficult at scale.”
The Rangers use generative AI to automatically summarize large amounts of text data into a package that the scouts and player development personnel can more easily consume. According to Booth, the technology allows the club to ingest dozens of articles or scouting reports and create a one-page summary.
“It allows us to analyze our players faster and being able to understand how the emotion and sentiment of how scouts are talking about players, beat reporters are talking about players, and get that quick, to-the-point summary, and of course that helps make decisions faster,” he said during the press conference.
With the trade deadline coming up and the Rangers sitting three games ahead of their arch-rival Houston Astros, it’s pretty clear that the club will be a buyer and not a seller this year. After a 94-loss campaign last year, the Rangers rank in the top five in MLB in several offensive categories, including batting average, hits per game, and runs per game, which suggests that something is clicking this year in Arlington, Texas.
There’s no way to know whether the Rangers’ remarkable turnaround is due to the club’s embrace of data, analytics, and AI; whether it’s the hiring of Bruce Bochy as its new manager this year; or whether the offensive outburst and stacking up of Ws is just a fluke attributable to the whims of the baseball gods.
But one thing is clear: the Rangers won’t be letting up on the data angle any time soon.
“We use AI and data all the way down through the organization,” Booth said. “We use it at the major league level for game strategy and lineup decisions. We use it in the minor leagues for player development, pitch-design sessions. We use it in the amateur draft coming up next month. We use it at trade deadline, trying to find the optimal players that are going to give us the best chance of winning the World Series.”