The science of baseball stats

NIST / Notre Dame - A wind-tunnel test shows the turbulent air flow around a baseball.

Was there ever a team sport better-suited for statistical modeling than baseball? The heart of the game involves one pitcher vs. one batter at a time, allowing for a dizzying array of individual statistics. The regular season, as well as the typical player's career, will generally last long enough to build up an encyclopedia's worth of those statistics.

No wonder so many statisticians and physicists love to theorize about the game's winning factors - and no wonder new statistics are being created on a regular basis.

Batting averages and earned-run averages were just the start: Nowadays, you can track win shares and win probability, defense-independent ERA and range factor. But there's always a farther frontier for baseball analysis, and a couple of new twists came to light at last month's annual meeting of the American Association for the Advancement of Science.

As Major League Baseball kicks off the new season this week, here are a few Web links you just might get a kick out of - even if you're not a fan of the game:

SAFE or out?
The University of Pennsylvania's Shane Jensen caused a stir with his proposed method for judging fielding performance, involving a new statistic called spatial aggregate fielding evaluation, or SAFE.

"Things like hitting or pitching are a little bit easier to quantify ... because they're easy to tabulate. There's a finite number of outcomes," Jensen said. "Fielding is a much more challenging endeavor because you're trying to estimate people ranging toward a ball in play on a continuous surface."

SAFE uses mathematical modeling to determine the "overall measure of fielding quality" for each player in the 2002-2005 period - that is, how many runs each fielder saved or cost his team over the course of a season. The stir came about when Jensen noted that Yankees star shortstop Derek Jeter ended up rated as one of the worst fielders in the majors.

That sparked some choice headlines in the home of the Red Sox, the Yankees' archrivals: The Boston Herald headlined its blog item "Science Proves Derek Jeter Does Indeed STINK." Of course, Jeter's spot is still safe - if not because of his fielding, then because of his hitting and his history. Nevertheless, Baseball Musings' David Pinto explained that statistics like SAFE could make a difference when it comes time to evaluate trades and negotiate contracts.

"You're spending money, where can you get the runs?" Pinto said. "If we do fielding better than we've done in the past, here's a way of saying, 'Oh, I can have an edge over some other team by knowing that this person can save me 10 runs.' And 10 runs is usually a win."

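Pinto's rule of thumb boils down to simple arithmetic. Here is a minimal sketch in Python, assuming a flat rate of 10 runs per win (his figure, treated here as a constant). The function name and the sample inputs are made up for illustration.

```python
# Pinto's rule of thumb: roughly 10 runs is worth one win.
# This is an approximation, not a fixed law of the game.
RUNS_PER_WIN = 10.0

def runs_to_wins(runs_saved, runs_per_win=RUNS_PER_WIN):
    """Convert runs saved (or cost, if negative) into approximate wins."""
    return runs_saved / runs_per_win

print(runs_to_wins(10))   # a fielder who saves 10 runs: 1.0
print(runs_to_wins(-15))  # a fielder who costs 15 runs: -1.5
```

A front office comparing two players would subtract one result from the other to estimate the difference in wins over a season.
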
The model manager
OK then, how do you evaluate the contribution of the team manager? Swarthmore College's Steve Wang delved into the machinations of managers - how long they left their starting pitchers in the game, for example, or how many different lineups they used in the course of a season.

"Certain styles might be more effective with certain kinds of teams," he explained in a news release about his research. "A manager who prefers to stay with his starters might be best suited for a team with veteran starting pitching, whereas a team with fragile young arms might do best with a manager who uses his bullpen aggressively."

Wang grouped managers together into clusters, based on the similarities he saw in the statistics. Last year's division-leading managers in the American League - the Red Sox's Terry Francona, the Angels' Mike Scioscia and the Indians' Eric Wedge - clustered together as moderate managers in the pitching-related categories. They didn't get too hot about their pitchers, nor did they play things too cool. But in a follow-up e-mail, Wang cautioned that you can't read too much into that.

"I would be hesitant to put too much weight on that conclusion," he told me, "since I was not systematically looking for such correlations, and it's also not clear which way the cause-and-effect runs (i.e., whether being moderate causes success, or whether being successful enables the manager to be moderate, or yet some other relationship)."

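For readers curious about what grouping managers into clusters looks like in practice, here is a rough sketch in Python. The news release does not say which clustering method Wang used, so plain k-means stands in for it here. The manager names, the two statistics and every number are invented for illustration.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against a constant column
    return [tuple((x - m) / s for x, m, s in zip(r, mus, sds)) for r in rows]

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centers (deterministic)."""
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(mean(c) for c in zip(*members))
    return labels

# Hypothetical data: (average innings per start, distinct lineups in a season)
managers = {
    "Manager A": (6.4, 80),
    "Manager B": (5.2, 140),
    "Manager C": (6.3, 85),
    "Manager D": (5.3, 135),
}
labels = kmeans(standardize(list(managers.values())), k=2)
clusters = dict(zip(managers, labels))
print(clusters)
```

With these invented numbers, the two managers who stick with their starters land in one cluster and the two who shuffle their lineups land in the other. As Wang cautions, a grouping like this describes style; it does not say which style causes wins.
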
Statistics vs. steroids?
A similar caveat would apply when considering whether statistics could be used to sniff out steroids. Jensen said it would be "incredibly difficult to infer any causation from a statistical analysis."

In an analysis written for The New York Times, Jensen and three of his colleagues at Penn take a look at the Roger Clemens steroid case and confirm that there was something definitely unusual about the pitcher's late-career surge. However, it would be impossible to attribute the surge definitively to steroid use, they said.

Part of the problem is that the information about steroid use in the majors is so murky. Perhaps if the players who used steroids could detail exactly when they used them - say, as part of an amnesty program - statisticians could check for correlations in the performance data.

"I think we can look for people who we might want to test more," Pinto said. "I think if we now see someone in their 30s having a huge career surge, that should raise a red flag. It can happen, but if it happens over two or three years, I might want to test him every month rather than twice a year."