
Coming to Terms with Performance Data: My Hero, Duke Snider, Was Not As Good As Mickey Mantle and Willie Mays

Posted Oct 18 2011 6:45am

Duke Snider died at the end of February. He was my first sports hero. I grew up in a family of Brooklyn Dodger fans, and the Duke was their star center fielder from 1947 through 1957. That was the Dodgers’ last year in Brooklyn; their owner, Walter O’Malley, having failed to extort a big new ballpark from the borough (some things never change), uprooted the franchise and planted it in Los Angeles. Snider’s career waned after the move – he turned 32 in his first season in LA, was hobbled by a bad knee, and became a part-time player. My memories of him go back only to 1959, when he had his last good year (23 home runs, 88 runs batted in, .308 batting average in only 370 at bats, about two-thirds of a full-time-equivalent season).

Professional and college football and basketball dominate the contemporary American sports scene, but in the 1950s, baseball was king, described without irony as the national pastime. The epicenter of major league baseball in the 1950s was New York, home to three excellent teams: the Yankees of the American League, and the Dodgers and Giants of the National League. From 1949 through 1957 at least one of the three was in the World Series every year, and in 7 of those years there were two. (The streak extended through 1966 after the exodus.) The (Damn) Yankees won most often – 5 straight titles from 1949 through 1953, and another in 1956. They beat the Dodgers in 1947, 1949, 1952, 1953, and 1956, and the Giants in 1951. The Dodgers beat the Yankees for their lone title in 1955, while the Giants won the previous year over Cleveland.

By a remarkable coincidence, all three teams had a great center fielder. Willie Mays, Mickey Mantle, Duke Snider: all Hall of Famers, home run hitters who could run and field. None was homegrown. Mantle was from rural Oklahoma. Mays was from Alabama, lucky enough to be born in 1931 (as was Mantle) and just 16 when Jackie Robinson broke the major league colour barrier as a Dodger. Snider was from California, home of year-round baseball weather and Mecca for Steinbeck’s desperate dust bowl Okies who left families like the Mantles behind to rot their lungs out in the mines. The Duke was five years older than the other two and joined the Dodgers with Robinson in 1947. Mantle came up in 1951 as a teenager, while Mays had barely turned 20 when promoted that same year. Mays missed all of 1952 and 1953 serving in the military, but from 1954 through 1957 all three were in their prime, captivating the metropolis with their exploits, each claiming the devotion of millions of fans. Who was best: the Mick, Willie, or the Duke?

I had no doubt of course; I was a wide-eyed kid giddy with the first bonding to a team and hero. The Duke was best. Didn’t he hit 40 home runs 5 years in a row – something neither Mantle nor Mays accomplished? Wasn’t he the best player on the team that won six pennants in a decade? Didn’t Yankee manager Casey Stengel exclaim that Snider made the finest catch he’d ever seen?

My belief was unshakable, mirroring the devotion of fans of Mantle and Mays. And for those of you wondering what this has to do with health care, here is where our tale really begins.

There are people who love baseball and people who hate baseball and people who don’t care about baseball, but one thing is undeniable: it is the most data-rich of all sports, and has been for over a century. Visionaries created and refined baseball databases from the 1870s onward. But there was very little creative use of the data, and the descriptive statistics hardened into conventional wisdom, the outlines of which are familiar to even casual fans. On offence, batting average, home runs, runs batted in, and stolen bases defined a player’s excellence; on defence, fielding average (percentage of error-free balls handled); and for pitchers, wins and losses, strikeouts, and earned run average. Want to know how good a player is? Look at these numbers.
There were occasional attempts to mine baseball data in the early decades of the 20th century, and some debates about which statistics were the most reliable predictors of performance. Analysis of these flat-file, paper records was hugely labour intensive, almost Dickensian, done mainly by hobbyists. Eventually some pioneers dug a little deeper and began to see things a little differently. One of these was Branch Rickey, the studious Dodger general manager who shepherded Jackie Robinson into the big leagues. A number of amateur baseball historians and statisticians assembled databases and compiled encyclopaedias of player records (the first was published in 1951). Occasionally someone would publish a quirky analysis in an obscure journal or write a largely unread book.

And then along came Bill James, who changed baseball forever (though like Newton, if he saw farther than others it was because he stood on the shoulders of giants, equipped with a tool they never had – the mainframe and later the microcomputer). He was the first baseball stathead rock star. In 1977 he began self-publishing mimeographed little books called the Bill James Baseball Abstract, full of statistical analyses and observations about player performance. Gradually the books developed a bit of a cult following, and his big break came in 1982 when a media conglomerate offered to publish them. Thus was born the popular science of baseball statistical analysis, which in essence can be reduced to a simple quest: explaining performance by relating inputs (the players) to outputs (what happens during each of the hundreds of discrete events that define a game) to outcomes (which team wins and how often).

Why James, and why then? James is smart and analytical, but he is also a master of what health care folks call knowledge translation. He is a very good writer with very good judgment about how to spend his time, proportionate to the importance of the question. He is an iconoclast unafraid to challenge settled wisdom, and he is prepared to revise his own judgments based on deeper examination of the data (hence his flip-flop about whether Lefty Grove or Walter Johnson was the greatest pitcher of all time). He is part historian, part numbers geek, and part storyteller. He knows how to program simulations and he is careful not to overinterpret his findings. He recognizes that statistics cannot explain everything, that character and fortune and context matter. I have read a lot of (probably too many) baseball books, including compilations of arcane statistical papers by researchers more technically advanced than James. But none offers the pleasure I get from reading a James essay. He is the Atul Gawande [1] of baseball writing (actually, Atul Gawande is the Bill James of medical writing), whose prose matches his analytic talents, and who has a journalist’s nose for the good story.

I cannot say why the statistical revolution that swept baseball took off in the 1980s, but there is one clearly important factor: money. Baseball salaries began to rise sharply in the 1970s due to the players’ union victories in arbitration hearings and the courts that ended lifetime indenture to a single club. In 1975 the average salary was $45,000; in 1980, $144,000; in 1985, $372,000; in 2010, $3,340,000 [2]. The freedom (not absolute, but considerable) to market one’s talents to the highest bidder shifted bargaining power from owners to players. Adjusted for inflation, in 2010 dollars Snider’s peak salary was $314,000; Mantle’s, $707,000; and Mays’s, $855,000 [3]. Today an average center fielder with ten years of experience would earn a minimum of $6 to $8 million; a star with our trio’s credentials could get as much as $25 to $30 million [4].
Buying and paying players became high-stakes games. With more teams (the number of franchises rose from 16 in 1961 to 26 in 1977; there are now 30) and freer player movement, more variables came into play. How long should you sign a good player, and when do you start to discount for age? Do you lock into long term contracts with young but unproven players or try to buy championships by signing expensive, experienced talent on the open market? Do the best players come out of college or do they begin their journeys through the minor leagues right after high school? Is the number of innings pitched correlated with arm injuries? Do you go after speed or power?

These weren’t new questions but the price of bad judgment had become dramatically higher. There were a lot of frenzied, ill-considered decisions in the early days; in fact there still are [5]. A pitcher with one great year might score a huge, multi-year contract and never excel again. It was a new game and everyone had different theories about what counted in performance and how to predict future success. When players were bound for life to one team, the focus was on scouting sandlots and the low minor leagues to assess the talent and sign the Next Big One. Major league salaries were mainly based on retrospective, time-lagged data – you got paid this year for what you did last year. And what you did last year was thought to be revealed by the standard numbers – res ipsa loquitur. The traditional statistics defined performance; there was no more to it.

Except there was. Over the past quarter century or so, baseball statistical research and analysis has transformed the game. There are arguments about how much it has mattered and whether the database and the computer program are superior to real-time, live observation and human judgment and what is often labelled “the intangibles.” (Cue discussion on the “art of medicine”.) Not even the nerdiest stathead believes that statistical analysis explains all outcomes or predicts future performance with a high degree of certainty. Baseball history is littered with sudden and inexplicable spikes and plunges in individual performance [6]. An arm goes lame (my second hero, Dodger pitcher Don Drysdale, at age 32); a player is coaxed into playing too soon after an injury (Dizzy Dean) [7]; a player falls prey to extracurricular vices (Doc Gooden, Darryl Strawberry); pitchers find an irreparable hole in a promising player’s swing (Joe Charboneau); suddenly a fine pitcher can’t throw a strike (Steve Blass); a player who performs modestly for 6 years becomes a monster hitter in his 7th and even better in his 8th (Jose Bautista). Baseball is not checkers, and players are not robots. It is a game of the mind and the body, unified by failure. The best hitters fail to reach base 6 times in 10. The best pitchers yield two to three runs a game. Great teams lose 35% of the time.

Which brings us back to Duke Snider. Before sabermetrics – so named for the Society for American Baseball Research, which really does exist and thrive – you’d just look on the back of the baseball cards, scan the records, and argue for your hero. You could point to unadjusted statistics and infer false equivalencies. And you could explain away the big-dot statistics if you felt like it – “yeah, maybe he only hit .270, but he was great when the game was on the line.” [8] The less comprehensive and nuanced the data, the easier it was to make a pretty good case for Snider being right up there with the Mick and Willie. But when you crunch the numbers, adjust for the ballpark characteristics (as in case-mix adjustment), look harder at the defensive data (not just proportion of balls caught, but how many gotten to, as in population-based outcomes), and follow the arc of a career (not just the best years, but the slope of the early ascent and the late decline), the evidence becomes inescapable. Duke Snider was not as good as Mantle and Mays, and the performance gap is substantial. In his very best years, he accounted for 2 to 4 fewer additional wins for the Dodgers than Mays contributed to the Giants or Mantle to the Yankees in their best years [9]. That’s huge, because a fine player (like Snider) having a terrific year will add about 4 extra wins to his team’s total, and an otherworldly player (like Mantle or Mays) will add 7. And that is all that matters.

This doesn’t make Duke Snider any less of a hero to me [10], but truth stared me in the face, and I had a choice: deny it, or face it. Whether a fan like me changes his mind because of the evidence really doesn’t matter. But it surely would matter if I were a general manager of a baseball team. There were constant rumours of the New York teams swapping center fielders. If the Yankees or Giants had traded theirs for Snider, it would have been a monumental mistake (especially for Mays, who was not only better, but lasted longer, a high-performing player until 1970; Mantle was done in by bad knees and excesses worthy of Charlie Sheen). High-performing teams pay attention to all the evidence they can get.

The purpose of studying baseball data is to understand what creates success. The purpose of studying health care data is to understand what creates success. In baseball, success is very simply defined: did the team win the game, and in a 162-game season, did it win more games than the other teams? In health care, success is not quite so one-dimensional, but the concept is in principle the same: did the intervention add a greater benefit (quality of life, length of life, comfort) than an alternative? Did it deliver the benefit at an equal or lesser cost than a similar benefit produced by other means?

There are countless examples of how the use and refinement of data has changed both how baseball is understood and how it is played. Many are, mutatis mutandis, applicable to health care. Among them: it used to be thought that successfully stealing a base increased the expected number of runs by a certain amount and being caught stealing decreased expected runs by the same amount. Not so: it takes over two successful steals to offset the damage caused by being thrown out once. Think of screening program results: you have to subtract the harm caused by false positives and false negatives from the benefit resulting from true positives to estimate overall value.
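The break-even arithmetic behind the stolen-base claim can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the run values (0.20 expected runs gained per steal, 0.45 lost per caught stealing) are invented for the example and do not come from this post, and the function names are hypothetical.

```python
def break_even_success_rate(gain_per_success, loss_per_failure):
    """Success rate at which attempts neither add nor subtract expected runs."""
    # Solve p * gain - (1 - p) * loss = 0 for p.
    return loss_per_failure / (gain_per_success + loss_per_failure)


def net_value(successes, failures, gain_per_success, loss_per_failure):
    """Total expected-run change over a set of attempts."""
    return successes * gain_per_success - failures * loss_per_failure


# Assumed, illustrative run values (not taken from the post).
STEAL_GAIN = 0.20
CAUGHT_LOSS = 0.45

# Number of successful steals needed to cancel out one caught stealing.
steals_needed = CAUGHT_LOSS / STEAL_GAIN
rate = break_even_success_rate(STEAL_GAIN, CAUGHT_LOSS)

print(f"Steals needed to offset one caught stealing: {steals_needed:.2f}")
print(f"Break-even success rate: {rate:.1%}")
print(f"Net runs, 30 steals and 15 caught: {net_value(30, 15, STEAL_GAIN, CAUGHT_LOSS):+.2f}")
```

The same shape fits the screening analogy: replace the gain with the benefit per true positive and the loss with the harm per false result, and a program only adds value above its break-even rate.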

For pitchers, wins and losses used to define performance. But a pitcher’s won-loss record is subject to factors beyond his control. In 1990, sportswriters voted Bob Welch the American League’s best pitcher because he won 27 games and lost 6. His earned run average – the number of earned runs [11] he gave up per 9 innings pitched – was 2.91. That same year, Roger Clemens won 21, lost 6, and had an earned run average of 1.93 – a full run better per game, a gargantuan margin. At the time few of the baseball writers who voted for the award winner were familiar with the new analytics and were consequently stuck on the wrong indicators of performance. Twenty years later, Seattle’s Felix Hernandez (13-12, 2.27) won the same award over C.C. Sabathia (21-8, 3.18) and David Price (19-6, 2.73). Hernandez was fully deserving: he pitched fabulously on a lousy team, he pitched more innings, and he struck out more batters than his competitors. It wasn’t Hernandez’s fault that his teammates couldn’t hit, depriving him of wins and saddling him with losses despite his own terrific work. That he won is an indicator of the widespread adoption of Jamesian metrics by most of the baseball writers of America. Similarly, the primary care physician who labours mightily and well to manage frail elderly patients may be performing at a higher level than others who achieve better outcomes under less difficult circumstances.
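For readers who want the earned run average arithmetic spelled out, here is a minimal Python sketch of the definition given above (earned runs per 9 innings pitched). The 60-runs-in-200-innings stat line and the 200-inning workload are assumptions chosen only to show the calculation; the two ERAs in the second call are the ones quoted in the text.

```python
def earned_run_average(earned_runs, innings_pitched):
    """Earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


def runs_saved(era_a, era_b, innings_pitched):
    """Earned runs pitcher A saves relative to pitcher B over the same innings."""
    return (era_b - era_a) * innings_pitched / 9


# Hypothetical stat line: 60 earned runs allowed in 200 innings.
print(round(earned_run_average(60, 200), 2))

# ERAs quoted in the text (1.93 vs. 2.91), over an assumed 200 innings.
print(round(runs_saved(1.93, 2.91, 200), 1))
```

Under those assumptions, a gap of roughly one run per nine innings adds up to about 20 fewer earned runs over a season, none of which shows up directly in a won-loss record.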

Environment matters. The Colorado Rockies play in Denver, the mile-high city. Before changes were made to the park dimensions, Coors Field was a hitters’ paradise; balls carry farther at altitude. Ordinary players would hit 40 home runs a year because they played half their games there. Baseball analysts now routinely factor in park characteristics to calculate equivalent performances. A San Diego Padre who batted .270 with 25 home runs and 85 runs batted in could be the equal of a player with .300 – 37 – 115 in Denver. This is analogous to adjusting for neighbourhood and/or socio-economic status characteristics when assessing the performance of health care organizations or individuals.
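A rough sense of how a park adjustment works can be had from the Python sketch below. It is a deliberately simplified illustration, not the method analysts actually use: the park factors are invented, the function name is hypothetical, and road games are assumed to be played in neutral parks.

```python
def neutralize(stat_total, park_factor, home_share=0.5):
    """Scale a season total to a neutral park.

    park_factor > 1 means the home park inflates the stat; 1.0 is neutral.
    Only the home share of games is assumed to be affected.
    """
    if park_factor <= 0:
        raise ValueError("park_factor must be positive")
    # Blend the home park's effect with (assumed neutral) road games.
    season_multiplier = home_share * park_factor + (1 - home_share) * 1.0
    return stat_total / season_multiplier


# Hypothetical home run park factors, invented for illustration.
HITTERS_PARK = 1.40   # inflates home runs by 40% at home
PITCHERS_PARK = 0.85  # suppresses home runs by 15% at home

print(round(neutralize(37, HITTERS_PARK), 1))   # 37 homers in the hitters' park
print(round(neutralize(25, PITCHERS_PARK), 1))  # 25 homers in the pitchers' park
```

With these made-up factors the 12-homer gap between the two players shrinks to about 4 once each total is put on a neutral footing, which is the same logic as case-mix or socio-economic adjustment in health care comparisons.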

I could go on and on about how baseball statistics have evolved, about the data that foretold that Wally Bunker’s 19-5 record as a 19-year-old rookie pitcher in 1964 would be his career best. (He didn’t strike enough guys out. The more balls in play, the greater the chance the batter will reach base and score.) But these are stories for another day, to be crafted for the purpose of applying sports wisdom and insight to health care. Do the data tell a different story about a hospital’s excellence than its reputation? Do academic credentials predict performance? Does cardiac surgery contribute as much to quality of life as exercise? Do MCAT results correlate with clinical outcomes? Deciding on how much to spend on diagnostic imaging is no different from deciding on how much to pay your 8th-inning set-up reliever – it’s a question of value for money, with value defined as results.

Learning from the evidence and validating the meaningful indicators of high-performance baseball players didn’t shake my admiration for Duke Snider. I still thought he was great, he was a Dodger, and just because I could no longer declare him the best was no reason to give him up as a hero. Maybe, had I remained ignorant of baseball science, I might have persisted in my misconception of his worth and derived some psychic satisfaction from my continued delusion. That’s too high a price to pay – even for sports fans, whose loyalties and faith-based misconceptions ultimately don’t matter all that much. Not to overstate it, but learning to make informed judgments about sports is not bad training for making informed judgments about more important things, like whether you would want to be treated in a hospital where half the staff don’t wash their hands properly. You might love your local surgeons but before you let them do Whipple procedures, look at the data.

The assembly of high quality, validated databases; public reporting on performance; and sophisticated data analysis have upset the old order of things in baseball and other sports and settled many a debate about the absolute and relative value of players. These advances have knocked the reputations of some players down a peg and elevated the reputations of others whose subtler talents would have consigned them to obscurity. But the greatest contribution has been to improve the quality of the game by substituting evidence for superstitions, democratizing opportunities and changing decisions at all levels in service of higher performance.

Health care should aspire to no less.
