Of all the concepts in Scrum, one of the most maligned (anecdotally, at least) is the Story Point. I have heard Scrum’s proponents and detractors alike dismiss the idea, saying Story Points are meaningless or too abstract, and that estimates and forecasts are better served by Ideal Days. For me, it’s a matter of probability, and the argument between Ideal Days and Points is one between precision and accuracy.
An example: imagine a man who weighs exactly 215.68 pounds. If you put that man on a calibrated digital scale, you will see that he, indeed, weighs 215.68 pounds, a number that is both accurate and precise. But, what if you didn’t have that scale and you instead had to look at the man and simply estimate his weight?
If you said he weighs between 150 and 160 pounds, you would be neither accurate nor precise. If you said he weighs 208.667 pounds you would be precise, but not accurate. But, if you instead said that he weighs between 210 and 220 pounds, you wouldn’t be precise, but you would be accurate. And if I have to choose between accuracy and precision, I’ll take accuracy every time. There’s a reason you hunt quail with a shotgun and not a 9mm.
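The weight-guessing example can be sketched in a few lines of code. The numbers come from the example above; the helper function is just an illustration of the idea that an estimate is accurate if its range brackets the truth, no matter how wide (imprecise) that range is.

```python
TRUE_WEIGHT = 215.68  # pounds, the man's actual weight from the example

def is_accurate(low, high, true_value=TRUE_WEIGHT):
    """An interval estimate is accurate if it contains the true value."""
    return low <= true_value <= high

# Narrow and wrong: neither accurate nor precise.
print(is_accurate(150, 160))          # False
# A single very specific number, off by ~7 lbs: precise, not accurate.
print(is_accurate(208.667, 208.667))  # False
# A 10-pound range that brackets the truth: accurate, not precise.
print(is_accurate(210, 220))          # True
```

Only the last estimate is useful, even though it is the least specific of the three.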
I recently read Nate Silver’s superlative exploration of probabilistic forecasting, The Signal and the Noise, and this difference between precision and accuracy is at the core of his book. When we attempt to make precise estimates or forecasts, we are, essentially, expressing confidence in our ability to predict the future to whatever degree of specificity we provide.
But rarely can we confidently predict the future in the face of uncertainty – and few things are as fraught with uncertainty as game development. Instead, Silver argues, we should accept and account for the uncertainty in our attempts to predict events, and present our forecasts in a probabilistic manner. Our forecasts should strive for accuracy rather than precision, and account for a range of events that could occur. If you can narrow your probability margin to a point where that accuracy is actually meaningful, you have a functional and reasonable system for predictions.
In my mind, Story Points help create the sort of forecasting system that accounts for uncertainty. They may not utilize percentages in a probabilistic manner, but, by abstracting the size of a task to quanta of relative scope, they provide a spectrum of accuracy rather than a bullseye of precision.
When dealing with Story Points, you are trying to provide relative estimates of how big your features are. Feature A is about the same level of scope as Feature B. Feature D is about the size of Feature A and Feature C combined. Over the course of three calendar months, Team X burns through the equivalent of 100 Feature A’s. At my studio, we use the Fibonacci Sequence – for example, a 5 point task has the same scope as a 2 point task and a 3 point task combined. I’ve worked with other Scrum Masters who use t-shirt sizes. The exact format does not matter. What matters is that the scale gives you a sense of how big a task is, relative to others, and your burndown rate gives you a sense of how much of your backlog you can get through over any given period of time. It will never be precise, but, with enough experience and cumulative data, it will be accurate.
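The mechanics above – relative points plus a historical burndown rate – can be sketched in a few lines. Everything here (the sprint velocities, the backlog size) is invented for illustration; the point is that the forecast comes out as a range bracketed by your slowest and fastest observed sprints, which is accurate rather than precise.

```python
import statistics

# Hypothetical history: Story Points completed in each of the
# last six sprints. These numbers are invented for illustration.
velocity_history = [21, 34, 26, 30, 24, 31]

backlog_points = 200  # remaining relative scope, in Story Points

# Forecast as a range, not a point: bracket the answer with the
# slowest and fastest sprints actually observed.
worst = backlog_points / min(velocity_history)
best = backlog_points / max(velocity_history)
typical = backlog_points / statistics.median(velocity_history)

print(f"likely {best:.1f}-{worst:.1f} sprints (median pace ~{typical:.1f})")
```

The more sprints of real data you accumulate, the tighter and more trustworthy that bracket becomes – which is exactly the "narrow your probability margin until it is meaningful" idea from the Silver discussion above.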
Using Ideal Days to estimate a task has three major flaws. First, you are using a well-understood and fairly precise metric of scope: time. You are communicating a concrete concept of when a feature will be developed, and, even with all the caveats in the world and all the admonishments that it is just an estimate, your teams and stakeholders will have a hard time treating that number probabilistically or thinking of it as anything other than calendar time.
Second, what exactly is an “ideal day”? The best analogy I have heard is that Ideal Days are like the 20-minute halves in college basketball: if there were no time-outs, no fouls, and no whistle-blowing refs, each half would take exactly 20 minutes. But, much as you will never see a 20-minute half without interruptions*, you will never experience an “ideal day” in development. Build machines will crash, you will spill your burrito on your keyboard, some numb-nuts will send a lolcatz link to the whole office, etc, etc, etc. A system that strives for precision, while at the same time implying that it is not actually precise, is not a terribly useful system.
Third, Ideal Days get very messy when comparing them against actual man-hours. If you have three people working on Feature A, estimated to take 5 Ideal Days, what does that mean? Between the three of them, it will take 5 man-days? 5 calendar days? 5 days per person? And how do you break those days up if one of the three developers has a higher priority task to implement Feature B, estimated at 1 Ideal Day? Can she tackle Feature B in parallel while her two teammates start on Feature A? In that case, do you have 6 Ideal Days total, or does the parallel production keep the total number of Ideal Days at 5? Take that inquisition one step further: say a team of 10 developers has to tackle 40 Ideal Days of features. What does that actually mean in development time? 4 days? A month?
I’m sure practitioners of Ideal Days would tell me that I’m overthinking this, but that goes to my larger point: if I’m an experienced developer and I have trouble with this concept, how can you expect it to be a meaningful method of forecasting for stakeholders?
Story Points are not precise, but, if you know you can’t be precise anyway, why bother with precision? Focus on accuracy and account for your uncertainty. Establish a Story Point scale by which you can reasonably gauge relative scope and burndown, and use that less precise, but more accurate, metric to forecast progress.

*In keeping with the spirit of the post, I should actually treat this statement probabilistically, and say you are 99.99% certain to see interruptions in any 20-minute half of college basketball you watch. I will make no such qualifying statement about development. You will NEVER have an ideal day.