Roundliness is next to godliness

A thing I like about statistics is that it’s fundamentally about communication. There are enough degrees of freedom in data collection, modeling, presentation, etc. that two people’s analyses of the same data will almost certainly highlight different aspects and draw subtly (or wildly) different conclusions. Doing convincing work means being able to defend these decisions, and conclusions should generally be robust to reasonable alternative choices.

One decision I barely spent any time thinking about early on was “how many decimal places do I include in this number?” I don’t think I thought of it as a decision, really. Just copy the number from R, it knows what it’s doing.

Then I got made fun of by a coworker for writing a display that said that “programmers who use spaces make 9.143% more than those who use tabs” or something like that. “Just say 9%.” I knew she was right, but I couldn’t quite figure out how to explain why.

It comes down to what you’re attempting to communicate with those trailing digits. In this example, assuming the tab-programmer makes $100k, that’s the difference between saying the space-programmer makes $109,143 and saying they make $109k. Implicit in the first is the claim that your predictions and measurements are so good that you can measure differences down to the dollar. In reality, the variation in programmers’ salaries is going to dwarf the extra precision of those trailing digits.
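To make that concrete, here’s a hypothetical simulation (all numbers invented for illustration, not from any actual survey): generate two groups of salaries with a realistic individual spread, and watch how much the estimated percentage difference moves between re-runs of the “study.”

```python
import random

random.seed(0)

def estimated_pct_difference():
    # Hypothetical salaries: tab users centered on $100k, space users ~9%
    # higher, both with a realistic individual spread (sd = $20k).
    tabs = [random.gauss(100_000, 20_000) for _ in range(500)]
    spaces = [random.gauss(109_000, 20_000) for _ in range(500)]
    mean_tabs = sum(tabs) / len(tabs)
    mean_spaces = sum(spaces) / len(spaces)
    return 100 * (mean_spaces - mean_tabs) / mean_tabs

# Re-run the "study" ten times: the first decimal place already bounces
# around from sample to sample, so digits like "9.143" aren't replicable.
estimates = [estimated_pct_difference() for _ in range(10)]
print([round(e, 3) for e in estimates])
```

Every run lands somewhere around 9%, but the decimals are pure noise, which is exactly the sense in which “9.143%” over-claims.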

Andrew Gelman talks about this at length and far more convincingly over in Taking responsibility for your statistical conclusions: You must decide what variation to compare to:

But let’s look at this more carefully. What is so wrong with giving a confidence interval with fractional percentage points? Why, exactly, is this bad practice? After all, N is 2 million here, so we really can be very certain about the proportion from which these data are a random sample.

I’ll give you two reasons why these intervals don’t make sense. First, the easy answer, then the more thoughtful answer.

The easy answer is that the data are not a random sample.

But that’s not the whole story, as we can see by considering a thought experiment: Suppose the data had been a random sample. Suppose Github had a database of a kazillion pull requests, and the authors were given a random sample of size 2 million. Then the confidence intervals would be completely kosher from a statistical perspective, but I still wouldn’t like them.

Why? Cos “78.6%” is hyper-precise.

Let me give an analogy. Suppose you weigh yourself on a scale, and it records 62 kg. But it happens to be a super-precise scale, of the sort that they use to measure diamonds, or drugs, or whatever. So it gives your weight as 62.3410 kg, with an uncertainty of 0.004 kg. Should you report the 95% confidence interval for your weight as [62.3402, 62.3418]? No, of course not. Two hours later, after you eat lunch, your weight is much different. You’re getting a hyper-precise estimate of what your weight happened to be, right then, that’s all. But for most purposes that’s not what you’re interested in.

(More discussion from him in: Is it meaningful to talk about a probability of “65.7%” that Obama will win the election?)

Maybe it’s “true” in some sense that space-programmers made exactly $9,143 more than tab-programmers, but later we’ll run some other study at another point in time, and it would be silly to claim that that number is going to hold. But “they’ll make about 9% more” is much more likely to stand.

I’ll probably come back to this later, since I think you can actually draw a line between this and aspects of the replication crisis, and Scientism vs science. But for now I just want something to point people at when I tell them to use fewer decimal places.

So how many digits do I use?

As many as you believe? As many as you think would replicate? This is what I mean about the ties to doing good science. I don’t think there’s a quick and easy style-guide answer to this. It requires actually thinking about what you’re trying to convey with a given value.

One place to start: imagine another study came out and said “actually, space-programmers make 9.317% more than tab-programmers.” If your reaction is “oh, yeah, well, I didn’t expect those last decimals to stay fixed. That’s not a refutation,” then why were they in there to begin with?

But thinking is hard, and my brain is obviously not doing all that work as I read through a paper, and I was hoping I could figure out what my own heuristics are. Then I found this guy’s handy table, which looks mostly reasonable to me:

Which made me realize that my heuristic is basically “you get two digits.” Thinking preferred, but you’ll probably be able to trick me into believing that you did think if you just listen to him.
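For what it’s worth, the “two digits” heuristic is easy to mechanize: in R, the built-in signif(x, 2) does it directly. A minimal Python sketch of the same idea (my own helper, not a standard-library function) looks like:

```python
from math import floor, log10

def two_sig_figs(x):
    """Round x to two significant figures."""
    if x == 0:
        return 0.0
    # Number of decimal places that keeps exactly two significant digits;
    # negative values round to tens, hundreds, etc.
    digits = 1 - floor(log10(abs(x)))
    return round(x, digits)

print(two_sig_figs(9.143))    # → 9.1
print(two_sig_figs(109143))   # → 110000
print(two_sig_figs(0.04567))  # → 0.046
```

The point isn’t the code, of course; it’s that anything past those two digits should have survived the “would this replicate?” question before it goes in the report.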