The Null Hypothesis & Significant Difference

pomrania:

drferox:

Science Literacy Lesson #4

Let’s say you have two dice, a pair of d20s. 

The Null Hypothesis says there is no (zero) correlation between the numbers rolled by each die; they are both completely independent of each other.

You roll them together and they both come up with 20! Super lucky!

Or is it? How many times would you need to roll those dice to prove that they are linked to each other somehow?

The Null Hypothesis is the default assumption in science that two variables (things) just don’t affect each other. It’s the science equivalent of ‘innocent until proven guilty’. You need to prove a link between two things, you don’t assume it’s there.

So in this case we are assuming that the number rolled on the first die will have no effect whatsoever on the number rolled on the second die.

Ah! But both dice rolled a 20. So what does that mean?

It means your sample size was small.

You can’t reasonably make a conclusion about these dice based on a single roll. You can’t make a reasonable conclusion based on 5 rolls. The number of times you roll the dice is your sample size (often just called N when we’re discussing the experiment, so five subjects is N=5). If your sample size is unreasonably small, then it weakens your experiment and makes it harder to justify conclusions.

The sample size you require to prove your point varies depending on what you’re testing, but larger numbers are generally better. An experiment with a sample size of 10,000 is significantly more meaningful than an experiment with a sample size of 12. This is quite simply a numbers game.

(That said, context is also important. If you were publishing a study about the Northern White Rhino, for example, a sample size of only 2 may still be meaningful, because you do your best with what you’ve got.)

Now you know, as a reasonable human being or corvid-in-a-trenchcoat, that rolling the dice once doesn’t prove they’re linked somehow, because you know how dice work. A single incident is not any kind of useful evidence. Yet people use single-incident anecdotes as ‘proof’ all the time! Can you see how frustrating that is?

You rolled the dice once and thought maybe the two dice were linked. You roll the dice 100 times and you will probably see those dice are not linked. The strength of your evidence will only grow as your sample size does.
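
If you want to see that for yourself, here’s a rough little Python sketch (throwaway code of my own, nothing official) that rolls two imaginary d20s and counts how often they match at a few different sample sizes:

```python
import random

def match_rate(n_rolls: int) -> float:
    """Roll two fair d20s n_rolls times and return the fraction of rolls
    where both dice show the same number."""
    matches = sum(
        random.randint(1, 20) == random.randint(1, 20)
        for _ in range(n_rolls)
    )
    return matches / n_rolls

# Under the Null Hypothesis (independent dice), a match should happen
# about 1 time in 20 (5%). Watch how wobbly the estimate is at small N
# and how it settles down as N grows.
for n in (5, 100, 10_000):
    print(f"N={n:>6}: matched on {match_rate(n):.1%} of rolls")
```

Run it a few times. At N=5 the match rate jumps around all over the place (you’ll often get 0%, sometimes 20% or more), while at N=10,000 it sits right around the 5% that pure chance predicts. Same dice, same Null Hypothesis, very different quality of evidence.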

But what if, hypothetically, you rolled this pair of dice 5 times, and each time you did so they came up with the same number?

Statistics in scientific experiments get complicated and you will completely fall asleep if I try to explain it all, but experiments should be looking for a significant difference, which is an indication of how likely or unlikely you would be to get the same result if the Null Hypothesis were true.

In this case, the odds of your two d20s coming up with the same number as each other on all 5 rolls in a row, if it’s just random chance, are about 3.2 million to 1 (and if that first roll has to be double 20s specifically, more like 64 million to 1). That’s a pretty significant difference.
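
If you want to check that arithmetic, it’s just multiplying probabilities together. Here’s the back-of-the-envelope version in Python:

```python
# Chance that two fair d20s match each other on a single roll:
# whatever the first die shows, the second has a 1-in-20 chance of matching it.
p_match = 1 / 20

# Five matching rolls in a row, if the dice really are independent:
p_five_matches = p_match ** 5
print(f"1 in {1 / p_five_matches:,.0f}")   # 1 in 3,200,000

# And if that first roll also has to be double 20s specifically:
p_double_20s_then_matches = (1 / 400) * p_match ** 4
print(f"1 in {1 / p_double_20s_then_matches:,.0f}")   # 1 in 64,000,000
```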

But if you rolled the dice a thousand times and they only matched on 5 of those rolls? Not so significant: chance alone would give you about 50 matches in 1000 rolls, so that’s no evidence of a link at all. Another example of why you need to include your negative results.

Most results report their significant difference at a confidence level of 95% or 99%. That means random chance alone, with the Null Hypothesis true, would only produce results like theirs less than 5% (or 1%) of the time. You can’t get to 100% of course, random chance is like that, but the closer you get, the stronger the evidence to reject the Null Hypothesis.

And, look, I’m more of a visual person, so have a graph. If your data clears a confidence level of 95%, it means that if the Null Hypothesis were true, random luck alone would only produce a result like yours less than 5% of the time. The data wouldn’t fall under the blue bit of this curve if the Null Hypothesis were true.
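
If you like seeing the machinery, here’s a rough sketch of how that question gets asked with actual numbers. It’s a plain binomial calculation using nothing but Python’s standard library (real studies use proper statistical tests; this is just to show the shape of the idea): assuming the Null Hypothesis is true and the dice only match about 1 time in 20 by luck, how surprising is the number of matches we actually saw?

```python
from math import comb

def p_value_at_least(k: int, n: int, p: float = 1 / 20) -> float:
    """Probability of seeing k or more matches in n rolls if the
    Null Hypothesis is true (matches happen with probability p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 5 matches out of 5 rolls: wildly unlikely under the Null Hypothesis.
print(p_value_at_least(5, 5))      # about 0.0000003, well past the 99% confidence level

# 5 matches out of 1000 rolls: not surprising at all, since chance alone
# would give you about 50 matches in 1000 rolls.
print(p_value_at_least(5, 1000))   # roughly 1.0
```

That output is what statisticians call a p-value: a p-value below 0.05 is roughly what ‘significant at the 95% confidence level’ means in practice, and below 0.01 gets you to 99%.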

And a confession: I don’t find these numbers fun. If I were running an experiment I’d totally pay a lovely and clever statistician to tell me what sort of sample size I would need to achieve my desired confidence level. I’m not going to run through how you calculate these numbers, just explain what they are and why they matter.

The larger your sample size, the easier it is to demonstrate a significant difference at a high confidence level. The weaker any of these factors are, the weaker the experiment in general.

Did that give you a headache? It gave me a headache. But hard science often involves some serious maths.

Anecdotes give you an idea in the first place; statistics is how you see if that idea is likely to be accurate or likely to just be coincidence.