Flat Earth Catalogue

2019-12-09

The Big Idea and the Magic of Statistics
Can you really find out what millions of people think by polling just 500 or so?
Sampling 500 people certainly won't give you a perfectly accurate picture. The big idea that makes statistics a science is that the inaccuracy is manageable: you can measure your measurements and see how reliable they are, because their inaccuracy follows a well-defined pattern. (Jargon: sampling error, bell curve.)

(And anyway, asking all of the millions of people your question isn't going to be perfectly accurate either. By the time you've found and asked the last one, some of the first ones will have changed their minds. So if you want to know the situation at a particular moment in time, you're better off taking a sample.)

Suppose you take a bunch of polls, all the same size, about something. They won't all give just the same answer. By chance, some will give bigger results, and some smaller. But if you see how wide the range of results is, then you know how accurate that kind of poll is. We'll simulate that now.

You have a truck full of green and brown M&Ms, thoroughly mixed. You blindly pull out 10 of them and you get 6 brown. Does that mean 60% of the whole truckload is brown M&Ms? Well, maybe. Maybe not. How can you find out if it's a good estimate?
 
Try repeating it and see if it comes up consistent. So you drop the 10 back in, stir the truck, pull another 10, and you get 6 brown again. Do it again, and you get 7 instead of 6. Then 6 again. Then 7, 9, 6, 8 ... You do it 20 times, and you always get 6-9 brown M&Ms.

Hey, 9 is the most you've ever seen at one time. Does that mean there are only 9 brown M&Ms in the whole truck? I mean, it's possible, but it's not real likely. More likely 60-90% of the load is brown.

60-90% is still a very wide range. Can we narrow it down? Yes, we can look at your 20 trials in more detail. 4 times you got 60%. 7 times you got 70%. 5 times you got 80%. 4 times you got 90%. So most of your tests reported 70-80%, and that's the most likely range. 
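The 20-trial experiment above is easy to replay in code. Here's a minimal Python sketch, assuming the truck is so large that each draw is effectively an independent chance of brown. The true fraction has to be picked up front (70% is used here just for concreteness); your tallies will differ from the ones above, but the shape will be similar.

```python
import random
from collections import Counter

random.seed(42)  # fix the seed so the run is repeatable

P_BROWN = 0.7     # true fraction of brown (the sampler doesn't know this)
SAMPLE_SIZE = 10
TRIALS = 20

# The truck is huge and well mixed, so each M&M pulled is effectively
# an independent 70/30 chance -- no need to model the truck itself.
results = [sum(random.random() < P_BROWN for _ in range(SAMPLE_SIZE))
           for _ in range(TRIALS)]

print("brown counts:", sorted(results))
print("tally:", Counter(results))
```

Run it a few times with different seeds and you'll see the same behavior as the hand experiment: the counts scatter, but they cluster around the true rate.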

OK, that's still real imprecise. So you decide samples of just 10 are too small. What if you take 20 M&Ms? You try it and only pull out 10 brown. So now it's 50%?! Maybe samples of 20 aren't great either. You try repeating it, and you get 11, then 15, 13, 13 again, 12 ... Do it 20 times, you get results as low as 10 (50%) and as high as 16 (80%), and most of your tests are in the range 12-14 (60-70%). Well that didn't help much.

Try samples of 50. In 20 tests, your minimum is 25 (50%) and your maximum 38 (76%), but most of your tests get 32-36 (64-72%). That is more precise.

Try samples of 100. A couple times you get only 65% brown M&Ms, and once you get 75%, but most tests are in the range 67-71%. The range is tightening up.

How about samples of 200? On one trial you get as few as 128 brown M&Ms (64%). In a freak occurrence, one time you get as many as 156 (78%). But the second biggest result is only 148, or 74%, and most of your tests get 138-146 (69-73%).

Samples of 500. Once you get only 328 brown M&Ms (65.6%), and once you get 373 (74.6%), but most of your tests get 338-349 (67.6-69.8%).

Samples of 1000. Your lowest result is 67.9%, and your highest is 72.7%, but most of your tests get 69.1% to 71.3%.

Samples of 2000. Lowest result is 68.3%, highest 72.5%, most are 69.1% to 70.5%. 

You can keep taking larger and larger samples if you want, but there's no point. I put 327 million M&Ms in the truck, and 70% brown, and you were pretty close to that the very first time you took a sample of 200, even though that sample was only 0.00006% of the whole load.
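The whole progression of sample sizes can be rerun the same way. Here's a sketch, again treating each draw as an independent chance at the true 70% rate. The exact numbers will differ from run to run; the point is the shrinking spread.

```python
import random

random.seed(0)
P_BROWN = 0.7  # the truth the sampler is trying to estimate

def spread(sample_size, trials=20):
    """Run `trials` polls of `sample_size` draws each and return the
    lowest and highest brown fraction observed."""
    fractions = [
        sum(random.random() < P_BROWN for _ in range(sample_size)) / sample_size
        for _ in range(trials)
    ]
    return min(fractions), max(fractions)

for n in (10, 20, 50, 100, 200, 500, 1000, 2000):
    lo, hi = spread(n)
    print(f"n={n:5d}  range {lo:.1%} to {hi:.1%}  width {hi - lo:.1%}")
```

Notice the code never models the truck at all: 4000 M&Ms or 7 billion, the simulation is identical, which is the "size of the truck doesn't matter" point below in miniature.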

Looking at the ranges for different sample sizes, you can see that your accuracy really tightens up between a sample of 100 and a sample of 500, and after that you don't get much return on your effort. Professionals know that a sample of 200 is good enough for a thing like this, if it's really random. 

If it's not really random, you might want larger samples. If I didn't stir the truck quite thoroughly enough, so that there were pockets of 15 M&Ms all the same color, on a sample of 100 that could really damage your accuracy. But on a sample of 500 it's no big deal. And again, professionals know how big the pockets of non-randomness usually are, and how they affect the accuracy.

Now here's the funny thing: The size of the truck doesn't matter. We could do this whole process with only 4000 M&Ms in the truck or with 7 billion. It would still go about the same, as long as the true fraction was near 70%. Every additional M&M in your sample gives you the same amount of information, no matter how big a truck it comes from. As your sample gets bigger, more and more of that information is stuff you already knew, until it's not worth your while to keep going. 

The accuracy depends on the absolute size of the sample, and on the frequency of the thing you are trying to measure. Frequencies further from 50% are harder to measure, because you get less information from each individual in the sample. But as long as the sample is random (or as long as you expand the sample to make up for non-randomness), the accuracy does not depend on the fraction of the population it covers. And the same is true for polls of people.
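The "well-defined pattern" from the top of the post has a textbook summary that the post doesn't spell out: the typical error of a sample proportion is sqrt(p(1-p)/n), where p is the true frequency and n is the absolute sample size. The population size appears nowhere in the formula. A quick illustration (this formula is standard statistics, not something derived in the post):

```python
import math

def standard_error(p, n):
    """Typical sampling error of a proportion estimate: sqrt(p(1-p)/n).
    Note that the population size appears nowhere in this formula."""
    return math.sqrt(p * (1 - p) / n)

for n in (10, 100, 200, 500, 2000):
    se = standard_error(0.7, n)
    # roughly 95% of polls land within about 2 standard errors of the truth
    print(f"n={n:4d}  typical error ±{se:.1%}, ~95% within ±{2 * se:.1%}")
```

The p(1-p) factor is also why frequencies matter: it is largest at p = 0.5, so a 50/50 question needs the biggest sample for a given precision.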
 
(How much information each individual M&M provides depends on how hard it is to guess the color before you look at it. If the fraction is extremely high or low, you can guess right almost every time, which means you don't learn a whole lot from actually looking. You get the most information per individual when the frequency is 50%, because that's when your power to guess is lowest. So that's why accuracy depends on frequency. But if you have a rough idea of the frequency, you can easily design a sample that will be more than big enough.)
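That guessing argument is the standard formalization known as Shannon entropy: the average information in one yes/no observation, which peaks when the frequency is 50%. A small sketch (the entropy framing is the usual textbook one, not spelled out in the post):

```python
import math

def bits_per_draw(p):
    """Shannon entropy of a yes/no observation with frequency p:
    how much you learn, on average, by looking at one M&M."""
    if p in (0.0, 1.0):
        return 0.0  # the outcome is certain, so you learn nothing
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.5, 0.7, 0.9, 0.99):
    print(f"p={p:.2f}: {bits_per_draw(p):.3f} bits per M&M")
```

At p = 0.5 each M&M is worth a full bit; at p = 0.99 it's worth far less, because you could have guessed the color anyway.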
