Calculate the probability of election result from 1% of votes counted

mrpace (new member, joined Oct 19, 2020; 2 messages)
Hi there,
I was watching the NZ election live the other night and wondered how confident I could be of knowing the final result given that only 1% of the votes had been counted.
Here is a simplified example:
Suppose there are two parties, Party A and Party B.
For a party to win the election, it must get more than 50% of the votes.
1,200,000 votes have been cast but only 12,000 have been counted (1%).
Of those counted, Party A has received 7,800 votes (65%) and Party B has received 4,200 votes (35%).
What is the probability that Party A will win the election?


I'd appreciate any help or suggestions as to which theorems could help me.
Thanks
 
This is actually a more difficult problem than it seems. Normally when calculating a probability, you start from a known population and ask how likely a given sample from that population is. Your question works in reverse: you have the sample and want to say something about the population.

I've actually been working for the past year with a question very similar to this, which is to try to predict election outcomes based on polling. It's really not that different from your question (think of it as a 12,000 sample size poll). There are a couple things I would point out that are important to keep in mind:
1. Assume that there are no known irregularities in how the sample is taken. In many elections, districts report their totals separately, so it may turn out that the 7,800 votes Party A received come entirely from districts heavily populated by Party A supporters. If you know that this is the case, then you no longer have a random sample. And I would say that this is more likely than not the case. But for the sake of argument, assume that they're just processing random ballots from across the general population.
2. As long as the sample is at least as large as a typical political poll, and the total population is much larger than the sample, the size of the total population won't change the results much. The total population could be 12 million or 12 billion with the same sample size, and it won't make a significant difference. It's counterintuitive, because you'd think a bigger population leaves more room for the result to move away from the ratio measured in the sample, but it actually leaves more room for any clustering to even out.
3. As the sample percentage moves away from 50%, the probability of winning moves away from 50% much faster. Having 65% in the sample generally corresponds to an extremely high probability of winning (like .99+). In fact, I don't believe you can find any election where a reliable poll showing 65% did not correspond to a win.
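One way to see point 2 numerically is the textbook finite population correction for the standard error of a sample proportion. To be clear, this formula is not part of the method I describe in this post; it's just a rough sketch, the function name is made up for illustration, and it assumes a simple random sample drawn without replacement.

```python
import math

def standard_error(sample_size, population_size, p=0.5):
    """Standard error of a sample proportion, with the finite
    population correction (assumes simple random sampling
    without replacement; name is illustrative only)."""
    base = math.sqrt(p * (1 - p) / sample_size)
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return base * fpc

# Same 12,000-vote sample, very different population sizes.
for population in (1_200_000, 12_000_000, 12_000_000_000):
    print(population, standard_error(12_000, population))
```

All three standard errors land between 0.0045 and 0.0046, so going from 1.2 million to 12 billion voters barely changes how much the sample share is expected to wobble.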

I don't think there is an official formal process for doing this. I have experimented with a couple unconventional methods.

The method I currently use "misuses" the process for testing the null hypothesis. I use that process to calculate a z-score and p-value. The p-value is what I use as the probability. Why do I use a process that I know wasn't designed for this? Because it works. To do it this way (I do this in MS Excel):
z-score = ((7800/(7800+4200))-0.5)/SQRT(0.25/(7800+4200)) = 32.86335345
p-value = NORMDIST(32.86335345,0,1,TRUE) = ~1, which as I said above, is what you'd expect with those numbers.
Note that in the z-score, "0.5" and "0.25" never change; only the vote counts do. In the p-value, the last 3 parameters of the function always stay at those values (mean 0, standard deviation 1, and TRUE so that NORMDIST returns the cumulative probability).
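If you'd rather not use Excel, the same two formulas can be sketched in Python. This is only a translation of the spreadsheet steps above, with `NormalDist().cdf` from the standard library standing in for NORMDIST(z,0,1,TRUE); the function names are my own placeholders.

```python
from math import sqrt
from statistics import NormalDist

def z_score(votes_a, votes_b):
    """Sample share of A minus 0.5, divided by the standard error
    under an assumed 50/50 split: sqrt(0.25 / n)."""
    n = votes_a + votes_b
    return (votes_a / n - 0.5) / sqrt(0.25 / n)

def win_probability(votes_a, votes_b):
    """Cumulative standard normal at the z-score, used here
    (unconventionally) as the probability that A wins."""
    return NormalDist(0, 1).cdf(z_score(votes_a, votes_b))

z = z_score(7800, 4200)
p = win_probability(7800, 4200)
print(z, p)
```

With 7,800 against 4,200 the z-score comes out at about 32.86 and the probability is indistinguishable from 1, matching the spreadsheet.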

What if the numbers are much closer though? Let's say Party A has 6100 (50.8%) and Party B has 5900 (49.2%). What is the probability that A wins? Using my method:
z-score = ~1.826
p-value = ~0.97

By the way, the reason it's still so high is that our sample size (12,000) is HUGE. Party A is up by 200, and it's very unlikely that natural random clustering alone will wipe out that lead. However, if we make the sample size more like a typical poll and go with Party A at 305 and Party B at 295 (the same 50.8% to 49.2% split), it's a lot less certain:
z-score = ~0.408
p-value = ~0.658, which is considered near toss-up territory in elections.
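To make the sample-size effect easy to play with, here is a small Python sketch that runs the same calculation on all three scenarios from this post. It assumes a random sample, as discussed in point 1, and the helper name is just something I picked for the example.

```python
from math import sqrt
from statistics import NormalDist

def z_and_probability(votes_a, votes_b):
    """Return (z-score, cumulative normal probability) for Party A,
    using the same 0.5 and 0.25 constants as the spreadsheet method."""
    n = votes_a + votes_b
    z = (votes_a / n - 0.5) / sqrt(0.25 / n)
    return z, NormalDist(0, 1).cdf(z)

scenarios = [
    ("65% of 12,000", 7800, 4200),
    ("50.8% of 12,000", 6100, 5900),
    ("50.8% of 600", 305, 295),
]

for label, votes_a, votes_b in scenarios:
    z, p = z_and_probability(votes_a, votes_b)
    print(f"{label}: z = {z:.3f}, probability = {p:.3f}")
```

The last two rows have the same vote share, so the drop from roughly 0.97 to roughly 0.66 comes purely from the smaller sample.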

Long story short, unless the 12,000-vote sample is non-random (which, again, is actually likely, because districts report at different times), Party A is going to win that election, and if they don't, it would be the greatest upset in election history.

I have a second method of getting a probability here, but it's really unconventional (even more so than the above process). It involves some really weird manipulation of margins of error. I can go into it if you want.
 