I have something that has been bugging me a bit, which I will try to explain with a VERY simple example...

Say I am going to roll a 6-sided die. I ask a number of people to predict the number that will come up. They do this without knowing each other's predictions. So, if there were three people and they chose separate numbers, there would be a 50% chance of one of them getting it right. However, if two of the three happen to pick the same number, only two of the six outcomes are 'covered', making it a 33% chance. If they all picked the same number (obviously less likely), there would only be a one in six chance (16.67%).

So, the more predictions there are, the more likely it is that there will be a 'winner'; however, it is also more likely that there will be repeat predictions.
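Put another way, the chance of a winner is just the number of distinct numbers predicted divided by the number of possible outcomes, so repeats only hurt by shrinking that count. A tiny sketch of the three-person case, e.g. in Python:

```python
n_outcomes = 6
predictions = [2, 5, 5]          # two of the three people happen to pick the same number
covered = len(set(predictions))  # distinct outcomes 'covered' by the predictions
print(covered / n_outcomes)      # 0.333... i.e. the 33% case above
```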

**Imagine this example scaled up to, say, 10,000 outcomes and 2,000 predictions - how likely is it that there will be a winner?**

It is __not__ 20% (or it is very, very unlikely to be) because of the repeat predictions. So, what if there were 3,000 predictions, 4,000 predictions, etc.?

Even at 10,000 predictions, what would it be? I expect it would be close to 90-100%, but I would like to know the actual figure. If there were 20,000 predictions (or 100,000), how close would you get to 100%? I guess there may be confidence levels here.
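For what it's worth, if you assume every prediction is an independent, uniformly random guess, there is a closed form: with N outcomes and n predictions, the chance that at least one prediction hits is 1 - (1 - 1/N)^n, and the 2,000-prediction case then comes out a little above 18%, which fits the intuition that it should be somewhat under 20%. A minimal sketch under that assumption (the uniform, independent guessing is the big 'if'):

```python
def p_at_least_one_winner(n_outcomes: int, n_predictions: int) -> float:
    """Chance that at least one of n_predictions independent, uniformly
    random guesses matches a uniformly random outcome."""
    return 1 - (1 - 1 / n_outcomes) ** n_predictions

for n in (2_000, 3_000, 4_000, 10_000, 20_000, 100_000):
    print(f"{n:>7,} predictions over 10,000 outcomes: "
          f"{p_at_least_one_winner(10_000, n):.2%}")
```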

**Is there an online tool or model where you can put the variables in and it tells you the likelihood?**

You'd need to know (a) the number of predictions, (b) the number of outcomes, and (c) the number of repeat predictions - the last one being the tricky one, but I guess it could itself be estimated probabilistically within the model?

It is that last bit, (c), that makes this harder, particularly as both the number of entrants and the number of outcomes affect it.
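On the tool/model question: if the closed form above feels like too strong an assumption, the simplest 'model' to build is a brute-force simulation - generate the predictions, generate the outcome, and count how often at least one prediction matches. The repeat predictions in (c) then fall out of the simulation instead of having to be supplied as an input. A rough sketch, still assuming random, independent guesses:

```python
import random

def estimate_winner_chance(n_outcomes: int, n_predictions: int,
                           trials: int = 10_000) -> float:
    """Monte Carlo estimate of the chance that at least one prediction
    matches the outcome, with repeat predictions handled automatically."""
    wins = 0
    for _ in range(trials):
        # Distinct numbers covered this round; duplicates collapse in the set,
        # which is exactly the 'repeat prediction' effect.
        covered = {random.randrange(n_outcomes) for _ in range(n_predictions)}
        if random.randrange(n_outcomes) in covered:
            wins += 1
    return wins / trials

print(estimate_winner_chance(10_000, 2_000))  # should hover around 0.18
```

If real entrants don't guess uniformly (say they favour 'nice' numbers), you would swap the random draw for whatever distribution you think they actually use.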

Hope that makes sense and happy to elaborate!

Thank you.