Hi. I'm trying to work out the probability of someone sharing the same name & date of birth as me.
I know roughly that there are 620 people in the UK with my name, how do I work out the rest?
Thanks in advance
Does "name" include First name and Family name (may be a middle name)?
Since you already know that "there are 620 people in the UK with my name" you just need the probability that one of those has the same birthday as you. Ignoring leap years, there are 365 days in a year so the probability a given person has the same birthday as you is 1/365. Of the 620 people with the same name, we expect that 620/365 or approximately 1.7 people will have the same name and the same birthday as you. If you want probability you need to divide that by the number of people in the U.K.
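As a rough sketch of the arithmetic in the reply above: this assumes birthdays are spread evenly over 365 days and matches on day and month only, not birth year. The function names are made up for illustration.

```python
def expected_same_birthday(same_name_count: int, days_in_year: int = 365) -> float:
    """Expected number of same-name people born on one given day of the year.

    Assumes every day of the year is equally likely (leap years ignored).
    """
    return same_name_count / days_in_year


def prob_at_least_one(same_name_count: int, days_in_year: int = 365) -> float:
    """Chance that at least one of those people was born on that given day."""
    return 1 - (1 - 1 / days_in_year) ** same_name_count


expected = expected_same_birthday(620)
at_least_one = prob_at_least_one(620)
print(round(expected, 2))  # 1.7
print(round(at_least_one, 2))
```

Note that this only covers the day of the year. Matching the full date of birth, including the year, would make a match far less likely.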
We need to clarify what you want.
Are you looking for the probability that any given person you meet in the U.K. has both the same name and the same birthdate; or that there is someone in the U.K. who shares both; or something else? These are very different things.
Thanks for your reply. I'll try and clarify what I'm trying to do;
I'm doing some research on online casinos. Now they say if someone closes down an account and opens a new one using a different address & email, they are unable to link the accounts, even if that person uses the same name and date of birth.
They say that just linking accounts based on First Name, Surname & Date of Birth would return too many false positives. I'm trying to work out whether or not that is likely to be true (for the UK only).
Is that any clearer?
Thanks again!
Okay. You are looking for the probability that a random individual has the same first and last name and full date of birth as a given individual (whose account is being investigated).
Presumably you are actually thinking only about yourself as the individual, since you said you know the number of people with your name. That will vary considerably from person to person. If you are trying to suggest a particular general policy, and not just to make some private argument, then you really need to consider anyone, and not just your own name. I would go with the worst case -- what is the most common name in the country?
As far as the birthdate is concerned, you have to find how many people were born on that date (and are still alive), as a fraction of the entire current population. That, too, will vary considerably; there will be fewer with the birthdate Feb 28, 1918 than with, say, Jan 1, 1968. For the worst case, you might need to find the date on which the most people were born (maybe 9 months after some major event).
So, in order to argue that there will not be many false positives, you should probably take (A) the number of people with the most common name, over (P) the total population, times (B) the number of people with the most common birthdate, over (P) the total population. Both parts require research; you can't just assume some arbitrary probability distribution (e.g. that every date since 100 years ago, and every possible name, are equally likely).
If you want an exact answer, do the kind of research that Dr. Peterson suggested.
Thank you very much!
So the most common name in the UK is David Smith (6300 people). The most babies born is 2000 in a day.
So I take the total population over 18 (49.74 million) and use your formula it gives me a result of 5.09.
Apologies for sounding daft but does that mean there are likely to be 5 David Smiths with the same date of birth?
Thanks again
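For anyone checking the figures in the post above, here is a minimal sketch of the (A/P) times (B/P) formula using the numbers quoted there. It assumes names and birthdates are independent, and it uses births per day as a stand-in for living people with that date of birth. The function and variable names are only illustrative. One possible reading of the "5.09" is that it is really 5.09 x 10^-9, a probability rather than a head count.

```python
def match_probability(name_count: int, birthdate_count: int, population: int) -> float:
    """(A / P) * (B / P): chance that one random person has both the given
    name and the given date of birth, assuming the two are independent."""
    return (name_count / population) * (birthdate_count / population)


population = 49_740_000   # UK adults, as quoted in the post
name_count = 6_300        # people named David Smith, as quoted
birthdate_count = 2_000   # most babies born on a single day, as quoted

p = match_probability(name_count, birthdate_count, population)

# Expected number of David Smiths who share one particular date of birth.
expected_same_dob = name_count * birthdate_count / population

print(f"{p:.3g}")                  # 5.09e-09
print(f"{expected_same_dob:.2f}")  # 0.25
```

On these figures, and under the independence assumption, the expected number of David Smiths sharing one specific date of birth comes out at roughly a quarter of a person, not five.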
That is assuming that all those David Smiths wanted to "cheat". If we assume 10% wanted to cheat, then the number of false positives will go down further.
Working with your numbers, there are probably 8 people with the same birth year named David Smith.
Thus there are 8 * 7 / 2 = 28 possible pairs.
AB, AC, AD, AE, AF, AG, AH, BC, BD, BE, BF, BG, BH, CD, CE, CF, CG, CH, DE, DF, DG, DH, EF, EG, EH, FG, FH, GH
Thus, the probability of a single match on David Smith on the same day in the same year is about
$28 \times \frac{1}{365} \approx 7\%$.
That may seem low, but that is just one name. Then you have to add in the probabilities for John Smith and Mary Smith plus the Joneses and the Johnsons and Thompsons and Richardsons etc.
False positives are virtually certain. How many false positives per year there are and what the costs of each false positive are then becomes the issue.
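The pair-counting estimate above can be checked with a short sketch. It takes the figure of 8 same-name people per birth year from the post as given and assumes birthdays are uniform over 365 days. The function names are hypothetical. The exact calculation is the standard birthday-problem formula, added here for comparison and not taken from the post.

```python
from math import comb, prod


def pair_count(n: int) -> int:
    """Number of distinct pairs among n people: n * (n - 1) / 2."""
    return comb(n, 2)


def approx_match_probability(n: int, days: int = 365) -> float:
    """Pair-counting estimate: number of pairs times 1/days."""
    return comb(n, 2) / days


def exact_match_probability(n: int, days: int = 365) -> float:
    """Birthday-problem probability that at least two of n people share a day."""
    return 1 - prod((days - i) / days for i in range(n))


pairs = pair_count(8)
approx = approx_match_probability(8)
exact = exact_match_probability(8)
print(pairs)             # 28
print(round(approx, 4))  # 0.0767
print(round(exact, 4))
```

The two figures are close for a group this small. The pair-counting estimate slightly overstates the chance because it counts overlapping matches more than once.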
Thanks all for your help. Much appreciated!
Maybe I'm being twp, but I think this is partly a foolish exercise. What is the probability that your existence has NO INFLUENCE on ANYONE else picking a birthdate or naming a child or that your situation was influenced by no one? Can you REALLY hide and influence NO ONE? In particular, if YOU have a child, who happens to be born on your birthdate, mightn't you be tempted to name the child after YOU? It's not random. Also, Names and Birthdates are not independent. For example, is there an unusual propensity to name children Noel or Noelle or Noél (because non-French don't know any better) or Noël or etc. when they are born on Dec 25?
Anyway...
If I have a child, it is unlikely to have the same date of birth as me.
This is actually a very serious exercise.
Gambling companies claim that they cannot identify players who have self excluded due to gambling addiction opening a new account, if they use a different email/address/phone but use their actual date of birth. They say that identifying such accounts using only name & DOB would bring up too many false positives. I am trying to establish whether or not that is likely to be the case. If someone opens a new account with the same name & DOB, a quick manual check could identify if it is the same person or not. How often are they going to be doing this? Once a day, hundreds of times a day?
Thanks for your input
Cheers
Just for the record, correlation and dependence will make the assumptions of no correlation and independence produce an incorrect result, but it may not make it sufficiently incorrect to make the result fail to be useful.
Gambling is always a serious exercise. That's why I just stay away from it. There are enough unavoidable risks. Obviously, not everyone follows this avoidance philosophy. Keep up your good work!
Do you understand that this drastically changes your original question? For one thing, you used as your universe the entire adult population of the UK. That universe is entirely irrelevant unless every adult in the UK is an online punter with an admitted gambling addiction. I'd venture a guess that the relevant universe is one or two orders of magnitude smaller, which completely alters the math.
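To put the "how often" question into numbers, here is a very rough sketch. The worst-case match probability reuses the figures quoted earlier in the thread (6,300 David Smiths, 2,000 births on the busiest day, 49.74 million adults) and assumes independence. The exclusion list size and the number of sign-ups per day are purely hypothetical placeholders, not real data, and should be replaced with researched values.

```python
def expected_false_matches_per_day(
    p_match: float, exclusion_list_size: int, signups_per_day: int
) -> float:
    """Expected number of name + DOB matches per day between new sign-ups and
    DIFFERENT people on the exclusion list, if every comparison has the same
    match probability p_match."""
    return p_match * exclusion_list_size * signups_per_day


# Worst case from the thread: most common name and busiest birth date,
# treated as independent.
p_worst_case = (6_300 / 49_740_000) * (2_000 / 49_740_000)

# HYPOTHETICAL inputs, chosen only to show the calculation.
exclusion_list_size = 100_000
signups_per_day = 10_000

per_day = expected_false_matches_per_day(
    p_worst_case, exclusion_list_size, signups_per_day
)
print(f"{per_day:.2f} expected false matches per day (worst case, hypothetical inputs)")
```

Because this applies the worst-case name and date to every comparison, it should sit well above a realistic figure. A smaller relevant universe, as suggested above, changes the inputs but not the structure of the calculation.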