Probability Conundrum!

DWC19

New member
Joined
Jul 9, 2019
Messages
8
Hi all,

I'm wrecking my head over this. My answer is 1.1 transactions, so in practical terms we would expect 1 in 11,000 to be flagged.

A financial institution develops a fraud detection system which is not entirely accurate. If a
fraudulent transaction is detected, the probability that it is flagged is 0.99. On the other hand,
a transaction is flagged with probability 0.01 when it is not actually fraudulent. Assume that
the probability that a transaction is fraudulent is only 0.01.
What is the probability that the transaction is actually fraudulent if it has been flagged?
11,000 transactions a day go through the institution’s fraud detection system, what number
of these are expected to be flagged?


Thanks for your help
 
Please show your work. How did you get 1.1?

I presume that is your answer to the second question, and you haven't said anything about how you answered the first question (which has a surprising answer). In any case, your answer is wrong, and I can't tell why without seeing what you did. It's quite possible that you have misunderstood the question.
 
Hi Dr.Peterson,

Thanks for checking. I've since reviewed and come up with the answer of 0.009 probability (0.99*0.01). So 99% of fraudulent transactions will be flagged but 1% of those will be flagged incorrectly. This would give 108.9 transactions out of 11,000 being flagged as fraudulent (11,000 * 0.009).

Hope that is right!

Originally I was getting the probability by doing 0.01*0.01= 0.0001 probability
Then 11,000 *0.0001= 1.1 transactions
 
I've since reviewed and come up with the answer of 0.009 probability (0.99*0.01). So 99% of fraudulent transactions will be flagged but 1% of those will be flagged incorrectly. This would give 108.9 transactions out of 11,000 being flagged as fraudulent (11,000 * 0.009).
Careful! 0.99*0.01 = 0.0099, which rounds to 0.01, not 0.009. But which question is that the answer to?

You seem to be saying that it is the probability that a fraudulent transaction will not be flagged, i.e. P(not flagged | fraudulent)? That's not what either question asks, and it's not what you would multiply by 11,000 to answer the second question. The questions are,
  • What is the probability that the transaction is actually fraudulent if it has been flagged?
  • 11,000 transactions a day go through the institution’s fraud detection system, what number of these are expected to be flagged?
The first asks for P(fraudulent | flagged), and the second asks for P(flagged)*11,000.

What you actually calculated appears to be P(flagged | fraudulent)*P(fraudulent) = P(flagged AND fraudulent). That's only part of the probability of being flagged, as it misses the false flags.

It's very helpful in these problems to state clearly what each calculation means, as I just did. I also find it helpful to make a table with rows for Fraudulent and Not fraudulent, and columns for Flagged and Not flagged, so I can keep track of the various cases.

By the way, I'm unhappy with the sentence in the problem that says, "If a fraudulent transaction is detected, the probability that it is flagged is 0.99." I think "detected" and "flagged" would be the same thing; I'm assuming they mean, "If a fraudulent transaction is examined, ..."
 
This would give 108.9 transactions out of 11,000 being flagged as fraudulent (11,000 * 0.009).
11*9 = 99, so how can 11,000 * 0.009 = 108.9? When you multiply 11,000 by 0.009, the 0's in 11,000 and all the initial 0's in .009 just move the decimal point in the result of 11*9.
So 11,000 * 0.009 = ... 990 or 99 or 9.9 or .99 or .099 or .0099 ....
In fact 11,000 * 0.009 = 99, NOT 108.9, which is 11,000 * .0099.
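
The two products can be checked exactly. Here is a minimal sketch using Python's decimal module, which avoids binary floating-point noise; the variable names are illustrative and not from the thread.

```python
from decimal import Decimal

# Number of daily transactions from the problem statement.
transactions = Decimal("11000")

# The factor as it was typed in the post (0.009) ...
product_as_typed = transactions * Decimal("0.009")
# ... and the factor that 0.99 * 0.01 actually gives (0.0099).
product_intended = transactions * Decimal("0.0099")

print(product_as_typed)   # 99.000
print(product_intended)   # 108.9000
```

So 108.9 is consistent with 0.0099, not with 0.009.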
 
Thanks for your detailed help. I'm really struggling with what the question is actually asking and have made errors in the order of my calculations, so it's confusing. What I put above was a typo; I got 108.9 transactions from doing 11,000 * 0.0099 (not 0.009). I realise now this misses the false flag probability. Reading what you have said, I will try to start again and break it down.

(P1) Probability of transaction being fraudulent = 0.01
(P2) Probability of fraudulent transaction being flagged = 0.99
(P3) Probability of transaction being flagged accurately = 0.01

I assume the 3 possibilities have to be combined in the dependent order so P1*P2*P3 = 0.000009

Answer to question 1 is 0.000009 (overall probability of receiving a fraudulent transaction that has been accurately flagged)

First answer to question 2 is 11,000 * 0.000009 = 0.99 (1 transaction in 11,000 is expected to be fraudulent and accurately flagged)

However as you pointed out, question 2 just asks how many transactions are expected to be flagged irrespective of flag being accurate so I think I don't need to include P3.

Second answer to question 2 is P1*P2 = 0.0099, so 11,000 * 0.0099 = 108.9 (109 transactions in 11,000 are expected to be flagged)


Am I even getting close with the order of the probabilities and how I'm multiplying them?

Thank you
 
Reading what you have said I will try to start again and break it down.

(P1) Probability of transaction being fraudulent = 0.01
(P2) Probability of fraudulent transaction being flagged = 0.99
(P3) Probability of transaction being flagged accurately = 0.01

I assume the 3 possibilities have to be combined in the dependent order so P1*P2*P3 = 0.000009
You are describing P2 loosely and P3 wrongly, which makes it likely that you either are not fully understanding them, or will confuse yourself going forward. Exact wording, and careful rewording, are essential in probability.

As I showed before, you really need to state them explicitly in terms of conditional probabilities, which I am assuming you are familiar with.

Here's the problem data again, quoted exactly with labels added:

A financial institution develops a fraud detection system which is not entirely accurate.​
If a fraudulent transaction is detected, the probability that it is flagged (P2) is 0.99.​
On the other hand, a transaction is flagged with probability 0.01 when it is not actually fraudulent (P3).​
Assume that the probability that a transaction is fraudulent is only 0.01 (P1).​

The reality is, as I read it:

(P1) Probability of transaction being fraudulent = P(fraudulent) = 0.01 (correct)
(P2) Probability of fraudulent transaction being flagged = P(flagged | fraudulent) = 0.99 (correct, but unclear wording)
(P3) Probability of nonfraudulent transaction being flagged wrongly = P(flagged | not fraudulent) = 0.01

What you said for P3 doesn't make sense. If you look through these, you see that they are saying (P1) that there are not many fraudulent transactions; (P2) most of them are caught; (P3) few are wrongly flagged.

You can't multiply these together to get something meaningful; never, ever assume such a thing!

Answer to question 1 is 0.000009 (overall probability of receiving a fraudulent transaction that has been accurately flagged)
No. You are asked:

What is the probability that the transaction is actually fraudulent if it has been flagged?​

That means P(fraudulent | flagged), not P(fraudulent and flagged), which is what you say you have calculated. Your calculation is wrong anyway.

First answer to question 2 is 11,000 * 0.000009 = 0.99 (1 transaction in 11,000 is expected to be fraudulent and accurately flagged)

However as you pointed out, question 2 just asks how many transactions are expected to be flagged irrespective of flag being accurate so I think I don't need to include P3.
Right: You have misinterpreted the question. And the probability you need to use here is not the one they asked for in the first question!

Second answer to question 2 is P1*P2 = 0.0099, so 11,000 * 0.0099 = 108.9 (109 transactions in 11,000 are expected to be flagged)

Am I even getting close with the order of the probabilities and how i'm multiplying them?
This is still wrong; this time you have correctly calculated what you say, but it is not what they asked. What you have calculated here is P(fraudulent) * P(flagged | fraudulent) = P(fraudulent and flagged). Do you recognize why this is?

But what they asked for is just the number that will be flagged, which is P(flagged) * total number.

You will probably be getting P(flagged) as P(flagged and fraudulent) + P(flagged and not fraudulent).
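
Written out symbolically, that sum looks like the following. This is only a restatement of the sentence above, with each "and" term expanded using the given conditional probabilities; no numbers are filled in.

```latex
\begin{aligned}
P(\text{flagged})
  &= P(\text{flagged and fraudulent}) + P(\text{flagged and not fraudulent}) \\
  &= P(\text{flagged} \mid \text{fraudulent})\,P(\text{fraudulent})
   + P(\text{flagged} \mid \text{not fraudulent})\,P(\text{not fraudulent})
\end{aligned}
```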

I know this is complicated, and I have no certainty that I have made it clear. I'd like you to ask questions about anything I've said that you aren't sure of, from the notation for conditional probabilities, to why I think the question means what I say it does.
 
Thank you Dr.Peterson

I appreciate you taking the time to explain, as I can see the wording has every importance on how it is calculated and the order it is done in. I can understand the logic but have very little maths skills, only what I remember from high school, which was some time ago. I was never good at extracting the calculation from the question! I have been trying to learn just to better understand your response.

Reading the below made me realise that my probability should be very close to 1 so my answer totally went the wrong way.
"If you look through these, you see that they are saying (P1) that there are not many fraudulent transactions; (P2) most of them are caught; (P3) few are wrongly flagged. "

For question 1 you put that they are looking for P(fraudulent | flagged) so I looked up conditional probability and have come up with this answer:

P(fraudulent | flagged) is 0.01 * 0.99 / 0.01 = 0.99 (99% chance that a transaction is actually fraudulent if it has been flagged)

Explanation: The transactions have already been flagged so there is 0.99 probability they are actually fraudulent as the accuracy is almost certain

Question 2 you put they are looking for P(flagged) * total number so my answer is:

P(flagged) * total number is 0.01 * 11,000 = 110 (110 transactions out of 11,000 are expected to be flagged)

Explanation: there's 0.01 probability that a transaction will be fraudulent so will be flagged

I really hope i've got it!
 
Thank you Dr.Peterson

I appreciate you taking the time to explain as i can see the wording has every importance on how it is calcualated and the order it is done in. I can understand the logic but have very little maths skills, only what i remember from high school which was some time ago. I was never good at extracting the calculation from the question! I have been trying to learn just to better understand your response.
I hadn't known that you are not currently taking a probability course; that changes how I should be approaching this. (It's also one of the reasons we say "It really helps to know why you're working on math or what math course you're taking.") But it was becoming clear that you needed more guidance than I was assuming you should need. Again, feel free to ask any questions you have.

But considering that you had to look up conditional probability, it sounds like this problem is too advanced for your current knowledge, and you may need to back up a bit and work through a course on the subject. Can you tell us the context of your question, namely why you are working on a problem that is beyond you?

Reading the below made me realise that my probability should be very close to 1 so my answer totally went the wrong way.
"If you look through these, you see that they are saying (P1) that there are not many fraudulent transactions; (P2) most of them are caught; (P3) few are wrongly flagged. "
Good. I added this intentionally, to suggest thinking about what answers make sense, because many students drop their common sense when they open a math book -- and you're doing the sort of thinking I like to see. On the other hand, it turns out that problems of this type can yield surprising correct answers, so you can't go entirely by expectations!

For question 1 you put that they are looking for P(fraudulent | flagged) so i looked up conditional probability and have come up with this answer:

P(fraudulent | flagged) is 0.01 * 0.99 / 0.01 = 0.99 (99% chance that a transaction is actually fraudulent if it has been flagged)

Explanation: The transactions have already been flagged so there is 0.99 probability there are actually fraudulent as the accuracy is almost certain
Let's write your work in a way that shows what you are doing, and makes it easier to check whether it makes sense. I think you are doing this (though it's tricky because there are two different 0.01's):

P(fraudulent | flagged) = P(fraudulent) * P(flagged | fraudulent) / P(not fraudulent | not flagged)​

Why do you think that would be appropriate? The definition of conditional probability is this:

P(fraudulent | flagged) = P(fraudulent and flagged) / P(flagged)​

Your numerator is valid (do you see why?), but your denominator is not. That part takes more work. Essentially, you have to build up P(flagged), which is in fact what you need for part 2, as P(flagged and fraudulent) + P(flagged and not fraudulent), as I said before.

You may want to look up Bayes Theorem, which is commonly discussed in connection with this kind of problem. (I don't tend to use it explicitly, because I like to think rather than memorize formulas.)
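
For reference, Bayes' theorem applied to the events in this problem takes the form below. It is the same definition of conditional probability with the numerator rewritten; the denominator P(flagged) still has to be built up separately.

```latex
P(\text{fraudulent} \mid \text{flagged})
  = \frac{P(\text{flagged} \mid \text{fraudulent})\,P(\text{fraudulent})}{P(\text{flagged})}
```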

Here is what I did immediately when I started work on the problem:

               | Flagged | Not flagged |
Fraudulent     |         |             | 0.01
Not fraudulent |         |             |

This is a table for probabilities like P(flagged and fraudulent) and P(flagged and not fraudulent); I used the other givens to fill in the rest of the cells. The probability you need is the sum of the Flagged column.

Question 2 you put they are looking for P(flagged) * total number so my answer is:

P(flagged) * total number is 0.01 * 11,000 = 110 (110 transactions out of 11,000 are expected to be flagged)

Explanation: there's 0.01 probability that a transaction will be fraudulent so will be flagged

I really hope i've got it!
Again, you haven't yet worked out P(flagged), so this is not correct yet.
 
Thanks Dr.Peterson

I have patchy high school level math at best so this is way past my knowledge. I work with data and I'm specifically looking at math that's typical within data science, as I'm considering applying to study for a masters in data science. I'm trying to get a feel for how I would cope with the math aspects of the course.

I'm clearly missing basic probability concepts such as the dependency of the events in this question, so I will do as you suggest and study the topic before trying to answer it. I'm sure your explanation will help me grasp the concepts as I can relate them to your methods.

Will hopefully come back to you soon.

Thanks again
 
Hi Dr.Peterson

Reading some material hasn't helped much because the examples only use 2 possibilities, not 3. I'm still unsure how I add or multiply them together to get meaningful numbers.

Even doing the table confused me, but I have filled it out like this because I think the key thing is that there are very few fraudulent transactions to start with, so having flagged ones has to be unlikely.

               | Flagged | Not flagged
Fraudulent     | 0.01    | 0.01
Not fraudulent | 0.01    | 0.01

So the probabilities I have now are:

  • P(fraudulent) = 0.01
  • P(flagged | fraudulent) = 0.99
  • P(flagged | not fraudulent) = 0.01
  • P(flagged) = 0.02
I got P(flagged) by doing P(fraudulent)+P(flagged | not fraudulent). I'm not sure if P(flagged | fraudulent) should be in the equation but I know that if fraudulent the transaction will almost certainly be flagged.

Following the conditional probability formula you kindly adapted to this question "P(fraudulent | flagged) = P(fraudulent and flagged) / P(flagged)"

My answer to question one is P(fraudulent | flagged) = 0.01*0.99 / 0.02 = 0.495 (49.5% chance the transaction is actually fraudulent if it has been flagged)

Answer to question 2 is 0.02*11,000=220 (transactions out of 11,000 are expected to be flagged)

If I'm still wrong I'm guessing it's how I'm working out P(flagged)?

Thank you
 
Yes, P(flagged) is what you've been missing from the start, so we have to deal with it.

Let's start by looking at the table. You dropped my last column, which represents the total probability of each row. I'll add that label:

               | Flagged | Not flagged | Total
Fraudulent     |         |             | 0.01
Not fraudulent |         |             |

Now, here's what we're given (which I'm hoping you agree with):
  • Probability of transaction being fraudulent = P(fraudulent) = 0.01
  • Probability of fraudulent transaction being flagged = P(flagged | fraudulent) = 0.99
  • Probability of nonfraudulent transaction being flagged wrongly = P(flagged | not fraudulent) = 0.01
The one number I wrote in there is the first fact, P(fraudulent) = 0.01. We can also fill in P(not fraudulent) = 1 - P(fraudulent) = 0.99:

               | Flagged | Not flagged | Total
Fraudulent     |         |             | 0.01
Not fraudulent |         |             | 0.99

Now, we can use a fundamental fact about conditional probability (essentially the definition turned inside-out): P(flagged and fraudulent) = P(flagged | fraudulent) * P(fraudulent) = 0.99 * 0.01 = 0.0099. I think you had this at one point. That goes in the upper left cell. And we can also get the cell next to it, since they have to add up to 0.01: P(not flagged and fraudulent) = 0.01 - 0.0099 = 0.0001:

               | Flagged | Not flagged | Total
Fraudulent     | 0.0099  | 0.0001      | 0.01
Not fraudulent |         |             | 0.99

Our third given fact allows us to fill in the second line similarly: P(flagged and not fraudulent) = P(flagged | not fraudulent) * P(not fraudulent) = 0.01 * 0.99 = 0.0099. And P(not flagged and not fraudulent) = 0.99 - 0.0099 = 0.9801:

               | Flagged | Not flagged | Total
Fraudulent     | 0.0099  | 0.0001      | 0.01
Not fraudulent | 0.0099  | 0.9801      | 0.99

As I said previously, this is the first thing I did when I saw the problem: taking the given data and filling in all the facts I'll need to answer questions.
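
The same fill-in procedure can be sketched in Python. This is only an illustration under the reading of the problem given above: the function name `joint_table` and the variable names are hypothetical, and exact fractions are used so the cells match the table digit for digit.

```python
from fractions import Fraction

# The three givens from the problem statement, kept exact with Fraction.
p_fraud = Fraction(1, 100)              # P(fraudulent)
p_flag_given_fraud = Fraction(99, 100)  # P(flagged | fraudulent)
p_flag_given_legit = Fraction(1, 100)   # P(flagged | not fraudulent)


def joint_table(p_fraud, p_flag_given_fraud, p_flag_given_legit):
    """Return {row label: (flagged, not flagged, row total)} of joint probabilities."""
    p_legit = 1 - p_fraud
    # P(flagged and X) = P(flagged | X) * P(X) for each row X.
    fraud_flagged = p_flag_given_fraud * p_fraud
    legit_flagged = p_flag_given_legit * p_legit
    # The "not flagged" cell is whatever is left of the row total.
    return {
        "Fraudulent": (fraud_flagged, p_fraud - fraud_flagged, p_fraud),
        "Not fraudulent": (legit_flagged, p_legit - legit_flagged, p_legit),
    }


table = joint_table(p_fraud, p_flag_given_fraud, p_flag_given_legit)
for row, cells in table.items():
    print(f"{row:<15}" + "".join(f"{float(c):>10.4f}" for c in cells))
```

Each printed row holds the Flagged, Not flagged, and Total cells in that order.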

Now, what is P(flagged)? (Hint: columns can be summed, too.)

Then you can answer the questions. As I've hinted, the first will surprise you.
 
Sorry Dr. Peterson, I replied but it mustn't have saved :(

I can see that I need to grasp Bayes a lot better to understand how I should be calculating each probability together with another. The table does help a lot with that as I can see what my calculations should total to so makes more sense now.

My answer to question one is P(fraudulent | flagged) = 0.01*0.99 / 0.0198 = 0.5 (50% chance the transaction is actually fraudulent if it has been flagged)

Answer to question 2 is 0.0198*11,000=217.8 (218 transactions out of 11,000 are expected to be flagged)

PLEASE tell me that's right ?
 
You've got it!

And the fact that the flag is only 50% reliable is the surprising lesson I hinted at.
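
For anyone who wants to check the confirmed answers numerically, here is a minimal sketch. The variable names are illustrative, and the inputs are just the three probabilities and the 11,000 daily transactions from the problem statement.

```python
from fractions import Fraction

p_fraud = Fraction(1, 100)              # P(fraudulent)
p_flag_given_fraud = Fraction(99, 100)  # P(flagged | fraudulent)
p_flag_given_legit = Fraction(1, 100)   # P(flagged | not fraudulent)
daily_transactions = 11_000

# P(flagged) is the sum of the true flags and the false flags.
p_fraud_and_flag = p_flag_given_fraud * p_fraud
p_legit_and_flag = p_flag_given_legit * (1 - p_fraud)
p_flag = p_fraud_and_flag + p_legit_and_flag

# Question 1: P(fraudulent | flagged) = P(fraudulent and flagged) / P(flagged).
p_fraud_given_flag = p_fraud_and_flag / p_flag

# Question 2: expected number of flagged transactions per day.
expected_flagged = p_flag * daily_transactions

print(float(p_flag))              # 0.0198
print(float(p_fraud_given_flag))  # 0.5
print(float(expected_flagged))    # 217.8
```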
 
Thank you so much! I was surprised by that until I mistakenly got 49.5%; then I started thinking that the flagging has to be uncertain if there's such a low possibility of getting a fraudulent transaction in the first place.
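
That intuition about the low base rate can be illustrated with a small sketch that keeps the detector fixed and varies P(fraudulent). The function name `posterior_fraud` and the parameter names `sensitivity` and `false_positive` are my own labels for P(flagged | fraudulent) and P(flagged | not fraudulent); the base rates other than 0.01 are made-up values for comparison only.

```python
from fractions import Fraction


def posterior_fraud(p_fraud,
                    sensitivity=Fraction(99, 100),
                    false_positive=Fraction(1, 100)):
    """Return P(fraudulent | flagged) for a given base rate P(fraudulent)."""
    true_flags = sensitivity * p_fraud             # P(flagged and fraudulent)
    false_flags = false_positive * (1 - p_fraud)   # P(flagged and not fraudulent)
    return true_flags / (true_flags + false_flags)


# Hypothetical base rates; only 1/100 comes from the problem statement.
for base_rate in (Fraction(1, 1000), Fraction(1, 100), Fraction(1, 10), Fraction(1, 2)):
    posterior = posterior_fraud(base_rate)
    print(f"P(fraudulent) = {float(base_rate):.3f} -> "
          f"P(fraudulent | flagged) = {float(posterior):.3f}")
```

With the problem's base rate of 0.01 the result is exactly 1/2; the rarer fraud is, the less a flag tells you.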

Safe to say i'm ready to put this one in the past :)

Sorry for trying to approach this with a lack of prior learning. I really appreciate your patience and explanations.
 