determine expected value of a continuous triangular probability density function

pjbrad9

Been unable to figure this out for a while...

I'm trying to develop a formula for calculating the expected value of a triangular probability density function if provided with the p10, mode, and p90 input parameters.

For example, I'm given a triangular probability density function with p10 = 10, mode = 12, and p90 = 16. My formula should calculate the expected value of this function, which is 12.86.

I started by isolating the case in which the p90 - mode > mode - p10 (i.e. a right-skewed triangle). In this case, the EV will be to the right of the mode on the number line.

I know that the area enclosed by this entire triangle should be set to 1. I also know that the EV divides the weight of the triangle evenly. Thought I had it cracked when I integrated the right side of this triangle from x=EV to x=P90, setting the result to 2/5. However, I couldn't get this to work.

I've used a mix of geometry and calculus but have struggled to figure it out. If given min, mode, and max I can easily calculate the p10 and p90 values. But doing it the opposite way is proving challenging. Think I'm just tired - feel like this should be easy.

Maybe someone has already done this and can get me going in the right direction.
 
"If given min, mode, and max I can easily calculate the p10 and p90 values. But doing it the opposite way is proving challenging. Think I'm just tired - feel like this should be easy."

Let's start here. Given the min, mode, and max, what do you get for p10 and p90?

The link gives the expected value (mean) as (min+mode+max)/3, or you can derive that yourself. You just need to solve for min and max in terms of p10, mode, and p90, right?
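For anyone following along, the forward direction (min, mode, max in; p10, p90 and mean out) can be sketched in Python. This is only a minimal sketch using the standard inverse CDF of the triangular distribution; the function names and the sample values min = 0, mode = 5, max = 10 are made up for illustration and are not from the original post.

```python
import math

def triangular_quantile(a, c, b, prob):
    """Inverse CDF of a triangular distribution with min a, mode c, max b."""
    split = (c - a) / (b - a)  # probability mass to the left of the mode
    if prob <= split:
        # Left (rising) side: prob = (x - a)^2 / ((b - a)(c - a))
        return a + math.sqrt(prob * (b - a) * (c - a))
    # Right (falling) side: 1 - prob = (b - x)^2 / ((b - a)(b - c))
    return b - math.sqrt((1.0 - prob) * (b - a) * (b - c))

def triangular_mean(a, c, b):
    """Expected value of the triangular distribution: (min + mode + max) / 3."""
    return (a + c + b) / 3.0

# Hypothetical inputs chosen only to exercise the functions.
a, c, b = 0.0, 5.0, 10.0
p10 = triangular_quantile(a, c, b, 0.10)
p90 = triangular_quantile(a, c, b, 0.90)
print(p10, p90, triangular_mean(a, c, b))
```

The hard part of the thread is running this backwards: the square roots involve both the min and the max, which is why each unknown shows up in the other's equation.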
 
Correct. If I can determine the min in terms of p10, mode, and p90, and do the same for the max, then I can simply take the average of the min, mode, and max. The problem I am running into is that I get an equation for the min that includes the max, and vice versa.
 
Attached is one approach I've started with...
 

Attachment: expected_value.pdf (54.6 KB)

"I know that the area enclosed by this entire triangle should be set to 1. I also know that the EV divides the weight of the triangle evenly. Thought I had it cracked when I integrated the right side of this triangle from x=EV to x=P90, setting the result to 2/5. However, I couldn't get this to work."

It is the median, not the mean, that divides the triangle into two equal areas, so the area from mean (EV) to p90 is not necessarily 0.4. What function did you integrate? - we don't know the equation of the line - or is there further info you haven't given?

I tried to use a geometric approach as follows.

Calling M the min (M<10) and X the max (X>16), the "height" of the triangle at the mode is 2/(X-M) since the area of the triangle is 1.

I then constructed "heights" at p10 (ie 10) and p90 (ie 16) and ended up with two sets of similar triangles, one set on the left of the mode and the other set on the right of the mode. This gave me expressions for the "height" at 10 and 16 in terms of M and X.

I then equated the areas of the little triangles (as they are both 0.1) to get one equation in terms of M and X. Setting one of these areas equal to 0.1 gave the second equation. Two equations, two unknowns. But pretty yukky to solve - I haven't bothered to do that yet. I think a GC or Wolfram could help there. But, like you pjbrad9, I'm thinking there must be an easier way.
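
The post does not show the two equations, so the following is only a reconstruction of how the similar-triangle argument might go for the example (p10 = 10, mode = 12, p90 = 16), with M the min and X the max as above. The symbols h, h_10 and h_16 for the heights are introduced here for illustration.

[MATH]h = \frac{2}{X-M}, \qquad h_{10} = h\,\frac{10-M}{12-M}, \qquad h_{16} = h\,\frac{X-16}{X-12}[/MATH]

[MATH]\tfrac{1}{2}(10-M)\,h_{10} = \frac{(10-M)^2}{(X-M)(12-M)} = 0.1[/MATH]

[MATH]\tfrac{1}{2}(X-16)\,h_{16} = \frac{(X-16)^2}{(X-M)(X-12)} = 0.1[/MATH]

Each little triangle has area 0.1, which gives two equations in the two unknowns M and X. Both equations contain X - M, so neither unknown can be isolated on its own.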
 
Thanks for giving it a go, Harry. Yes, I misspoke when I said that the mean divides the weight of the triangle in half - that is the median as you said. I'm just tired (and desperate for a solution lol).

The functions I integrated, though, are shown in the pdf I attached in my previous comment. I derived f(x) and g(x) to represent the sloping sides of a general triangular distribution.

It is certainly yukky to solve haha. I thought I was really close to a solution when I last gave this a try using a simple geometric set-up. However, that simple set-up ultimately led to a system of 5 equations and 5 unknowns (yay) that I failed to solve (boo) when I got something like A = A after using Gaussian elimination. One thing that helps in using a geometric approach is that the relationships among min, p10, mode, p90, and max are independent of the height of the triangle. So I figured setting the height of the triangle to 1 would make this an easy problem. However, when you do this you no longer can say that the area of the triangle is necessarily equal to 1.

I'm becoming less convinced that there is a clean solution, and it likely involves a messy system of higher order equations. See, when given min, mode, and max, it's quite easy to find the p10 and p90, because this is essentially a problem of interpolation. You need both min and max to find either the p10 or p90. However, when given the p10 and p90, it is sort of like an extrapolation to get the min and max, but the min and max are dependent upon each other. I may not be stating this in a technically correct way, but this is how it seems to me. Tricky (to me, at least - feeling kinda dumb given how much time I've pondered it).

Ultimately, I need to get this into an Excel spreadsheet where users are entering the p10, mode, and p90 values. I'd switch to having them enter the min, mode, and max, but that would present a whole other host of complications that I won't bore you with.

Maybe using goal-seek or iterating could do the job? Just thinking out loud at this point.
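
As a rough illustration of the "iterating" idea, here is a minimal Python sketch of a fixed-point iteration. It assumes p10 < mode < p90 and uses the triangular CDF conditions P(X < p10) = 0.1 and P(X > p90) = 0.1, rearranged so that each pass updates u = mode - min and v = max - mode from the previous guess. The function name, the starting guess and the tolerance are illustrative choices, not something from the thread.

```python
import math

def solve_min_max_by_iteration(p10, mode, p90, tol=1e-12, max_iter=10_000):
    """Estimate (min, max) of a triangular distribution from p10, mode, p90."""
    d1, d2 = mode - p10, p90 - mode  # assumed positive: p10 < mode < p90
    u, v = d1, d2                    # start with min = p10 and max = p90
    for _ in range(max_iter):
        width = u + v                # current guess for max - min
        # 10 (p10 - min)^2 = (max - min)(mode - min), with p10 - min = u - d1
        u_new = d1 + math.sqrt(u * width / 10.0)
        # 10 (max - p90)^2 = (max - min)(max - mode), with max - p90 = v - d2
        v_new = d2 + math.sqrt(v * width / 10.0)
        converged = abs(u_new - u) + abs(v_new - v) < tol
        u, v = u_new, v_new
        if converged:
            break
    return mode - u, mode + v

tri_min, tri_max = solve_min_max_by_iteration(10.0, 12.0, 16.0)
ev = (tri_min + 12.0 + tri_max) / 3.0
print(round(ev, 2))
```

For the example in the first post this should land close to the quoted expected value of 12.86. The same update could be mimicked in a spreadsheet by copying the two formulas down a column of successive guesses, though that layout is only a suggestion.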
 
Using the formulas in https://en.wikipedia.org/wiki/Triangular_distribution for the CDF and the mean, I get a system of two nonlinear equations. Using their names a, b, and c for min, max, and mode, and taking p = p10 and q = p90, I have

[MATH]10(p-a)^2 = (b-a)(c-a)[/MATH]​
[MATH]10(b-q)^2 = (b-a)(b-c)[/MATH]​

I tried graphing these (treating a as x and b as y), and for the specific case p=1,c=5, q=9, I got this graph:
[Graph: FMH120032.png]

There is one solution where a and b are both in [0,10], so it looks theoretically possible. If you look for a way to solve with Excel, you'll have to make it look for the right solution!

I tried changing the data, and found that in some cases all four solutions may be in [0,10]; but only one has both a and b between p and q. Here is the case p=3, c=4, q=7:

[Graph: FMH120032 3,4,7.png]

I gave a half-hearted try to solving the system algebraically, and didn't make any real progress.

EDIT: Whoops! We don't want a and b between p and q; we want p and q between a and b. Unless I'm mistaken, that last example has two or three viable solutions.
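
One way to make a program "look for the right solution" is to search only where a <= p and q <= b can hold. The Python sketch below does this with bisection on a single variable r, the share of probability lying below the mode, which must be between 0.1 and 0.9 when p10 < mode < p90. This parametrization is a rearrangement of the two equations above and is not from the thread; the names `solve_min_max`, `g` and `h` are hypothetical.

```python
import math

def solve_min_max(p, c, q, iterations=200):
    """Return (a, b) = (min, max) with a <= p10 = p and q = p90 <= b, mode c."""
    d1, d2 = c - p, q - c
    if d1 <= 0 or d2 <= 0:
        raise ValueError("this sketch assumes p10 < mode < p90")

    def g(r):
        # With r = (c - a)/(b - a): c - p = (b - a) * g(r), q - c = (b - a) * g(1 - r)
        return r - math.sqrt(r / 10.0)

    def h(r):
        # Zero exactly when both percentile conditions give the same width b - a
        return d1 * g(1.0 - r) - d2 * g(r)

    lo, hi = 0.1, 0.9          # h(lo) > 0 > h(hi), and h decreases in between
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    width = (d1 + d2) / (g(r) + g(1.0 - r))   # b - a
    a = c - r * width
    return a, a + width

a, b = solve_min_max(1.0, 5.0, 9.0)
print(a, b, (a + 5.0 + b) / 3.0)
```

Because r is bracketed between 0.1 and 0.9 and the positive square root is taken, the extra intersections seen on the graphs never enter the search.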
 
Thank you for your work, Dr. Peterson. Yes, we want p and q between a and b in the convention you established.

In your first example, where you made a symmetric distribution of p=1, c=5, and q=9, we know that the mean is 5. And so we know that min and max should be equidistant from 5. And so if I'm interpreting your results correctly, it appears that the upper intersection in quadrant 1 is the correct solution for that example, with the min (a) being approximately -2 and the max (b) being approx. 12. These are each approx. 7 units away from the mode of 5. We also can rule out the other solutions because we know that the following conditions must be met: a <= p and b >= q, and that upper left intersection is the only solution satisfying these conditions.

For a valid triangular probability density function (PDF), there can only be one solution. It is possible, however, for the user to enter values for p, c, and q that are invalid for establishing a PDF. For example, it's impossible to have a p10 of 1, a mode of 2, and a p90 of 1,000,000,000,000. There is no triangular distribution that can be made to satisfy those inputs.

Again, thank you for sharing this. I think this gets me a step closer. I need to process it a bit more to make sure I understand your derivation. Then I need to figure out how to make it work efficiently in the spreadsheet. I'll let you know if I get to final resolution.
 
Clearly I hadn't (haven't) completely thought through the implications of my graphs; I had changed names of variables a couple times, including changing a and b to x and y for graphing, which led to some confusion. I don't know where I got the idea of everything being between 0 and 10!

Looking more carefully now, I see that the correct answer in the first example, as you say, is the upper left intersection, (-2.236, 12.236), the only one in which a<p and b > q (that is, x<1 and y>9). Here is a version of that graph with illegal values shaded. (I see you mistakenly said "quadrant 1", but did mean this correct answer.)

[Graph: FMH120032 1,5,9.png]

Just to make sure I'm not still missing something, let's plug this data (a = -2.236, c = 5, b = 12.236) into the formulas from Wikipedia.

Then [MATH]P(X<1) = \frac{(1-a)^2}{(b-a)(c-a)} = \frac{(1-(-2.236))^2}{(12.236-(-2.236))(5-(-2.236))} \approx 0.1[/MATH], as required. By symmetry, [MATH]P(X>9) \approx 0.1[/MATH] too.

So this looks good. "All you need to do" is to find either an algebraic solution, or a way for Excel to find it for you.
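
The same plug-in check can be done in a few lines of Python. This is just a sketch of the Wikipedia CDF formula for the triangular distribution; the function name `triangular_cdf` is made up here, and the values of a, c, b are the rounded ones quoted above.

```python
def triangular_cdf(x, a, c, b):
    """P(X < x) for a triangular distribution with min a, mode c, max b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    return 1.0 - (b - x) ** 2 / ((b - a) * (b - c))

a, c, b = -2.236, 5.0, 12.236   # rounded solution from the graph
below_p10 = triangular_cdf(1.0, a, c, b)          # should be close to 0.1
above_p90 = 1.0 - triangular_cdf(9.0, a, c, b)    # should be close to 0.1
print(below_p10, above_p90)
```

Because a and b are rounded to three decimals, both probabilities come out very near 0.1 rather than exactly 0.1.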
 
Yes, sorry I meant quadrant 2. Well the good news is I think I can work with this. The bad news is I think it’s going to be as messy as I feared. An algebraic solution will work best with Excel, so I’m going to work towards that. Thanks again for all your help.
 