Calculating Lambda in a Poisson distribution

Aetius

New member
Joined
Dec 29, 2018
Messages
2
I want to apply the Poisson distribution to highway robbery and highway accident events in order to predict next-day events. To do this, I would average the daily robberies/accidents over the previous 10 days to obtain λ and plug it into the Poisson formula.

The objective is to calculate the Poisson probability P(X ≥ 1) that one or more events occur the next day (after the 10-day period).
I have reason to believe that the latest values in the 10-day period are more representative than the older ones. To make λ more accurate and to calculate the Poisson cumulative probability for the following days (day by day), I can do one of the following two things:


  1. Sum the previous 10 daily values and divide by 10 to find λ. I would enter that value in the Poisson formula to estimate the cumulative Poisson probability of one or more events occurring on the next day, and I would do this for every “next day”. This amounts to a 10-day moving average: on each new day the latest value is added to the sequence and the oldest one is dropped.


  2. Calculate the exponential moving average (EMA) of the same 10-day period to obtain the new λ, and use it to calculate the highway robberies/accidents I should expect on the next day (the 11th day).

QUESTION: Which of the two would you apply? Would it violate the premise of the Poisson distribution in any way?
Thank you very much!
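
A minimal sketch of the two options described above, for comparison only. The daily counts are hypothetical, and the smoothing factor `alpha = 2 / (N + 1)` is one common convention for an N-day EMA, assumed here because the post does not specify one. The EMA is seeded with the oldest value in the window, which is also an assumption.

```python
import math

def sma_lambda(counts):
    """Option 1: simple average of the daily counts (equal weight per day)."""
    return sum(counts) / len(counts)

def ema_lambda(counts, alpha):
    """Option 2: exponential moving average; counts run oldest to newest."""
    ema = counts[0]  # seed with the oldest value (an assumption)
    for c in counts[1:]:
        ema = alpha * c + (1 - alpha) * ema
    return ema

def prob_at_least_one(lam):
    """P(X >= 1) = 1 - P(X = 0) = 1 - exp(-lambda) for a Poisson variable."""
    return 1.0 - math.exp(-lam)

# Hypothetical counts for the previous 10 days, oldest first.
counts = [0, 1, 0, 2, 1, 0, 1, 2, 3, 2]
alpha = 2 / (len(counts) + 1)  # assumed smoothing factor

lam_sma = sma_lambda(counts)
lam_ema = ema_lambda(counts, alpha)

print(f"SMA lambda = {lam_sma:.3f}, P(X >= 1) = {prob_at_least_one(lam_sma):.3f}")
print(f"EMA lambda = {lam_ema:.3f}, P(X >= 1) = {prob_at_least_one(lam_ema):.3f}")
```

With counts that trend upward, as in this made-up window, the EMA gives a larger λ than the simple average because the recent days carry more weight.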
 
Poisson doesn't care how you find its mean and variance. Both methods are arbitrary and simply represent different scaling methods. What matters is which one you think is more appropriate. If you think more recent data are more relevant, then the simple average is probably not the one to use.

Why do you keep saying "cumulative"? That doesn't make sense. Given a mean value for a single day, you can then easily calculate p(0), p(1), p(2), etc. If you wish to add up some of them, I suppose that's cumulative in some sense.
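
To illustrate the point about p(0), p(1), p(2), here is a minimal sketch of the Poisson probability mass function using only the standard library. The value `lam = 1.2` is a hypothetical daily mean, not a figure from this thread.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 1.2  # hypothetical mean number of events per day

# Individual probabilities p(0), p(1), p(2), p(3).
probs = [poisson_pmf(k, lam) for k in range(4)]
for k, p in enumerate(probs):
    print(f"p({k}) = {p:.4f}")

# "One or more events" is everything except p(0).
p_at_least_one = 1.0 - probs[0]
print(f"P(X >= 1) = {p_at_least_one:.4f}")
```

Adding up p(1), p(2), ... term by term gives the same number as `1 - p(0)`, which is why only the zero-count term is needed.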
 
Allow me to clarify a couple of things, as I am a bit confused by your reply:

- What I want to calculate is the cumulative Poisson probability. This gives the probability that one or more events will occur (in this case) the next day. That is why the Poisson random variable will always be 1 (I am not sure what your doubt is about the need to calculate the Poisson cumulative probability).

- Lambda, the average rate of success, is crucial: it is what enables you to calculate P(X ≥ 1) more accurately.

- The more accurate Lambda is, the more accurate the calculation of the cumulative probability will be.

- Lambda is a garden-variety average calculation. Applying the exponential moving average to the previous 10-day data set gives more weight to the newest values, which is desirable. The simple moving average does not. A "better" Lambda will lead to a more accurate Poisson cumulative probability when calculating it for the next day.

So, once again, the question is to decide which should be used to get a more accurate Lambda to factor into the Poisson probability calculation and especially, whether any Poisson premise would be violated by using EMA (which I am inclined to apply).

I hope this will clarify the issue.
Thanks
 
I am utterly confused. Admittedly, tkunny is much better at this sort of thing than I am, and I may simply be misunderstanding you, but I am mystified.

Whether the Poisson distribution is appropriate as a model depends on whether the reality to be modeled satisfies the assumptions peculiar to that distribution, e.g. independence of events. Unless your method of estimation has some material probability of altering the reality to be modeled, it is independent of the choice of model.

It is of course true that the reliability of the estimated values fed into the model will affect the reliability of the model's results, but the model is appropriate or not depending on how closely reality satisfies the model's assumptions. You can put the exact value of lambda into the model and get terrible results if reality differs materially from the model's assumptions. If reality closely resembles the idealizations of the model and you feed a reasonably exact estimate of lambda into the model, you will on average get good results even if your reasonably exact estimates of lambda come from consulting the entrails of a chicken.

My other point of confusion is that if you want to find

\(\displaystyle P(x \ge 1)\),

why is that any more complicated than

\(\displaystyle P(x \ge 1) = 1 - P(x = 0).\)
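
For a Poisson variable with mean \(\displaystyle \lambda\), the zero-count term has a closed form, so the whole calculation reduces to one exponential:

\(\displaystyle P(x = 0) = \dfrac{e^{-\lambda} \lambda^{0}}{0!} = e^{-\lambda}, \quad \text{so} \quad P(x \ge 1) = 1 - e^{-\lambda}.\)

As an illustration with a hypothetical \(\displaystyle \lambda = 1.2\) (not a value from this thread), \(\displaystyle P(x \ge 1) = 1 - e^{-1.2} \approx 0.699\).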
 
- What I want to calculate is the cumulative Poisson probability. This gives the probability that one or more events will occur (in this case) the next day. That is why the Poisson random variable will always be 1 (I am not sure what your doubt is about the need to calculate the Poisson cumulative probability).

- Lambda, the average rate of success, is crucial: it is what enables you to calculate P(X ≥ 1) more accurately.

- The more accurate Lambda is, the more accurate the calculation of the cumulative probability will be.

I see. Here's the problem. The Cumulative Probability Distribution STARTS at \(\displaystyle -\infty\) (or at x = 0 in the case of the Poisson). Thus, if you want the Cumulative Probability, you cannot have it if you discard the very first probability mass. In other words, "cumulative probability" isn't what you mean. You do mean that you wish to accumulate all probabilities other than the first, but "cumulative probability" is already taken - you mustn't use that phrase to describe what you want.

- Lambda is a garden-variety average calculation. Applying the exponential moving average to the previous 10-day data set gives more weight to the newest values, which is desirable. The simple moving average does not. A "better" Lambda will lead to a more accurate Poisson cumulative probability when calculating it for the next day.

Your reliance on the terms "better" and "more accurate" is a little unsettling. This is a random process, as you have defined it. You will get variable results. Will one selection of random values be "better" than another set of random values? (Please refer to the discussion of "ideal" results, above.) What you should mean is that your model will be improved in the sense that it produces more accurate predictions over the LONG run. You cannot apply this hope to a single day's result.
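
One rough way to look at long-run behaviour rather than a single day is to roll the estimate forward over a longer history and compare the average predicted probability with the share of days that actually had an event. The sketch below is only an illustration: the 30-day series is made up, the smoothing factor `2 / 11` is an assumed convention for a 10-day EMA, and this comparison is a crude check, not a formal test.

```python
import math

def prob_at_least_one(lam):
    """P(X >= 1) = 1 - exp(-lambda) for a Poisson variable."""
    return 1.0 - math.exp(-lam)

def sma(window):
    """Simple average of the window."""
    return sum(window) / len(window)

def ema(window, alpha):
    """Exponential moving average; window runs oldest to newest."""
    value = window[0]
    for c in window[1:]:
        value = alpha * c + (1 - alpha) * value
    return value

def backtest(counts, window_size, estimator):
    """Return (mean predicted P(X >= 1), observed share of days with an event)."""
    predictions, outcomes = [], []
    for t in range(window_size, len(counts)):
        lam = estimator(counts[t - window_size:t])  # uses only past days
        predictions.append(prob_at_least_one(lam))
        outcomes.append(1 if counts[t] >= 1 else 0)
    n = len(predictions)
    return sum(predictions) / n, sum(outcomes) / n

# Hypothetical 30 days of counts, oldest first.
counts = [0, 1, 0, 2, 1, 0, 1, 2, 3, 2,
          1, 0, 2, 1, 3, 0, 1, 2, 0, 1,
          2, 1, 0, 3, 1, 2, 0, 1, 1, 2]
alpha = 2 / 11  # assumed smoothing factor for a 10-day EMA

sma_result = backtest(counts, 10, sma)
ema_result = backtest(counts, 10, lambda w: ema(w, alpha))

print(f"SMA: mean predicted = {sma_result[0]:.3f}, observed = {sma_result[1]:.3f}")
print(f"EMA: mean predicted = {ema_result[0]:.3f}, observed = {ema_result[1]:.3f}")
```

Whichever estimator's average prediction sits closer to the observed share over many days is the one doing better in this long-run sense; no single day can settle the question.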

So, once again, the question is to decide which should be used to get a more accurate Lambda to factor into the Poisson probability calculation and especially, whether any Poisson premise would be violated by using EMA (which I am inclined to apply).

The conditions of the Poisson distribution are these:
1) The number of successes in two disjoint time intervals is independent.
2) The probability of a success during a small time interval is proportional to the entire length of the time interval.

You cannot violate these conditions with your selection of lambda.
 