Frequency distribution

Shawna

Heya,
I've encountered a difficulty with the first part of a question I was given. It's probably (:D) quite minor, but I really need some input on it.
I was given an initial frequency table comprising only x and its cumulative frequency distribution F(x); I have added f(x) myself accordingly.
x            f(x)    F(x)
0-1000        10      10
1000-1500     10      20
1500-2000     20      40
2000-2400     10      50
2400-3000     30      80
3000-3500     20     100

The question asks me to remake the table so that each class width = 500.
The answer sheet states that the first class, '0-1000', is divided exactly into two classes, 0-500 and 500-1000, and that its frequency is therefore halved to 5 per class. Now, my question is: why should the frequency f(x) be halved? Can't there be a situation where the observations are scattered unevenly within this class - say 4 and 6, or 3 and 7, and so on?
I figured I can't find the exact location of x = 500 using the percentile formula (Ck) in this case...
Does anyone have any idea? Feeling quite stupid right now.

Thank you :)
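For readers following along: the "percentile formula (Ck)" is not quoted in the thread, but its usual linear-interpolation form (an assumption on my part, since the course's exact formula isn't shown) is

\(\displaystyle C_k = L + \dfrac{\frac{k}{100}N - F_{\text{below}}}{f}\cdot w\)

where \(L\) is the lower boundary of the class containing the k-th percentile, \(N\) is the total frequency, \(F_{\text{below}}\) is the cumulative frequency below that class, \(f\) is the class frequency, and \(w\) is the class width. Note that this formula already assumes observations are spread uniformly within each class - the same assumption that justifies halving the 0-1000 frequency.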
 
The total number between 0 and 1000 is 10. Without any further information, when you split the interval into two you have to put half into each interval: thus 0-500 -> 5, and 500-1000 -> 5. [The reason is that this choice minimizes the squares of the uncertainties of the numbers.] That is the easiest thing to do without having to justify it.

However, in this case you could justify an unequal split, because the distribution is decreasing rapidly as x gets smaller than the peak bin. When you do something like that, you have to pay closer attention to the error bars assigned to the bins - which is a whole new topic ("propagation of errors"). I might arbitrarily estimate 0-500 -> 3±2 and 500-1000 -> 7±3, without really working out the propagation of errors, which gets especially complicated when you extrapolate from part of the data to adjust another part, introducing correlations into the data.

So your question certainly isn't "stupid" - it's just that, at the level you are studying, it is easiest to split equally.
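A minimal sketch of this re-binning in Python (not from the thread; the class edges and counts are taken from the table above). It spreads each original frequency uniformly across its class, which reproduces the answer sheet's 5/5 split of 0-1000 and also handles the 2000-2400 and 2400-3000 classes, whose edges do not line up with multiples of 500:

```python
# Re-bin a frequency table into equal-width classes, spreading each
# original frequency uniformly across its class (the "equal split" rule).

old_edges = [0, 1000, 1500, 2000, 2400, 3000, 3500]
old_freqs = [10, 10, 20, 10, 30, 20]

width = 500
new_edges = list(range(0, 3500 + width, width))
new_freqs = [0.0] * (len(new_edges) - 1)

for (lo, hi), f in zip(zip(old_edges, old_edges[1:]), old_freqs):
    for j, (nlo, nhi) in enumerate(zip(new_edges, new_edges[1:])):
        overlap = max(0, min(hi, nhi) - max(lo, nlo))
        new_freqs[j] += f * overlap / (hi - lo)  # uniform-spread assumption

cumulative = 0.0
for (nlo, nhi), f in zip(zip(new_edges, new_edges[1:]), new_freqs):
    cumulative += f
    print(f"{nlo}-{nhi}: f = {f:.1f}, F = {cumulative:.1f}")
```

The uniform spread is a convention, not a fact about the data, so the printed frequencies carry unquantified uncertainty.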
 
I disagree slightly with Dr. Phil. If you only know there are 10 from 0 to 1000, you cannot know how many there are from 0 to 500 and from 500 to 1000. Dr. Phil says "Without any further information, when you split the interval into two you have to put half into each interval." That's where I disagree: you don't "have to". It is equally valid to split the 10 between the two intervals in any way you want. I agree that equal splitting is simplest, and it is a standard convention.

(Of course, the person you want to agree with is your teacher! You ought to check to see if your teacher is using Dr. Phil's convention.)
 
A major concern in my professional career has been to re-bin data taken in a detector with square (Cartesian) elements into histogram bins relevant to a particular experiment (such as r-theta). Thus I have had to consider this problem in detail. In particular, whatever we do with the data, we must be able to propagate errors. I tell people that any procedure for which you can't propagate errors is invalid. One general "rule" is that you should never convert to bins that are smaller than the original. With that in mind, my answer would be NOT to split that lowest bin at all!

Again I say, excellent question!
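A small illustration of why merging bins is safe while splitting is not (assuming independent Poisson counting errors, the situation described above):

```python
from math import sqrt

# Merging two independent Poisson bins: counts add, and so do Variances,
# so the merged bin's error bar follows directly from the data.
n1, n2 = 10, 20
merged = n1 + n2
print(f"merged bin: {merged} ± {sqrt(n1 + n2):.2f}")

# Splitting a bin of 10 into two parts has no such rule: the 5/5 split
# is an assumption about where the counts fall, and its uncertainty
# cannot be derived from the data alone.
```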
 
In particular, whatever we do with the data, we must be able to propagate errors. I tell people that any procedure for which you can't propagate errors is invalid.
Dr Phil

I do not understand what you are trying to say. Are you saying that "we must be able to estimate the bounds on potential error"? Personally, I prefer not to propagate errors, but I am very interested in being able to estimate their potential bounds. I am not being fussy about word usage. If the point is to generate errors, I simply have no clue what you are talking about. For example, one of the reasons that the method of successive approximations is valuable is that errors do NOT propagate: each new approximation is evaluated independently of whatever errors occurred in generating the previous one.

Not trying to be argumentative. Trying to understand.
 
"Prefer not to propagate errors"? If your initial data has errors, the only way you do that is not do any calculations!
 
"Prefer not to propagate errors"? If your initial data has errors, the only way you do that is not do any calculations!
One way to propagate errors is to introduce them; "propaganda" shares the same root as "propagate." I prefer not to assert that the world is flat. Moreover, if errors are known, they can be removed: if, for example, I knew from calibration that a particular piece of equipment has a known bias in its measurements, I would eliminate that bias, not propagate it. Finally, some types of calculation reduce the probable range of relative error, and others increase it; "propagation" implies that the latter type should be preferred over the former. The phrase used was "we must be able to propagate errors." If it means what it says, I disagree. If something else is meant, I'd like to know what.
 
I failed to explain well enough: in the phrase "propagation of errors," the word "error" is understood to mean "statistical uncertainty." Raw histogram data follow Poisson statistics, so each bin's Variance equals its count: a bin with n entries is reported as \(\displaystyle n \pm \sqrt{n}\), with relative standard deviation \(\displaystyle 1/\sqrt{n}\). Background and calibration runs (and estimates of bias) carry their own Variances, which at some point are propagated into the data. Model parameters must be fitted to the data, and the fitting procedure MUST also propagate Variances. Every fitted parameter then has an associated statistical standard deviation, propagated from the raw data. NO PHYSICAL MEASUREMENT IS MEANINGFUL WITHOUT ERROR BARS.

The Variances are the diagonal elements of an error matrix, with Covariances off the diagonal. Sometimes we have to report the full matrix, but what I prefer to do is make the fitted parameters as independent as possible - for instance by diagonalizing the error matrix to see what linear combinations should be used.

BTW, in addition to insisting that any procedure for which you can't propagate errors is invalid, I have also been known to say that any procedure for which we CAN propagate errors IS valid. But I haven't gotten very far with that assertion!

I hope the phrase "statistical uncertainty" makes it clearer what is happening! Thanks for asking.
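A minimal sketch of these Poisson error bars applied to the f(x) column of the table above (treating each bin as an independent Poisson count, as just described):

```python
from math import sqrt

# Poisson statistics: a bin with n entries has Variance n, standard
# deviation sqrt(n), and relative standard deviation 1/sqrt(n).
freqs = [10, 10, 20, 10, 30, 20]
for n in freqs:
    print(f"n = {n:3d}: {n} ± {sqrt(n):.2f}  (relative {1 / sqrt(n):.1%})")
```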
 
Thanks for responding. I think your usage of "propagate" and mine just differ; I do not think that what we are each thinking about differs in any important sense. I am interested in estimating the range of potential error (your error bars, I think) and, when possible, minimizing error build-up in computations. It is error build-up that the phrase "propagation of error" conjures up in my mind.

In the fields in which I have worked, index numbers are frequently used. These have the advantages that they do not imply artificial precision and that they reduce the effect of any unknown consistent bias in the raw numbers. However, I prefer z-scores in many respects because they remove the effects of consistent bias completely.
 
We are not entirely on the same page yet. The ONLY metric of dispersion that combines simply, regardless of distribution shape, is the standard deviation - or the Variance, which is its square. So my error bars are ±σ, and what I do could just as well be called "propagation of Variance."

Suppose
\(\displaystyle \Phi = f(A,B,C,\dots)\)
where \(A, B, C, \dots\) are independent variables with Variances \(V[A]\), \(V[B]\), \(V[C]\), ...

Then
\(\displaystyle V[\Phi] = \left(\dfrac{\partial f}{\partial A}\right)^2 V[A] + \left(\dfrac{\partial f}{\partial B}\right)^2 V[B] + \left(\dfrac{\partial f}{\partial C}\right)^2 V[C] + \cdots\)

Example 1: \(\displaystyle \Phi = A - B \implies V[\Phi] = V[A] + V[B]\)
(Variance = sum of Variances)

Example 2: \(\displaystyle \Phi = A/C \implies V[\Phi]/(A/C)^2 = V[A]/A^2 + V[C]/C^2\)
(relative Variance = sum of relative Variances)

Example 3: \(\displaystyle \Phi = \exp\left[\ln A - \ln C\right] = A/C\)

\(\displaystyle \dfrac{\partial \Phi}{\partial A} = \exp\left[\ln A - \ln C\right]\dfrac{1}{A} = \dfrac{1}{C}\)

\(\displaystyle \dfrac{\partial \Phi}{\partial C} = \exp\left[\ln A - \ln C\right]\dfrac{-1}{C} = -\dfrac{A}{C^2}\)

\(\displaystyle V[\Phi] = \dfrac{V[A]}{C^2} + \dfrac{A^2\, V[C]}{C^4}\) .... identical to Example 2

Example 4: \(\displaystyle \Phi = \dfrac{A - B}{C - B}\) .... note that B is subtracted in both the numerator and the denominator, so it is NOT correct to find (A-B) and (C-B) separately and then divide.

\(\displaystyle \dfrac{\partial \Phi}{\partial A} = \dfrac{1}{C-B}\), \(\displaystyle \dfrac{\partial \Phi}{\partial B} = \dfrac{-1}{C-B}+\dfrac{A-B}{(C-B)^2} = \dfrac{A-C}{(C-B)^2}\), \(\displaystyle \dfrac{\partial \Phi}{\partial C} = -\dfrac{A-B}{(C-B)^2}\)

\(\displaystyle V[\Phi] = \dfrac{(C-B)^2\, V[A] + (A-C)^2\, V[B] + (A-B)^2\, V[C]}{(C - B)^4}\)
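A quick numerical check of the Example 4 result, comparing the analytic propagation formula against finite-difference estimates of the partial derivatives (the values of A, B, C and their Variances are made up purely for illustration):

```python
# Check Example 4: Phi = (A - B) / (C - B).
A, B, C = 10.0, 4.0, 20.0      # illustrative values
VA, VB, VC = 1.0, 0.5, 2.0     # illustrative Variances

def phi(a, b, c):
    return (a - b) / (c - b)

# Analytic propagation (the result derived above):
V_analytic = ((C - B)**2 * VA + (A - C)**2 * VB + (A - B)**2 * VC) / (C - B)**4

# Finite-difference estimates of the partial derivatives:
h = 1e-6
dA = (phi(A + h, B, C) - phi(A - h, B, C)) / (2 * h)
dB = (phi(A, B + h, C) - phi(A, B - h, C)) / (2 * h)
dC = (phi(A, B, C + h) - phi(A, B, C - h)) / (2 * h)
V_numeric = dA**2 * VA + dB**2 * VB + dC**2 * VC

print(V_analytic, V_numeric)   # should agree to ~6 significant figures
```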
 
Dr. Phil

Yes, that all makes sense to me. Sorry to have been dense. Being fussy about language, I think an exact phrasing would be: the measurement of error must be incorporated into each successive step of a computation, so that the effect of the measured error in the initial data on the final result can also be measured. That is quite wordy, so I can see why an abbreviation like "propagation of error" would arise.

I have just worked on different sorts of problems in my life, and the abbreviation was not familiar to me. Thank you.
 
I am so accustomed to the jargon that I didn't take the time to spell it out - which you have done admirably! Thanks! :D

Note to Shawna - you must realize that we have gone WAY BEYOND anything you need to know to do your question! I hope we haven't scared you away. But it does show that a question you were afraid might be stupid was REALLY interesting to professionals in the field. Never be afraid to question!
 
Oh my! I got a bit lost 3-4 replies into this thread! Haha, I'm glad it started a debate, though. I would love to be able to understand it all, but at the same time it makes me relieved I am not doing a math degree! *scared*

I was going over some threads in this forum in the hope I might be able to help people with my very minimal knowledge of stats, but I figured I can't really be of any assistance to 'pay back' the help I was given here thus far, and I'd rather not confuse anybody.

I have another question I've been struggling with; hopefully it isn't too much to ask (I'm really sorry about the English, it's not my native language) -

"The scores of 60 students in an English exam are known to be distributed in a negative asymmetrical skew. The score average is 72.
The lecturer decided to raise the score by 6 points for students whose score is less than the average."
As a result of the change, the average will grow:
1. by less than 3 points and the variance will decrease.
2. by more than 3 points and the variance will decrease.
3. by less than 3 points and the variance will increase.
4. by more than 3 points and the variance will increase.

Now, I know that since the distribution is negatively skewed, the mean as a central measure is lower than the median or the mode, since it is more affected by the extreme (low) values. Since the lecturer decided to raise the scores that are smaller than the average, the variance decreases because those extreme values shift towards the mean, making the whole distribution less 'scattered'.
So the answer is either 1 or 2. But I am not sure how to decide between 'less than' and 'more than' 3 points: I don't have any data about how many students scored less than the mean to begin with. Any ideas?

Thank you very much.


Amazing forum by the way!
 
Leave it to an English teacher to come up with this nutty adjustment. (I am allowed to be snarky about English teachers because my mother was one.) Notice what this adjustment will do. A student who got 71 originally will get an adjusted score of 77 and so get a better grade than a student who originally got a score of 76. Well, you are not asked to explain the oddities of teachers, but to do your assignment.

The definition of non-parametric skew is \(\displaystyle \dfrac{\mu - \nu}{\sigma}\), where \(\mu\) = mean, \(\nu\) = median, and \(\sigma\) = standard deviation.

So, as you observed, if the non-parametric skew is negative, the mean is less than the median; since the mean is 72, the median is > 72.

Let \(\lambda\) = the number of students who have an original score under the unadjusted mean \(\mu\).

\(\displaystyle \lambda \le\) what? Why?

Let \(\kappa\) = the adjusted mean.

Let \(\chi_i\) = the unadjusted score of student \(i\).

We assume that student 1 has the lowest score, student 2 has either the same score or a higher score, and so on.

So \(\displaystyle \mu = \dfrac{1}{60} \sum_{i=1}^{60} \chi_i.\)

Can you figure out what the formula for \(\displaystyle \kappa\) is?

Give it your best effort and let us know how far you get.
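For intuition only - and without giving away the bound the exercise is driving at - here is a quick simulation on a made-up negatively skewed sample of 60 scores (the numbers are invented, not from the problem), showing both effects for that sample:

```python
import random

random.seed(1)

# A made-up negatively skewed sample: a long tail of low scores.
scores = [100 - random.expovariate(1 / 15) for _ in range(60)]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

mu = mean(scores)
adjusted = [x + 6 if x < mu else x for x in scores]

print(f"mean shift:      {mean(adjusted) - mu:.2f}  (compare with 3)")
print(f"variance before: {variance(scores):.1f}")
print(f"variance after:  {variance(adjusted):.1f}")
```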
 