Need help with IA Correlation.

paiges

New member
Joined
Mar 17, 2018
Messages
2
Hello!
I hope I can provide a clear explanation of what I am trying to do.
I am working on my IB Math Studies IA. I have decided on the topic: Is there a relationship between a student's grade level and GPA?

I conducted a survey and recorded my results in a table.

GPA       Freshman  Sophomore  Junior  Senior
0.0-0.5   1         1          0       0
0.6-1.0   1         2          1       1
1.1-1.5   3         1          0       2
1.6-2.0   2         4          2       2
2.1-2.5   5         2          3       2
2.6-3.0   2         3          3       3
3.1-3.5   3         4          5       4
3.6-4.0   2         3          1       4
4.1-4.5   1         0          5       2


I have made scatter plots for each grade level. I also calculated Pearson's correlation coefficient just for the freshman grade level (0.0118).
My question is, how am I going to find the correlation among ALL grade levels? Do I need to use Pearson's for all grades? If yes, how do I compare the values to determine the correlation?
I think I may be going about this the wrong way.
ANY help is very much appreciated!
 
Hi paiges, welcome to the forum!

The correlation coefficient "just for freshman" makes no sense, because you've kept grade level fixed and are just looking at how GPA varies among different students at that grade level. But that is not the question you are trying to answer. The question you are trying to answer is as follows:

independent variable (x): grade level
dependent variable (y): GPA

Is there a correlation (in this case meaning a linear relationship) between these two variables?

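Within a single grade level, the Pearson correlation coefficient between these two variables is not defined, because x has no variance: every student in that group has exactly the same grade level. Across all four grade levels x does vary, but it only takes four discrete values (a student is always exactly in grade 10, or exactly in grade 11, etc., with no error), so I don't think a single correlation coefficient is the most informative statistic here. I'd suggest a different statistic to figure this out. One statistic could just be the mean: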

E.g. maybe seniors, on average, have a higher GPA because they work harder. To figure this out, you could compute the average GPA at each grade level and plot it vs. grade level (calling them 1, 2, 3, and 4, instead of freshman, sophomore, etc.). See if it makes a linear relationship. However, in addition to an average, you also have a spread (e.g. standard deviation) of GPAs at each grade level. That gives you an error bar on each of the points in your plot, i.e. an error on each of your averages. So you can ask the question: are these average GPAs as a function of grade level really different from each other, to within the error? I.e. are any differences you see actually statistically significant?

E.g. suppose you measure an average GPA for freshmen of 3.5, and an average GPA for sophomores of 3.6, but the standard errors of these values are 0.4 and 0.5, respectively, i.e.

\(\displaystyle \mathrm{avg} = 3.5 \pm 0.4~(\mathrm{freshman}) \)
\(\displaystyle \mathrm{avg} = 3.6 \pm 0.5~(\mathrm{sophomore}) \)

You would conclude that there is no significant difference in performance, i.e. these GPA values are consistent with each other, to within the observed margin of error.
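
A rough way to check this with the example numbers above, assuming the two samples are independent (that assumption is mine, not something your data guarantees), is to compare the difference of the averages with the combined standard error:

\(\displaystyle \Delta = 3.6 - 3.5 = 0.1 \)
\(\displaystyle \sigma_{\Delta} = \sqrt{0.4^2 + 0.5^2} = \sqrt{0.41} \approx 0.64 \)

The difference (0.1) is much smaller than its uncertainty (about 0.64), which is why the two averages count as consistent with each other.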

Note: https://en.wikipedia.org/wiki/Standard_error#Standard_error_of_the_mean

I notice that you have binned your GPA data into bins of width 0.5, which is certainly useful for plotting a histogram of GPAs for each grade level and comparing the four histograms (which would also be really cool to show). However, I'm wondering: do you have the unbinned values (i.e. the exact GPA for each student)? You will probably need these to do an analysis of the type I just described above.

Another idea: since you have a distribution of GPAs for each grade, ask your math teacher for ideas on how to compare distributions and see whether they are really different from each other. E.g. chi-squared test to see if each grade level's distribution differs significantly from the overall distribution of grades (averaged over all students)?
 
Yeah a cool thing I just thought of is that you can consider the entire set of students you surveyed to be the population. That gives you a population mean and population standard deviation for GPA.

Then you can consider the students you surveyed at each grade level to be samples of that population. That gives you a sample mean and a sample standard error for GPA at each grade level. Is each of these samples random? I.e. is it representative of the population as a whole?
 
Unfortunately, I did not record the exact GPA for each student, and it is too late for that :( I did make scatter plots for each grade level using the midpoint of each GPA range for the y-axis. Is it appropriate for me to keep the correlation coefficient and say that I used it to determine the strength of the relationship shown by the line of best fit? Or should I just remove it?
Also, could you please explain more about using chi-squared for grade level vs. overall distribution?
 
Again, "scatter plots for each grade level" makes no sense to me. Think about what it is you are plotting. Answer these questions:

- what's on the y-axis of your (current) plots?
- what's on the x-axis of your (current) plots?

Is this (y vs. x) the relationship that you were trying to investigate? Spoiler alert: I don't think it is, unless I'm misunderstanding what you are doing. Can you show us one of these plots?

Okay, no problem if you didn't record the exact GPA. You can just assume that every student's GPA was exactly at the midpoint of the range that they reported being in. E.g. for the five freshmen who reported being in the range of 2.1-2.5, just say they all had a GPA of 2.3.

If you do that, you should be able to look at the mean, standard deviation, and standard error as I suggested.
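
Here is a minimal Python sketch of that calculation, using the counts from your table and the midpoint of each GPA range. The variable names and the `summarize` helper are just made up for illustration, and I've used the sample standard deviation (dividing by n - 1), which is one common choice; your course may prefer dividing by n.

```python
import math

# Midpoint of each GPA range (0.0-0.5, 0.6-1.0, ..., 4.1-4.5).
midpoints = [0.25, 0.8, 1.3, 1.8, 2.3, 2.8, 3.3, 3.8, 4.3]

# Number of students in each GPA range, copied from the survey table.
counts = {
    "Freshman":  [1, 1, 3, 2, 5, 2, 3, 2, 1],
    "Sophomore": [1, 2, 1, 4, 2, 3, 4, 3, 0],
    "Junior":    [0, 1, 0, 2, 3, 3, 5, 1, 5],
    "Senior":    [0, 1, 2, 2, 2, 3, 4, 4, 2],
}

def summarize(freqs):
    """Return (n, mean, sample SD, standard error) for binned data."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
    # Sample variance: every student is assumed to sit at the bin midpoint.
    var = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)
    sd = math.sqrt(var)
    se = sd / math.sqrt(n)  # standard error of the mean
    return n, mean, sd, se

summary = {grade: summarize(freqs) for grade, freqs in counts.items()}

for grade, (n, mean, sd, se) in summary.items():
    print(f"{grade:10s} n={n:2d}  mean={mean:.2f}  SD={sd:.2f}  SE={se:.2f}")
```

You can then plot the four means against grade level (1, 2, 3, 4) and use each standard error as the error bar.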

I think you should ask your teacher whether chi-squared (or a similar test) would be good, or whether it's beyond the scope of the project.
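
In case it helps that conversation, here is a rough Python sketch of one version of the idea: a chi-squared test of independence between GPA range and grade level, using your table as the observed counts. Treat it as an illustration only. The variable names are my own, and a common rule of thumb asks for expected counts of at least 5 in each cell, so with a survey this size you would probably need to merge some GPA ranges before the result means much.

```python
grades = ["Freshman", "Sophomore", "Junior", "Senior"]

# Observed counts: one row per GPA range, one column per grade level.
observed = [
    [1, 1, 0, 0],  # 0.0-0.5
    [1, 2, 1, 1],  # 0.6-1.0
    [3, 1, 0, 2],  # 1.1-1.5
    [2, 4, 2, 2],  # 1.6-2.0
    [5, 2, 3, 2],  # 2.1-2.5
    [2, 3, 3, 3],  # 2.6-3.0
    [3, 4, 5, 4],  # 3.1-3.5
    [2, 3, 1, 4],  # 3.6-4.0
    [1, 0, 5, 2],  # 4.1-4.5
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count if GPA range and grade level were independent.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(grades) - 1)

# How many cells fall below the usual "expected count >= 5" guideline.
small_cells = sum(e < 5 for row in expected for e in row)

print(f"chi-squared = {chi2:.2f}, degrees of freedom = {df}")
print(f"cells with expected count below 5: {small_cells}")
```

You would compare the statistic with the critical value for those degrees of freedom at your chosen significance level (your calculator or a table will give it).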
 