normal or t distribution

vclaire · Aug 12, 2020

Hello,
I would like to ask for help in clarifying something about the t and Z distribution, more specifically, which one is correct to choose in the situation below. To make it easy to talk about it, I would like to start by writing an example.
I have a sample of 14 people with height data, the sample mean is 150, the standard deviation is 20. The question is what is the probability that someone is over 200 cm.

If I use Z-values, the calculation is simple:
I standardize the value of 200, i.e., I calculate its Z-value: Z(200) = (200 - 150) / 20 = 2.5. Then I find the right-hand probability for Z = 2.5, which is 0.00621. But this does not take into account that it is a sample and not the population, nor does it take into account the sample size.
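For anyone who wants to reproduce the Z calculation without a table, here is a minimal sketch using only the Python standard library. The variable names are just illustrative, and it assumes heights are normally distributed with the sample mean and SD treated as if they were the population values.

```python
from statistics import NormalDist

# Figures from the example above; treated here as if they were population values.
sample_mean = 150.0
sample_sd = 20.0
x = 200.0

# Standardize 200 cm, then take the right-hand tail of the standard normal.
z = (x - sample_mean) / sample_sd
p_right = 1.0 - NormalDist().cdf(z)

print(z, round(p_right, 5))  # 2.5 0.00621
```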

When I look for a similar calculation of the t-value, I find everywhere only the formula that the one-sample t-test uses (of course, I also find the two-sample formula, but it is completely irrelevant to the present question):

(samplemean - populationmean) / (sampleSD / Sqrt(samplesize))

I see two solutions from here.
(1) I rephrase the question to correspond to a one-sample t-test, i.e., I consider 200 as the population mean and examine the probability that my sample differs from this population.
(150 - 200) / (20 / Sqrt(14)) = -9.354. Then I find the left-hand probability (because what I'm testing is whether my sample is lower than H0) for t = -9.354, df = 13, which is 0.00000019.

(2) I try to make the formula similar to the one used for the Z-value: (X - samplemean) / (sampleSD / Sqrt(samplesize)) = 9.354. Then I look for the right-hand probability, because the question is the probability that someone is over 200 cm. The p-value is naturally the same, 0.00000019.

I have three questions:
for (1): Is it correct to rephrase the original question to correspond to the one-sample t-test? Are these two questions theoretically the same?
for (2): Is it correct to convert the formula for calculating the t-value to be similar to the formula for calculating the Z-value?
for both: Why does a lower probability come out of the t-distribution calculation? Shouldn't the tail of the t-distribution be heavier at such a small sample size and, accordingly, give values at the edge a higher probability of occurrence?

Thank you in advance for all the explanations!

tkhunny · Aug 12, 2020

The t-distribution is intended to reflect increased variance due to smaller sample sizes.

If you do not know the Population Variance, and you must estimate it, you are compelled to use t with appropriate degrees of freedom and not z - until your sample size is big enough that no one cares.

Sadly, none of this has anything directly to do with your dilemma. Just some background to keep in mind.

Take a good close look at your question and your data. Are you being asked to compare a MEAN or a SINGLE OBSERVATION? It makes a difference.

What is the probability that the mean height of the sample will be greater than 200? Calculate:

\dfrac{200 - 150}{\dfrac{20}{\sqrt{14}}}

This is likely to lead to a very small probability.

What is the probability that a single, randomly selected individual will have height greater than 200? Calculate:

\dfrac{200 - 150}{20}

This is a much smaller score.
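To see the two scores side by side, here is a small sketch in Python (standard library only; the variable names are made up for illustration). The only difference is the divisor: the standard error of the mean, SD / sqrt(n), versus the SD itself.

```python
import math

# Figures from the original question.
sample_mean, sample_sd, n = 150.0, 20.0, 14
x = 200.0

# Score for a statement about the MEAN: divide by the standard error.
standard_error = sample_sd / math.sqrt(n)
score_for_mean = (x - sample_mean) / standard_error

# Score for a statement about a SINGLE OBSERVATION: divide by the SD.
score_for_individual = (x - sample_mean) / sample_sd

print(round(score_for_mean, 3), score_for_individual)  # 9.354 2.5
```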

vclaire · Aug 13, 2020

tkhunny said:
Take a good close look at your question and your data. Are you being asked to compare a MEAN or a SINGLE OBSERVATION? It makes a difference.

Thank you. Based on your explanation, it is now clear what was wrong with my original t-distribution solutions: I assigned the probability to the location of the sample mean, rather than to the location of a single element in the sample.

I have one question left: is it correct to calculate (200-150) / 20 = 2.5 and for this value to look for the probability from the t-distribution table instead of the Z-distribution table, which would be p = 0.013 for t = 2.5 df = 13; which is really a higher value than what I got using the Z-distribution (p = 0.00621); and this corresponds to the fact that the t-distribution should have a heavier tail when the sample size is small.

Thank you again!

tkhunny · Aug 13, 2020

vclaire said:
I have one question left: is it correct to calculate (200-150) / 20 = 2.5 and for this value to look for the probability from the t-distribution table instead of the Z-distribution table, which would be p = 0.013 for t = 2.5 df = 13; which is really a higher value than what I got using the Z-distribution (p = 0.00621); and this corresponds to the fact that the t-distribution should have a heavier tail when the sample size is small.

Thank you again!

That's the idea. The same number of standard deviations should have more tail area in a t than in a Z. It's more likely to be farther out with a greater variance. Of course, the rule of thumb is that for n = 30+, they aren't very far apart. Usually that's the case. Make sure you think about it, though.
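If you want to check this without tables, here is a rough sketch in Python using only the standard library. The helper names `t_pdf` and `t_right_tail` are made up for this example, and the t tail is approximated by Simpson's rule on the density rather than taken from a statistics package, so treat the digits as approximate. It compares the right-hand tail beyond a score of 2.5 under Z and under t for a few degrees of freedom.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_right_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t].

    The density is symmetric, so P(T > t) = 0.5 - integral from 0 to t.
    steps must be even.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


score = (200 - 150) / 20  # 2.5 standard deviations above the sample mean
p_z = 1 - NormalDist().cdf(score)

# df = 13 is the example from this thread (n = 14); the others are for comparison.
tails = {df: t_right_tail(score, df) for df in (13, 25, 31, 100)}
for df, p in tails.items():
    print(f"df={df:3d}  t tail={p:.5f}  Z tail={p_z:.5f}")
```

For df = 13 the t tail comes out at roughly 0.013, in line with the table value quoted above, and it shrinks toward the Z tail of about 0.0062 as the degrees of freedom grow.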

1) I would not just say, "I used Z."
2) I might say, "Sample size was 32, so I DECIDED to use Z."
3) I might say, "Sample size was 26, so I compared Z and t and it didn't make any difference for this decision, so I went with Z because it's easier to explain to my audience."

2) and 3) say, "I thought about it."

Good work. Excellent way to spend your time - thinking about it.

vclaire · Aug 13, 2020

tkhunny said:
That's the idea

Thank you for the explanation; it has become completely clear!

Thank you also for the tips on the wording. Luckily I don't have to use them at the moment, as the question didn't come up in connection with a specific task; I just wanted to better understand the concepts. Every time I turn to this forum with questions like this, you guys are a huge help!
