Can anyone please help me with part of my AP statistics project? It is due tomorrow

Marian411 · Jan 29, 2018

This is the problem I’m struggling with:
According to New Jersey Transit, the 8:00 am weekday train from Princeton to New York City has a 90% chance of arriving on time. To test this claim, an auditor chooses 6 weekdays at random during a month to ride this train. The train arrives late on 2 of those days. Does the auditor have convincing evidence that the company’s claim isn’t true? Design and carry out a simulation using Table D (start at line 101) to estimate the probability that a train with a 90% chance of arriving on time each day would be late 2 or more of 6 days. Follow the four-step process.

Here is what I have tried, but my teacher, whom I emailed, let me know it’s not correct. I don’t understand how to set it all up right.
State the question of interest using the language of probability.
Solution:
What is the probability that, in a sample of six weekdays at random out of a month for a train that arrives on time 90% of the time, the train is late on two of these six days?
(b) (5 pts.) How would you use random digits to imitate one simulation of the process? Are you going to use a random sampling with or without replacement? Explain!
Solution:
Let the numbers 1-5.4 represent on time days and 5.5-6 represent late days. We will use random sampling with replacement because each of these numbers represents a day in the month. Sampling without replacement would greatly change the proportion of on time days in this simulation.
(c) (5 pts.) What variable would you measure in one simulation?
Solution:
We will count the numbers 28-30 (late days).
(d) (10 pts.) Use R to carry out one such simulation that you described in (b) and (c) by creating a sequence of numbers. Paste your commands together with the corresponding output. (Use your student ID to seed R)
Solution:
> pop = seq(1,100)
> pop
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
(e) (20 pts.) Perform 1000 repetitions of the simulation. What conclusion would you draw? Paste your code from the R Editor. Also, paste your result from the R Console.
Solution:
Code:
pop = seq(1,100)
total = c()
for(i in 1:1000){
samp = sample(pop,6, replace=T)
vectorTF = samp<=6
countCB = sum(vectorTF)
total = c(total,countCB)
}
sum(total == 2)/length(total)
Result:
> pop = seq(1,100)
> total = c()
>
> for(i in 1:1000){
+ samp = sample(pop,6, replace=T)
+ vectorTF = samp<=6
+
+ countCB = sum(vectorTF)
+
+ total = c(total,countCB)
+ }
>
> sum(total == 2)/length(total)
[1] 0.043
(f) (+ 5 pts. Extra Credit) Find the theoretical probability that the train will be late on 2 out of 6 randomly selected days.
Solution:
Let A=event that a day where train is on time is selected.
Let L=event that a day where train is late is selected.
A=.90
L=.10
Let us first determine how many outcomes there are where we can have two late train days in a random sample of 6: LLAAAA, LALAAA, LAALAA, LAAALA, LAAAAL, ALAAAL, so we have 6 such outcomes of interest.
Therefore, P(two late days) = P(LLAAAA or LALAAA or LAALAA or LAAALA or LAAAAL or ALAAAL)
= P(LLAAAA) + P(LALAAA) + P(LAALAA) + P(LAAAAL) + P(ALAAAL)
=P(L)P(L)P(A)P(A)P(A)P(A)+P(L)P(A)P(L)P(A)P(A)+P(L)P(A)P(A)P(L)P(A)P(A)+P(L)P(A)P(A)P(A)P(A)P(L)+P(A)P(A)P(A)P(A)P(L)
=(.10)^2(.90)*6=0.007553552
In “R”:
> .10^2.90*6
[1] 0.007553552

j-astron · Jan 30, 2018

Hi Marian411,

For the extra-credit part (f) (theoretical probability of getting 2 late days out of 6 trials): do you know what the binomial distribution is?

https://en.wikipedia.org/wiki/Binomial_distribution

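Long story short: you didn't tally up the number of ways of getting 4 successes (on-time days) out of six trials correctly. There are (6 choose 4) = 15 ways for that to happen, not 6. Each of those arrangements has probability (0.10)^2 (0.90)^4, so the on-time factor needs the exponent 4. Also note that R reads .10^2.90*6 as 0.10 raised to the power 2.90, times 6, which is not what you intended.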
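As a quick check of that count and the probability it leads to, here is a small R sketch. It assumes six independent days with a 0.10 chance of a late train on each, as in the problem statement; the names `p_late`, `n_days`, `n_ways`, `p_exactly_2` and `p_2_or_more` are just chosen here for illustration.

```r
# Assumed setup from the problem: 6 independent days, 10% chance of a late train each day.
p_late <- 0.10
n_days <- 6

# Number of ways to place 2 late days among 6 (same as choosing the 4 on-time days).
n_ways <- choose(n_days, 2)
n_ways                                   # 15

# Each such arrangement has probability 0.10^2 * 0.90^4.
p_exactly_2 <- n_ways * p_late^2 * (1 - p_late)^4
p_exactly_2                              # 0.098415

# The built-in binomial density should agree with the hand calculation.
dbinom(2, size = n_days, prob = p_late)

# The original train problem asks about "2 or more" late days.
p_2_or_more <- 1 - pbinom(1, size = n_days, prob = p_late)
p_2_or_more                              # 0.114265
```

Part (f) asks about exactly 2 late days, while the train problem itself asks about 2 or more, so it is worth checking which one your teacher wants.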

For the earlier parts: sampling a random number from a sequence from 1 to 100 is a reasonable thing to do. And you're right that it's sampling with replacement, because each of the six trials is independent: you want every single one of them to have a 90% chance of success. If you start removing possible numbers from the list, you change that probability from 90% to something else.

HOWEVER, where you are going wrong is in deciding whether a trial was late or on time. To match the probability given in the setup, 10% of your numbers from 1 to 100 have to signify "late", so that each draw has a 10% chance of being late. For example, you could decide that a draw between 91 and 100 inclusive signifies "late". That makes sense, because 10 of the 100 numbers then signify lateness, giving a 10% chance of being late. You could instead decide that 1-10 means late, or 21-30, or whatever. It doesn't matter, as long as it's 10 of the 100 numbers. So, in your code, you should be checking for lateness by finding where samp>=91 (for example), but certainly not samp<=6, which marks only 6 of the 100 numbers as late. The number 6 has nothing to do with the late/on-time rule when you're drawing the numbers from 1 to 100 randomly.
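One way to sanity-check a "late" rule before simulating is to see what fraction of the 100 numbers it flags. This is only a sketch using the same `pop` vector as in the posted code:

```r
pop <- seq(1, 100)

# Proportion of the 100 numbers that each rule would label "late".
mean(pop >= 91)   # 10 of 100 numbers: matches the 10% late rate
mean(pop <= 10)   # 10 of 100 numbers: an equally valid choice
mean(pop <= 6)    # only 6 of 100 numbers: a 6% late rate, not 10%
```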

The six comes in as follows: the experiment involves choosing a sample from the sequence [1,100] six times in a row, and counting how many times out of six you were late. (I.e. how many times out of the six trials that you drew a number >= 91, signifying lateness.)
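A single experiment along those lines might look like the sketch below. The seed 123 is a placeholder (the assignment asks for your student ID), and `is_late` and `n_late` are names made up for this example.

```r
set.seed(123)  # placeholder seed; the assignment asks for your student ID here

pop <- seq(1, 100)

# One experiment: six independent draws, one per sampled weekday.
samp <- sample(pop, 6, replace = TRUE)

# A draw of 91-100 stands for a late train (10 of the 100 numbers).
is_late <- samp >= 91

# The variable measured in one simulation: number of late days out of six.
n_late <- sum(is_late)

samp
n_late
```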

1000 experiments in a row would then be doing 1000 sets of these six trials, and looking at the frequency of these experiments that resulted in 2 late trials out of 6. E.g. maybe 150 of the 1000 experiments resulted in 2 out of 6 trials being late. In that case, you'd conclude that, experimentally, experiencing 2 out of 6 late days is 15% likely.
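Here is a rough sketch of the 1000 repetitions with the lateness check changed to 91-100. It uses `replicate()` instead of the for loop in the posted code, which is just a stylistic swap; the seed and the names `late_counts`, `est_exactly_2` and `est_2_or_more` are assumptions for illustration.

```r
set.seed(123)  # placeholder seed; the assignment asks for your student ID here

pop <- seq(1, 100)
n_reps <- 1000

# Each repetition: six draws with replacement, count how many fall in 91-100 (late).
late_counts <- replicate(n_reps, sum(sample(pop, 6, replace = TRUE) >= 91))

# Estimated probability of exactly 2 late days out of 6.
est_exactly_2 <- mean(late_counts == 2)

# Estimated probability of 2 or more late days (what the original train problem asks).
est_2_or_more <- mean(late_counts >= 2)

table(late_counts)
est_exactly_2
est_2_or_more
```

The exact estimates depend on the seed, but they should land near the binomial probabilities for 6 trials with a 10% late rate.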

Hope this helps.
