- Weerapat "Go" Attachot (4.23)
- Albert Gilharry (4.25) http://rpubs.com/Albert_Gilharry/371863
- Justin Herman (4.45) http://rpubs.com/justin_herman_42/372094
- Rickidon Singh (4.47) http://rpubs.com/RSdata/372222
March 21, 2018
200 randomly selected students completed the reading and writing test of the High School and Beyond survey. The results appear to the right. Does there appear to be a difference?
data(hsb2) # in openintro package
hsb2.melt <- melt(hsb2[,c('id','read', 'write')], id='id')
ggplot(hsb2.melt, aes(x=variable, y=value)) + geom_boxplot() + geom_point(alpha=0.2, color='blue') + xlab('Test') + ylab('Score')
head(hsb2)
##    id gender  race    ses schtyp       prog read write math science socst
## 1  70   male white    low public    general   57    52   41      47    57
## 2 121 female white middle public vocational   68    59   53      63    61
## 3  86   male white   high public    general   44    33   54      58    31
## 4 141   male white   high public vocational   63    44   47      53    56
## 5 172   male white middle public   academic   47    52   57      53    61
## 6 113   male white middle public   academic   44    52   51      63    61
Are the reading and writing scores of each student independent of each other?
hsb2$diff <- hsb2$read - hsb2$write
head(hsb2$diff)
## [1] 5 9 11 19 -5 -8
hist(hsb2$diff)
What are the hypotheses for testing if there is a difference between the average reading and writing scores?
\(H_0\): There is no difference between the average reading and writing scores.
\[\mu_{diff} = 0\]
\(H_A\): There is a difference between the average reading and writing scores.
\[\mu_{diff} \ne 0\]
The observed average difference between the two scores is -0.545 points and the standard deviation of the differences is 8.887 points. Do these data provide convincing evidence of a difference between the average scores on the two exams (use \(\alpha = 0.05\))?
\[Z = \frac{-0.545 - 0}{ \frac{8.887}{\sqrt{200}} } = \frac{-0.545}{0.628} = -0.87\]
\[\text{p-value} = 0.1922 \times 2 = 0.3844\]
Since p-value > 0.05, we fail to reject the null hypothesis. That is, the data do not provide evidence that there is a statistically significant difference between the average reading and writing scores.
2 * pnorm(mean(hsb2$diff), mean=0, sd=sd(hsb2$diff)/sqrt(nrow(hsb2)))
## [1] 0.3857741
The probability of obtaining a random sample of 200 students where the average difference between the reading and writing scores is at least 0.545 points (in either direction), if in fact the true average difference between the scores is 0, is about 39%.
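The test statistic and two-sided p-value can also be reproduced from the summary statistics alone. This is a minimal sketch in base R that uses the rounded mean and standard deviation quoted in the text rather than the hsb2 data, so the result matches the full-data calculation only up to rounding. The variable names are illustrative.

```r
# Summary statistics quoted in the text (rounded, not recomputed from hsb2)
x.bar <- -0.545   # mean of the read - write differences
s     <- 8.887    # standard deviation of the differences
n     <- 200      # number of students

se <- s / sqrt(n)        # standard error of the mean difference
z  <- (x.bar - 0) / se   # test statistic under H0: mu_diff = 0

# Two-sided p-value; -abs(z) keeps this correct whatever the sign of z
p.value <- 2 * pnorm(-abs(z))

c(se = se, z = z, p.value = p.value)
```

Using -abs(z) is a small design choice: doubling pnorm() of the raw statistic only works when the observed difference is negative.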
\[-0.545\pm 1.96\frac { 8.887 }{ \sqrt { 200 } } =-0.545\pm 1.96\times 0.628=(-1.775, 0.685)\]
Note that the confidence interval spans zero!
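The interval can be checked in base R. This sketch assumes the rounded summary statistics from the text and the normal critical value from qnorm(), so the endpoints may differ from the hand calculation in the third decimal place.

```r
# Summary statistics quoted in the text (rounded)
x.bar <- -0.545
s     <- 8.887
n     <- 200

se     <- s / sqrt(n)
z.star <- qnorm(0.975)   # roughly 1.96 for a 95% interval

ci <- x.bar + c(-1, 1) * z.star * se
ci

# TRUE when the interval contains zero, which agrees with failing to reject H0
ci[1] < 0 & ci[2] > 0
```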
data(sat)
head(sat)
##   Verbal.SAT Math.SAT Sex
## 1        450      450   F
## 2        640      540   F
## 3        590      570   M
## 4        400      400   M
## 5        600      590   M
## 6        610      610   M
Is there a difference in math scores between males and females?
describeBy(sat$Math.SAT, group=sat$Sex, mat=TRUE, skew=FALSE)[,c(2,4:7)]
##     group1  n     mean        sd min
## X11      F 82 597.6829 103.70065 360
## X12      M 80 626.8750  90.35225 390
ggplot(sat, aes(x=Sex, y=Math.SAT)) + geom_boxplot()
ggplot(sat, aes(x=Math.SAT)) + geom_histogram(binwidth=50) + facet_wrap(~ Sex)
We wish to calculate a 95% confidence interval for the difference in average math SAT scores between males and females.
Assumptions:
Standard error of the difference between two sample means
\[ SE_{ (\bar { x } _{ 1 }-\bar { x } _{ 2 }) }=\sqrt { \frac { { s }_{ 1 }^{ 2 } }{ { n }_{ 1 } } +\frac { { s }_{ 2 }^{ 2 } }{ { n }_{ 2 } } } \]
\[ SE_{ (\bar { x } _{ 1 }-\bar { x } _{ 2 }) }=\sqrt { \frac { { s }_{ M }^{ 2 } }{ { n }_{ M } } + \frac { { s }_{ F }^{ 2 } }{ { n }_{ F } } } = \sqrt { \frac { 90.35^{ 2 } }{ 80 } +\frac { 103.70^{ 2 } }{ 82 } } =15.27 \]
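A minimal sketch of this calculation in base R, using the group summaries reported by describeBy() instead of the sat data itself. The standard deviations are squared, as the formula requires. The conservative choice of degrees of freedom (the smaller group size minus one) is an assumption of this sketch; t.test() would use the Welch approximation instead.

```r
# Group summaries copied from the describeBy() output
n.f <- 82; mean.f <- 597.6829; sd.f <- 103.70065
n.m <- 80; mean.m <- 626.8750; sd.m <- 90.35225

diff.means <- mean.m - mean.f

# Standard error of the difference: variances (sd^2), not sds, over n
se.diff <- sqrt(sd.m^2 / n.m + sd.f^2 / n.f)

# Conservative degrees of freedom (an assumption of this sketch)
df.conservative <- min(n.m, n.f) - 1
t.star <- qt(0.975, df = df.conservative)

# 95% confidence interval for the male - female difference in means
ci <- diff.means + c(-1, 1) * t.star * se.diff

c(diff = diff.means, se = se.diff, lower = ci[1], upper = ci[2])
```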
The goal of ANOVA is to test whether there is a discernible difference between the means of several groups.
Is there a difference between washing hands with: water only, regular soap, antibacterial soap (ABS), and antibacterial spray (AS)?
For ANOVA:
ggplot(hand, aes(x=Method, y=Bacterial.Counts)) + geom_boxplot()
desc <- describeBy(hand$Bacterial.Counts, hand$Method, mat=TRUE)[,c(2,4,5,6)]
desc$Var <- desc$sd^2
print(desc, row.names=FALSE)
##              group1 n  mean       sd       Var
##       Alcohol Spray 8  37.5 26.55991  705.4286
##  Antibacterial Soap 8  92.5 41.96257 1760.8571
##                Soap 8 106.0 46.95895 2205.1429
##               Water 8 117.0 31.13106  969.1429
\(MS_T\)
\(MS_E\)
Comparing
A Shiny App by Dr. Dudek to explore the F-Distribution: http://shiny.albany.edu/stat/fdist/
df.numerator <- 4 - 1
df.denominator <- 4 * (8 - 1)
plot(function(x)(df(x,df1=df.numerator,df2=df.denominator)), xlim=c(0,5), xlab='x', ylab='f(x)', main='F-Distribution')
(f.stat <- 9960.64 / 1410.14)
## [1] 7.063582
1 - pf(f.stat, 3, 28)
## [1] 0.001111464
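The two mean squares in the F statistic can be recovered from the group summary table. This sketch relies on the groups being the same size (8 each), in which case \(MS_T\) is the group size times the variance of the group means and \(MS_E\) is the average of the group variances. The variable names are illustrative, and the numbers are copied from the describeBy() output rather than computed from the hand data.

```r
# Group summaries copied from the describeBy() table
n.per.group <- 8
group.means <- c(37.5, 92.5, 106.0, 117.0)
group.vars  <- c(705.4286, 1760.8571, 2205.1429, 969.1429)
k <- length(group.means)   # number of groups

# Between-group mean square (valid as written only for equal group sizes)
ms.t <- n.per.group * var(group.means)

# Within-group mean square: average of the group variances (equal sizes)
ms.e <- mean(group.vars)

f.stat  <- ms.t / ms.e
p.value <- pf(f.stat, df1 = k - 1, df2 = k * (n.per.group - 1),
              lower.tail = FALSE)

c(MS_T = ms.t, MS_E = ms.e, F = f.stat, p = p.value)
```

With unequal group sizes the shortcuts for ms.t and ms.e no longer hold, and the sums of squares would have to be weighted by group size.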
aov.out <- aov(Bacterial.Counts ~ Method, data=hand)
summary(aov.out)
##             Df Sum Sq Mean Sq F value  Pr(>F)
## Method       3  29882    9961   7.064 0.00111 **
## Residuals   28  39484    1410
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
hand.anova <- granova.1w(hand$Bacterial.Counts, group=hand$Method)
hand.anova
## $grandsum
##     Grandmean        df.bet       df.with        MS.bet       MS.with
##         88.25          3.00         28.00       9960.67       1410.14
##        F.stat        F.prob SS.bet/SS.tot
##          7.06          0.00          0.43
##
## $stats
##                    Size Contrast Coef Wt'd Mean  Mean Trim'd Mean    Var.
## Alcohol Spray         8        -50.75      37.5  37.5       35.50  705.43
## Antibacterial Soap    8          4.25      92.5  92.5       92.67 1760.86
## Soap                  8         17.75     106.0 106.0       98.33 2205.14
## Water                 8         28.75     117.0 117.0      115.33  969.14
##                    St. Dev.
## Alcohol Spray         26.56
## Antibacterial Soap    41.96
## Soap                  46.96
## Water                 31.13