## One- and Two-Sample t-tests

The t-test is quite simple, and the base-R functionality will likely be sufficient for all your related calculations. This section introduces the plots and testing functions that help us to conduct the inference based on the t-test and its nonparametric alternative, the Wilcoxon (or Mann-Whitney) test.

So far we have compared a single sample to a normal distribution. A
much more common operation is to compare aspects of two samples. Note
that in R, all "classical" tests including the ones used below are
in package **stats** which is normally loaded.

Consider the following sets of data on the latent heat of the fusion of
ice (*cal/gm*) from Rice (1995, p.490)

Method A: 79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 80.05 80.03 80.02 80.00 80.02 Method B: 80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97

Boxplots provide a simple graphical comparison of the two samples.

A <- scan() 79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 80.05 80.03 80.02 80.00 80.02 B <- scan() 80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97 boxplot(A, B)

which indicates that the first group tends to give higher results than the second.

To test for the equality of the means of the two examples, we can use
an *unpaired* *t*-test by

> t.test(A, B) Welch Two Sample t-test data: A and B t = 3.2499, df = 12.027, p-value = 0.00694 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 0.01385526 0.07018320 sample estimates: mean of x mean of y 80.02077 79.97875

which does indicate a significant difference, assuming normality. By default the R function does not assume equality of variances in the two samples. We can use the F test to test for equality in the variances, provided that the two samples are from normal populations.

> var.test(A, B) F test to compare two variances data: A and B F = 0.5837, num df = 12, denom df = 7, p-value = 0.3938 alternative hypothesis: true ratio of variances is not equal to 1 95 percent confidence interval: 0.1251097 2.1052687 sample estimates: ratio of variances 0.5837405

which shows no evidence of a significant difference, and so we can use
the classical *t*-test that assumes equality of the variances.

> t.test(A, B, var.equal=TRUE) Two Sample t-test data: A and B t = 3.4722, df = 19, p-value = 0.002551 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 0.01669058 0.06734788 sample estimates: mean of x mean of y 80.02077 79.97875

All these tests assume normality of the two samples. The two-sample Wilcoxon (or Mann-Whitney) test only assumes a common continuous distribution under the null hypothesis.

> wilcox.test(A, B) Wilcoxon rank sum test with continuity correction data: A and B W = 89, p-value = 0.007497 alternative hypothesis: true location shift is not equal to 0 Warning message: Cannot compute exact p-value with ties in: wilcox.test(A, B)

Note the warning: there are several ties in each sample, which suggests strongly that these data are from a discrete distribution (probably due to rounding).

There are several ways to compare graphically the two samples. We have already seen a pair of boxplots. The following

> plot(ecdf(A), do.points=FALSE, verticals=TRUE, xlim=range(A, B)) > plot(ecdf(B), do.points=FALSE, verticals=TRUE, add=TRUE)

will show the two empirical CDFs, and `qqplot`

will perform a Q-Q
plot of the two samples. The Kolmogorov-Smirnov test is of the maximal
vertical distance between the two ecdf's, assuming a common continuous
distribution:

> ks.test(A, B) Two-sample Kolmogorov-Smirnov test data: A and B D = 0.5962, p-value = 0.05919 alternative hypothesis: two-sided Warning message: cannot compute correct p-values with ties in: ks.test(A, B)

Source: R Core Team, https://cran.r-project.org/doc/manuals/r-release/R-intro.html#One_002d-and-two_002dsample-tests

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.