The t-test is quite simple, and the base-R functionality will likely be sufficient for all your related calculations. This section introduces the plots and testing functions used for inference with the t-test and its nonparametric alternative, the Wilcoxon (or Mann-Whitney) test.
So far we have compared a single sample to a normal distribution. A much more common operation is to compare aspects of two samples. Note that in R, all "classical" tests, including the ones used below, are in the stats package, which is normally loaded.
Consider the following sets of data on the latent heat of the fusion of ice (cal/gm) from Rice (1995, p.490):
    Method A: 79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 80.05 80.03 80.02 80.00 80.02
    Method B: 80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97
Boxplots provide a simple graphical comparison of the two samples.
    A <- scan()
    79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 80.05 80.03 80.02 80.00 80.02

    B <- scan()
    80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97

    boxplot(A, B)
which indicates that the first group tends to give higher results than the second.
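Because scan() reads from the console, it is awkward inside a script. The following is a minimal sketch of the same comparison with the data entered through c(), so that it runs non-interactively. The list name heat and the axis label are illustrative choices, not part of the original example.

```r
# Same measurements as above, entered with c() instead of scan().
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04,
       79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

# 'heat' is a hypothetical name for the named list of samples.
heat <- list("Method A" = A, "Method B" = B)

# Tukey five-number summary (minimum, lower hinge, median,
# upper hinge, maximum) of each sample, one column per method.
five <- sapply(heat, fivenum)
print(five)

# One labelled box per method.
boxplot(heat, ylab = "Latent heat of fusion (cal/gm)")
```

Passing a named list to boxplot labels each box, which makes the plot easier to read than the unlabelled boxplot(A, B).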

To test for the equality of the means of the two samples, we can use an unpaired t-test:
    > t.test(A, B)

             Welch Two Sample t-test

    data:  A and B
    t = 3.2499, df = 12.027, p-value = 0.00694
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.01385526 0.07018320
    sample estimates:
    mean of x mean of y
     80.02077  79.97875
which does indicate a significant difference, assuming normality. By default the R function does not assume equality of variances in the two samples. We can use the F test to test for equality in the variances, provided that the two samples are from normal populations.
    > var.test(A, B)

             F test to compare two variances

    data:  A and B
    F = 0.5837, num df = 12, denom df = 7, p-value = 0.3938
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.1251097 2.1052687
    sample estimates:
    ratio of variances
             0.5837405
which shows no evidence of a significant difference, and so we can use the classical t-test that assumes equality of the variances.
    > t.test(A, B, var.equal=TRUE)

             Two Sample t-test

    data:  A and B
    t = 3.4722, df = 19, p-value = 0.002551
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.01669058 0.06734788
    sample estimates:
    mean of x mean of y
     80.02077  79.97875
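The statistics reported by t.test and var.test can be reproduced from the sample means and variances, which makes the difference between the two t-tests explicit. This is a sketch using the standard textbook formulas (Welch-Satterthwaite degrees of freedom and the pooled variance), not a description of R's internal code. All variable names are illustrative.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04,
       79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

nA <- length(A); nB <- length(B)
vA <- var(A);    vB <- var(B)
d  <- mean(A) - mean(B)          # difference in sample means

# F statistic: ratio of the two sample variances.
F_ratio <- vA / vB

# Welch test: separate variances, approximate degrees of freedom.
se_welch <- sqrt(vA / nA + vB / nB)
t_welch  <- d / se_welch
df_welch <- se_welch^4 /
  ((vA / nA)^2 / (nA - 1) + (vB / nB)^2 / (nB - 1))
p_welch  <- 2 * pt(-abs(t_welch), df_welch)

# Classical test: pooled variance, nA + nB - 2 degrees of freedom.
df_pooled <- nA + nB - 2
sp2       <- ((nA - 1) * vA + (nB - 1) * vB) / df_pooled
t_pooled  <- d / sqrt(sp2 * (1 / nA + 1 / nB))
p_pooled  <- 2 * pt(-abs(t_pooled), df_pooled)

print(c(F = F_ratio, t.welch = t_welch, df.welch = df_welch,
        t.pooled = t_pooled, df.pooled = df_pooled))
print(c(p.welch = p_welch, p.pooled = p_pooled))
```

The printed values should agree with the t, df, F and p-value entries in the output of t.test(A, B), var.test(A, B) and t.test(A, B, var.equal=TRUE).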
All these tests assume normality of the two samples. The two-sample Wilcoxon (or Mann-Whitney) test only assumes a common continuous distribution under the null hypothesis.
    > wilcox.test(A, B)

             Wilcoxon rank sum test with continuity correction

    data:  A and B
    W = 89, p-value = 0.007497
    alternative hypothesis: true location shift is not equal to 0

    Warning message:
    Cannot compute exact p-value with ties in: wilcox.test(A, B)
Note the warning: there are several ties in each sample, which suggests strongly that these data are from a discrete distribution (probably due to rounding).
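The ties can be inspected directly, and the W statistic can be recovered from the mid-ranks of the pooled sample. The sketch below assumes the data are entered with c(). The names pooled, tab, r, W and wt are illustrative. Setting exact = FALSE requests the normal approximation explicitly, which is what R falls back on when ties are present, so the call should run without the warning.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04,
       79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

pooled <- c(A, B)

# Count how often each distinct value occurs; a count above 1 is a tie.
tab <- table(pooled)
print(tab[tab > 1])

# rank() assigns mid-ranks to tied values by default.
r <- rank(pooled)

# W is the rank sum of the first sample minus its smallest possible value.
W <- sum(r[seq_along(A)]) - length(A) * (length(A) + 1) / 2
print(W)

# Normal approximation with continuity correction, requested explicitly.
wt <- wilcox.test(A, B, exact = FALSE)
print(wt$p.value)
```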
There are several ways to compare graphically the two samples. We have already seen a pair of boxplots. The following
    > plot(ecdf(A), do.points=FALSE, verticals=TRUE, xlim=range(A, B))
    > plot(ecdf(B), do.points=FALSE, verticals=TRUE, add=TRUE)
will show the two empirical CDFs, and qqplot will perform a Q-Q plot of the two samples. The Kolmogorov-Smirnov test is of the maximal vertical distance between the two ecdf's, assuming a common continuous distribution:
    > ks.test(A, B)

             Two-sample Kolmogorov-Smirnov test

    data:  A and B
    D = 0.5962, p-value = 0.05919
    alternative hypothesis: two-sided

    Warning message:
    cannot compute correct p-values with ties in: ks.test(A, B)
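The D statistic can be checked by evaluating both empirical CDFs at every observed value and taking the largest absolute difference. Since both functions are step functions that jump only at data points, the maximum is attained at one of them. The sketch below also draws the Q-Q plot mentioned above. The names FA, FB, x and D, the axis labels and the dashed reference line are illustrative additions.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04,
       79.97, 80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

# Empirical CDFs as functions.
FA <- ecdf(A)
FB <- ecdf(B)

# Evaluate both at every distinct observed value.
x <- sort(unique(c(A, B)))

# Maximal vertical distance between the two empirical CDFs.
D <- max(abs(FA(x) - FB(x)))
print(D)

# Q-Q plot of the two samples; points near the dashed line y = x
# would suggest similar distributions.
qqplot(A, B, xlab = "Method A", ylab = "Method B")
abline(0, 1, lty = 2)
```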
Source: R Core Team, https://cran.r-project.org/doc/manuals/r-release/R-intro.html#One_002d-and-two_002dsample-tests
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.