Unit 5: Common Statistical Functions
Probably the most significant advantage of R is its wide range of functions for statistical analysis, and the ultimate goal of most R courses is to give learners access to this toolbox. Data and the inferences drawn from them help shape our decisions, so it is imperative to do the analysis right. This unit introduces built-in R functions for statistical analysis, from summarizing data and applying simple statistical tests to regression analysis. We will also see how to find additional R functions, distributed as packages, for specific types of analysis.
Completing this unit should take you approximately 3 hours.
5.1: Single-Sample Summaries
Single-sample summaries help us quantify general patterns in the data: what we observed in a histogram or another plot can now be expressed in numbers. For example, the center of a distribution can be estimated by its mean (statistical jargon for "average") or its median, while the spread of the distribution can be quantified by its standard deviation or interquartile range (the difference between the third and first quartiles). This section combines these numeric estimates with plots, so you can better understand what each summary statistic means.
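As a minimal sketch of the summaries named above, the following uses base R functions only. The vector x is hypothetical data made up for illustration; it is not taken from the course materials.

```r
# Hypothetical sample: nine made-up measurements with one large value
x <- c(2, 4, 4, 5, 7, 8, 9, 11, 22)

center_mean   <- mean(x)    # arithmetic average
center_median <- median(x)  # middle value of the sorted data
spread_sd     <- sd(x)      # sample standard deviation (n - 1 denominator)
spread_iqr    <- IQR(x)     # third quartile minus first quartile

summary(x)                  # minimum, quartiles, mean, and maximum in one call

# Plot the data and mark the two estimates of the center
hist(x, main = "Histogram of x", xlab = "x")
abline(v = center_mean, lty = 1)    # solid line: mean
abline(v = center_median, lty = 2)  # dashed line: median
```

Because the sample contains one unusually large value, the mean is pulled above the median, which is the kind of pattern that is easier to see when the numbers are drawn on the plot.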
5.2: The t-test
Student's t-test is one of the most frequently used tests in statistics. It is applied to infer whether a population mean differs from some reference value or whether the means of two populations differ. Moreover, the parameter being tested need not be a population mean; it can be another parameter, such as a regression coefficient quantifying the relationship between variables, which greatly extends the applicability of the test beyond comparing population means. For example, we will use the t-test again in the linear regression section when assessing the statistical significance of the regression coefficients.
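The two uses described above can be sketched with the base R function t.test. The samples a and b are simulated, hypothetical data, and the reference value 9 is an arbitrary choice for illustration.

```r
set.seed(1)  # makes the simulated data reproducible

# Hypothetical samples drawn from two normal populations
a <- rnorm(20, mean = 10, sd = 2)
b <- rnorm(20, mean = 12, sd = 2)

# One-sample test: does the population mean of a differ from 9?
one_sample <- t.test(a, mu = 9)

# Two-sample test: do the population means of a and b differ?
# By default t.test runs the Welch version, which does not assume equal variances.
two_sample <- t.test(a, b)

one_sample$p.value    # p-value of the one-sample test
two_sample$statistic  # t statistic of the two-sample test
two_sample$conf.int   # confidence interval for the difference in means
```

Printing one_sample or two_sample directly shows the full test report; the components extracted here are useful when the results feed into further code.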
5.3: One-Way ANOVA
We use the analysis of variance (ANOVA) method to compare means across more than two groups. Running pairwise tests across many groups is valid only when an adjustment for multiple testing is applied (for example, see the function p.adjust). ANOVA avoids the multiple testing problem by applying a single global F-test for any difference among the group means. The functions in this section apply when, again, we have a grouping variable coded in R as a factor. In statistical texts, this variable is also called a factor (comprising several levels or, in the case of a single factor, treatments), meaning that it contributes to the differences in means across the groups. For example, physical activity could be a factor affecting the blood pressure (response variable) of a person measured in a clinical study, with the factor levels "No exercise", "Minor exercise", and "Intensive exercise".
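A minimal sketch of the blood pressure example, using the base R functions aov, pairwise.t.test, and p.adjust. The data are simulated, and the group means, sample sizes, and raw p-values below are assumptions chosen only for illustration.

```r
set.seed(2)  # makes the simulated data reproducible

# Grouping variable coded as a factor with three levels
lev <- c("No exercise", "Minor exercise", "Intensive exercise")
activity <- factor(rep(lev, each = 10), levels = lev)

# Hypothetical blood pressure values with a different mean in each group
bp <- rnorm(30, mean = rep(c(135, 128, 120), each = 10), sd = 8)

# One-way ANOVA: a single global F-test for any difference among group means
fit <- aov(bp ~ activity)
summary(fit)

# Pairwise t-tests with p-values adjusted for multiple testing (Holm method)
pw <- pairwise.t.test(bp, activity, p.adjust.method = "holm")
pw$p.value

# p.adjust can also be applied directly to a vector of raw p-values
raw_p <- c(0.01, 0.04, 0.03)                     # hypothetical raw p-values
adj_p <- p.adjust(raw_p, method = "bonferroni")  # each value multiplied by 3, capped at 1
```

A common workflow is to look at the global F-test first and turn to the adjusted pairwise comparisons only to find out which groups differ.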
5.4: Linear Regression
This section introduces the concepts of statistical modeling and linear regression. Many common statistical models fit within this general regression framework, including the t-tests and ANOVA models from the previous sections. Learning this framework will give you a basis for more complex models and methods.
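As a sketch of how this looks in practice, the following fits a simple linear regression with lm on cars, a data set that ships with R, and then checks on simulated, hypothetical data that an equal-variance two-sample t-test gives the same p-value as a regression on a two-level factor.

```r
# Simple linear regression: stopping distance as a function of speed
fit <- lm(dist ~ speed, data = cars)

# Coefficient table: estimate, standard error, t value, and p-value per coefficient
coef_table <- summary(fit)$coefficients
coef_table

# A t-test expressed as a regression model (hypothetical simulated data)
set.seed(3)
g <- factor(rep(c("A", "B"), each = 15))
y <- rnorm(30, mean = rep(c(10, 12), each = 15))

p_ttest <- t.test(y ~ g, var.equal = TRUE)$p.value
p_lm    <- summary(lm(y ~ g))$coefficients["gB", "Pr(>|t|)"]

all.equal(p_ttest, p_lm)  # the two p-values agree
```

The "t value" column of the coefficient table is where the t-test from section 5.2 reappears: each coefficient is tested against zero. Note that the match with t.test holds for the equal-variance version, not for the default Welch test.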
Unit 5 Assessment