Topic Name Description
Course Syllabus Page Course Syllabus
1.1: R and Coding Environments Book Overview of R

Read about R, its history, connections to other languages, and alternatives for statistical computing. You will also learn about various interfaces that can be used to edit and run R code, such as RStudio.

Page Introduction to R and RStudio

Here is a short introduction to the basics of the RStudio layout. By now, you should have a general understanding of what R and RStudio are, how R links to other languages, and how its functionality can be extended with user-contributed packages of functions. The next section will teach you how to install and set up this software.

1.2: Installing and Setting Up R and RStudio Page Installing R and RStudio

Follow the steps in the video to download and install R, then download and install RStudio Desktop. Both programs are completely free.

R is the "core," so it is better to install it first. Then the interface (RStudio), while being installed, can easily find the R installation on your computer and connect to it. When you start RStudio, the console shows the version of R it has connected to.

Page Setting up RStudio

This video introduces the RStudio interface and its main settings. Apply these settings on your computer, because they help you foster good coding habits. By now, you should be comfortable setting up the R and RStudio coding environment on your computer.

Page Updating Software

If needed, you can update R itself by downloading and installing a new version from the R Project website (by default, after you restart RStudio, it will use the newest version of R installed on your computer). RStudio can be updated, for example, by clicking Help > Check for Updates.

1.3: Command Line and Script Book Using R as a Calculator

All complex analyses consist of sequences of simple operations. To get acquainted with R and "build trust," read about the operations you can run in the R (RStudio) console and see the results immediately. Run the code on your computer. Do you get the same outputs? Try changing some of the code to see how the outputs change. You should be able to predict the output of any of these simple operations.
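A few console operations to try, as a minimal sketch (the specific numbers are our own choice). The comments show what R prints, so you can compare them with your output.

```r
# Arithmetic follows the usual order of operations
2 + 3 * 4        # 14: multiplication before addition
(2 + 3) * 4      # 20: parentheses first
2^10             # 1024: exponentiation
10 / 3           # 3.333333: ordinary division
10 %/% 3         # 3: integer division
10 %% 3          # 1: remainder (modulo)
sqrt(16)         # 4

# Store a result in a variable; it then appears in the Environment panel
area <- pi * 3^2
area             # 28.27433
```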

Page Practice: Calculator

To start getting familiar with R, go ahead and try the examples on your machine. Do you always get the output you expect? Create some variables (see the section on Variable Assignment) and check that they appear in the RStudio Environment (panel in the top right). This exercise does not count toward your grade. It is just for practice!

1.4: Functions and Packages Book Functions

You have used a few functions already. Here is a slightly more formal introduction. You should be able to understand the inputs (arguments) you specify when calling a function and the output it returns. For most functions, you can get help by executing ?function_name in the console.
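A small sketch of how arguments are passed, using two base-R functions chosen only for illustration. Arguments can be matched by name or by position, and args() lists the arguments a function accepts together with their defaults.

```r
# Arguments matched by name...
round(3.14159, digits = 2)        # 3.14
# ...or by position (same result)
round(3.14159, 2)

seq(from = 1, to = 10, by = 3)    # 1 4 7 10
seq(1, 10, 3)                     # same call without argument names

# List the arguments (and their default values) of a function
args(seq.default)

# Help pages, for interactive use:
# ?round
# ?seq
```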

Page Practice: Functions

Here you will practice applying the function lm, one of the most frequently used base-R functions. Open the help file for lm by executing ?lm in the console, as you learned previously in this section. Read the help file to understand which arguments are required for the function to run and which are optional. Execute the examples given at the end of the help file. We will return to this function when introducing statistical models later in the course.
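Besides running the help-page examples, you may want to try a minimal call of your own. The sketch below uses the built-in mtcars data (our choice, not part of the exercise) to model fuel efficiency (mpg) as a function of car weight (wt). Only the formula is strictly needed here; data tells lm where to find the variables.

```r
# Interactive use: open the help page and run its examples
# ?lm
# example(lm)

# Fit a simple linear model on a built-in dataset
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)      # intercept and slope
summary(fit)   # estimates, standard errors, R-squared, etc.
```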

Book Packages

R functions come in packages, like tools come in different toolboxes. Use the function install.packages("PackageName") to download and install a package and library(PackageName) to load it into your R session. After the package is installed, comment out the installation line like this

# install.packages("PackageName")

so your script does not reinstall the package every time it runs, but keep the line

library(PackageName)

active, because the package must be loaded each time you start a new R session. It is a good idea to keep all library() statements at the top of your script so you can easily see all the packages needed and avoid duplicate library() calls.

Page Updating R and Its Packages

New versions of R are released every year. The best way to stay up to date with R is to check the CRAN website periodically. The website is updated with each new release and makes it available for download. To upgrade, install the new release; the process is the same as for the first installation.
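To check what you are running before and after an upgrade, a sketch like the following may help. The package update line is commented out because it needs an internet connection and should not run every time the script runs.

```r
# Which version of R is this session running?
R.version.string
getRversion() >= "4.0.0"   # TRUE on R 4.0.0 or newer

# Version of an installed package (stats ships with R itself)
packageVersion("stats")

# Update installed CRAN packages (needs internet; run it occasionally, by hand)
# update.packages(ask = FALSE)
```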

Page Practice: Functions and Packages

As your code grows, it can become hard to spot an error (programmers politely call it a "bug"). These examples let you practice debugging (finding and fixing errors in the code) so that your code runs smoothly and correctly. Install R packages if needed for the examples to run. This exercise does not count toward your grade. It is just for practice!

1.5: Management of Code and Other Files Book R Projects and Files in a Project

First, watch the video, then read about projects and R working directories. The video demonstrates one way to manage the files in a project efficiently. The discussed file structure will work in many cases but may need to be revised when the data are large and it is impossible or impractical to move them to the local data folders. Also, the video assumes that each script file (for data loading, cleaning, plotting, etc.) is relatively large, so it makes sense to keep the code in separate files that are easier to manage. If each file (for data loading, cleaning, visualizing, and statistical analysis) would contain just a few lines, it might be more practical to keep the code together in a single script; you are free to decide based on the needs and size of your project.

Page Practice: R Projects

Try this exercise to practice creating and using R projects. This exercise does not count toward your grade. It is just for practice!

Book Best Practices for Writing R Code

It is helpful to keep your work tidy so that you or other users can reuse the code later. Add comments to your code explaining what you are doing (or trying to achieve) and why, and follow the other recommendations in this section.

2.1: Data Types Book Basic Data Types and Data Structures in R

These are the data types we encounter in everyday work in R. You should learn about their differences and how to access their basic attributes. Most often, we need to know whether the data are in the correct format (such as numeric instead of character) and the size of the R object (use functions length(x) and dim(x) for that).
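A quick sketch of the basic types and of the attribute checks mentioned above. The vectors are made up for illustration.

```r
x <- c(1.5, 2, 3)      # numeric (double)
n <- c(1L, 2L)         # integer (note the L suffix)
s <- c("a", "b")       # character
b <- c(TRUE, FALSE)    # logical

class(x)               # "numeric"
typeof(n)              # "integer"
class(s)               # "character"
is.numeric(x)          # TRUE: the check we often need before analysis
is.character(x)        # FALSE

m <- matrix(1:6, nrow = 2)
length(x)              # 3: number of elements
dim(m)                 # 2 3: rows and columns
dim(x)                 # NULL: plain vectors have no dim attribute
```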

Page Practice: Data Types

Now let's try a quick practice exercise creating variables. This exercise does not count toward your grade. It is just for practice!

Book Strings

Although not all of us are linguists or text analysts, R functions for operating with text strings are still useful. They will come in handy when you need to match records in the data or select a portion of a textual record (for example, only the first name and not the surname). This section covers the basics of these operations.
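A minimal sketch of common base-R string operations, including the first-name example mentioned above. The names are illustrative only.

```r
full_name <- "Ada Lovelace"

# Split at the space and keep the first piece (the first name)
parts <- strsplit(full_name, " ")[[1]]
parts[1]                     # "Ada"

substr(full_name, 1, 3)      # "Ada": characters by position
nchar(full_name)             # 12
toupper(full_name)           # "ADA LOVELACE"
paste("Dr.", parts[2])       # "Dr. Lovelace"

# Pattern matching, e.g., when matching or selecting records
people <- c("Ada Lovelace", "Alan Turing", "Grace Hopper")
grepl("^A", people)          # TRUE TRUE FALSE: names starting with "A"
sub(" .*", "", people)       # first names: "Ada" "Alan" "Grace"
```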

Page Practice: Strings

This practice aims to make you more familiar with strings and with the operations we often need to match or subset them while preparing data for analysis. Try to solve these string tasks. This exercise does not count toward your grade. It is just for practice!

Book Factors

Factors are the way categorical variables are stored in R. For example, treatment levels in ANOVA (analysis of variance) are considered factors; months or quarters of the year can be represented as factors for modeling seasonality. You should learn how to create factors and how to rename and reorder factor levels, both for convenience and for correct analysis (for example, the control treatment usually should be the first level of a factor because, by default, linear models compare the other levels to the first one).
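A small sketch of creating a factor, setting the level order so that the control comes first, and renaming levels. The treatment labels are hypothetical.

```r
treatment <- c("low", "control", "high", "control", "low")

f <- factor(treatment)
levels(f)                 # "control" "high" "low": alphabetical by default

# Set the order of levels explicitly (control first = reference level)
f <- factor(treatment, levels = c("control", "low", "high"))
levels(f)                 # "control" "low" "high"

# Alternatively, move one level to the front
f2 <- relevel(f, ref = "high")
levels(f2)                # "high" "control" "low"

# Rename levels (in the order given by levels(f))
levels(f) <- c("Control", "Low", "High")
table(f)                  # counts per level: 2 2 1
```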

Page Practice: Factors

In these exercises, you will practice operations with factors needed for an analysis of variance (ANOVA) or for drawing a boxplot. Also, try applying the function table(X) to some factor X in your R environment; it quickly counts the number of occurrences of each element in X. These exercises do not count toward your grade. They are just for practice!

2.2: Vectors Book Vectors and Simple Manipulations

This section introduces the basic operations on vectors, most of which are done element-wise. Pay attention to the recycling of vectors (in arithmetic, recycling does not generate an error, and it generates a warning only when the longer vector's length is not a multiple of the shorter one's, so unintended recycling is easy to miss), to missing values (NA), and to logical vectors, which are often used for subsetting data.
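A sketch of element-wise operations, recycling, missing values, and logical subsetting on made-up numbers. Note that only the second recycling line produces a warning.

```r
x <- c(10, 20, 30, 40)
x * 2                    # element-wise: 20 40 60 80

x + c(1, 2)              # recycled silently: 11 22 31 42 (4 is a multiple of 2)
x + c(1, 2, 3)           # recycled with a warning (4 is not a multiple of 3)

y <- c(5, NA, 15)
mean(y)                  # NA: missing values propagate
mean(y, na.rm = TRUE)    # 10
is.na(y)                 # FALSE TRUE FALSE

x > 15                   # logical vector: FALSE TRUE TRUE TRUE
x[x > 15]                # subsetting with a logical vector: 20 30 40
```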

Page Vectors and Type Coercion

The type of your data in R can be changed. Sometimes a function you apply changes the type internally, while the data object you supplied remains unaffected. For example, if x is a character vector, lm(y ~ x) will treat x as a factor, but x will remain of type character in the R environment. In other cases, to count the number or proportion of certain instances using a logical vector LV, you can apply sum(LV) or mean(LV), knowing that these functions treat the logical values TRUE and FALSE as 1 and 0. Pay attention to these coercion rules.
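The coercion rules above in a short sketch. The small x and y vectors are invented to show that lm() turns a character predictor into a factor internally while x itself stays character.

```r
c(1, "a", TRUE)          # mixing types coerces everything to character: "1" "a" "TRUE"
as.numeric("3.5")        # 3.5
as.numeric("abc")        # NA, with a warning

LV <- c(TRUE, FALSE, TRUE, TRUE)
sum(LV)                  # 3: number of TRUE values
mean(LV)                 # 0.75: proportion of TRUE values

# lm() treats a character predictor as a factor, but x is not modified
x <- c("a", "a", "b", "b")
y <- c(1.0, 1.2, 2.1, 1.9)
fit <- lm(y ~ x)
class(x)                 # still "character"
names(coef(fit))         # "(Intercept)" "xb": one coefficient per non-reference level
```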

Page Practice: Vectors

This exercise shows how easy it is to work with vectors in R and modify and reorganize the data in vectors. In the exercise, you will create and manipulate a vector, then save elements 5-10 of your vector as a new (separate) vector. This exercise does not count toward your grade. It is just for practice!

2.3: Arrays and Matrices Page What is the Difference Between Arrays and Matrices?

The video demonstrates the differences between matrices and arrays and how these objects' elements can be accessed or subsetted. 

Book Arrays in R

An array can be considered as a multiply subscripted collection of data entries. This section provides details on the construction and manipulation of arrays.
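A minimal sketch of a three-dimensional array and of subscripting it. Values are filled column by column, so the last element sits at position [2, 3, 4].

```r
a <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 "layers"
dim(a)                               # 2 3 4

a[2, 3, 4]                           # 24: row 2, column 3, layer 4
a[1, 2, 1]                           # 3
a[, , 1]                             # the first layer: a 2 x 3 matrix
a[1, , ]                             # row 1 across all layers: a 3 x 4 matrix
```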

Book Matrices in R

This section provides details on the construction and manipulation of these objects, including matrix facilities that are different from typical element-wise operations.
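The difference between element-wise and matrix operations, sketched on a small 2 x 2 matrix of our own choosing.

```r
A <- matrix(1:4, nrow = 2)   # filled by column: first column 1, 2; second column 3, 4

A * A                        # element-wise product: each element squared
A %*% A                      # matrix product: first column 7, 10; second column 15, 22
t(A)                         # transpose
solve(A)                     # inverse matrix
A %*% solve(A)               # identity matrix (up to rounding)
```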

Page Practice: Arrays and Matrices

These exercises test your knowledge of creating, accessing, and manipulating arrays and matrices. These exercises do not count toward your grade. They are just for practice!

2.4: Lists and Data Frames Page Lists and Data Frames

Lists are used to hold elements of different sizes and types, such as the outputs of a regression model fit or the results of a statistical test. If we restrict list elements to vectors of the same length, we get a data.frame. The data.frame structure is in-between a matrix (a data.frame has rows and columns and can be indexed like a matrix) and a list (each column of a data.frame is a list element and can be indexed accordingly). The data.frame structure is convenient for holding typical spreadsheet data, where each column can be of a different type, for example, Date (Date class), Location (character type), and Temperature (numeric type).
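A sketch contrasting a list with a data.frame built from the Date, Location, and Temperature example above. The values are invented.

```r
# A list can hold elements of different types and lengths
res <- list(name = "trial 1", values = c(2.1, 3.4, 5.0), passed = TRUE)
res$values             # access an element by name
res[["passed"]]        # the same with double brackets
length(res)            # 3 elements

# A data.frame is a list of equal-length columns
df <- data.frame(
  Date = as.Date(c("2024-01-01", "2024-01-02")),
  Location = c("North", "South"),
  Temperature = c(3.5, 5.1)
)
is.list(df)                             # TRUE
df$Temperature                          # a column, indexed like a list element
df[2, ]                                 # the second row, indexed like a matrix
df[df$Temperature > 4, "Location"]      # "South"
sapply(df, class)                       # "Date" "character" "numeric"
```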

Page Practice: Base-R Lists and Data Frames

We'll practice with the data frame format, the usual format for storing information on different variables. We'll practice extensions of this format later. Use the cats data frame to solve these challenges:

cats <- data.frame(coat = c("calico", "black", "tabby"),
                    weight = c(2.1, 5.0, 3.2),
                    likes_string = c(1, 0, 1))
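Some typical first operations you might try on cats. These are illustrative warm-ups, not the exercise's own challenges.

```r
cats <- data.frame(coat = c("calico", "black", "tabby"),
                   weight = c(2.1, 5.0, 3.2),
                   likes_string = c(1, 0, 1))

str(cats)                                   # column types and sizes
cats$weight + 2                             # arithmetic on a column
mean(cats$weight)                           # average weight
cats$likes_string <- as.logical(cats$likes_string)   # 1/0 to TRUE/FALSE
cats[cats$likes_string, ]                   # rows of cats that like string
cats[order(cats$weight), "coat"]            # coats sorted by weight
```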

Page The Tibble Format

The tibble format comes from the tidyverse family of packages and attempts to make operations with data.frame-like structures more user-friendly. The tidyverse conveniently bundles several popular packages, such as ggplot2 for plotting and dplyr for data manipulation. You can convert a data.frame to a tibble and back if needed.

Page Practice: Tibbles

Try working with tibbles in these exercises. Remember, tibbles are just another form of the data.frame format. Do you find tibbles more convenient to use? This exercise does not count toward your grade. It is just for practice!

Book The data.table Format

The data.table format also helps shorten code when working with data.frame structures. Most importantly, data.table handles big data very efficiently. You can convert a data.frame to data.table and back if needed.

Page Practice: Data Tables

This is the final practice of data frame formats. Now you should be familiar with the three main ones: data.frame, data.table, and tibble. Repeat the practice tasks for tibbles now using the data.table format instead. This exercise does not count toward your grade. It is just for practice!

3.1: Data Input via Keyboard or Number Generation Book Entering Data

It can be a good idea to type a few values directly into your code to create an object to try things on. First, you can use this new "synthetic" dataset to write more code while waiting for the real data. Second, you can use it to debug your code (find the source of an error and fix it). When you complete this section, you will know several ways of creating data objects manually.
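A few base-R ways to type in or generate small datasets, as a sketch. The toy data frame is hypothetical.

```r
x <- c(4.2, 5.1, 3.8)                  # combine values into a vector
ids <- 1:5                             # integer sequence
s <- seq(0, 1, by = 0.25)              # 0.00 0.25 0.50 0.75 1.00
r1 <- rep(c("A", "B"), times = 2)      # "A" "B" "A" "B"
r2 <- rep(c("A", "B"), each = 2)       # "A" "A" "B" "B"

# A small synthetic data frame to test code on
toy <- data.frame(id = 1:4, group = r2, value = c(2.5, 3.1, 4.0, 3.6))
toy

# scan() can read values typed as text
typed <- scan(text = "1 2 3", quiet = TRUE)
typed                                  # 1 2 3
```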

Page Data Sets in Base R

R already comes with a collection of datasets. You can save time by using these datasets instead of entering example data manually. You will also notice that many example applications of R functions (in the Examples section of a function's help page or online, such as on the Stack Exchange websites) use these datasets for demonstrations. Moreover, some R packages supply additional datasets.
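A sketch of looking at a few built-in datasets. mtcars, iris, and airquality come with base R.

```r
head(mtcars, 3)       # first rows of a built-in data frame
str(iris)             # 150 obs. of 5 variables
dim(airquality)       # 153 6

# Interactive use:
# data()              # list the datasets available in attached packages
# ?iris               # description of a dataset
```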

Page Practice: Built-in Datasets

In this short practice exercise, you will try using a dataset already loaded in R. It is convenient when you want to try things out on some data (of a certain structure) but do not have your data ready yet. 

Book Pseudo-Random Number Generation

The tools of random number generation are used for creating entirely new "synthetic" datasets and for permutation, subsampling, and bootstrapping (resampling with replacement) of existing data. You will learn how to use built-in R functions to generate random samples from different probability distributions (more distributions are available from user-contributed packages, such as the package gamlss.dist).
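A minimal sketch of drawing from a few distributions and of permutation, subsampling, and bootstrapping with sample(). The bootstrap of the mean is just one illustrative use.

```r
set.seed(1)                              # make the draws reproducible
rnorm(5, mean = 10, sd = 2)              # normal distribution
runif(3, min = 0, max = 1)               # uniform distribution
rpois(4, lambda = 3)                     # Poisson counts

x <- 1:10
sample(x)                                # permutation
sample(x, size = 4)                      # subsample without replacement
boot <- sample(x, replace = TRUE)        # bootstrap resample (with replacement)

# Bootstrap distribution of the mean and a percentile interval
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))
```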

Page Practice: Random Number Generation

Here you will use functions for randomizing and subsampling. The exercises also touch on the reproducibility of these random manipulations. Run the code from the following example on your computer. Were you able to obtain the same "random" numbers after calling set.seed()?

Page Reproducible Simulations

This video demonstrates the value and power of setting the seed for random number generation. Set seeds to make the results of sampling, bootstrapping, etc., in your research reproducible. However, do not overuse this option: resetting the same seed before every draw produces the same "random" numbers each time, so at least be sure to use different seeds.
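What setting a seed does, in a short sketch: the same seed reproduces the same sequence, while continuing without resetting gives new numbers.

```r
set.seed(2024)
a <- rnorm(3)
set.seed(2024)
b <- rnorm(3)
identical(a, b)     # TRUE: same seed, same "random" numbers

c2 <- rnorm(3)      # continues the sequence without resetting the seed
identical(b, c2)    # FALSE
```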

3.2: Loading External Files Page Data Loading and Viewing

This video shows the general approach to loading files: the content (the result of the loading) is assigned to some object, then you can view it in the RStudio viewer. Note that sorting the data in the viewer does not change the sorting order in the R object.

Page Base R: Reading Plain-Text Files

Loading plain-text files is a simple task. Files of this type are the best for sharing (for example, as a supplement to a publication) and long-term archiving of information. Pay attention to base-R options for skipping the lines, reading only a certain number of lines, and formatting strings – these options are often used by other packages too.
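A sketch of the skipping and row-limiting options with read.csv(). To keep it self-contained, it first writes a small hypothetical file with a two-line title to a temporary location.

```r
# Create an example file whose first two lines are a title, not data
path <- tempfile(fileext = ".csv")
writeLines(c("Station report",
             "Generated for practice",
             "station,temp",
             "A,12.5",
             "B,14.1",
             "C,13.0"), path)

# skip = 2 ignores the title lines; nrows = 2 reads only two data rows
d <- read.csv(path, skip = 2, nrows = 2)
d
str(d)
```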

Page Tidyverse: Reading Plain-Text Files

Base-R functions are great, but if you prefer tidyverse packages and want a tibble upon loading the data, you might start with the readr functions (readr is one of the packages in the tidyverse collection). These functions (like the functions from the package data.table) are also faster than the base-R functions.

Page Practice: read_csv

These exercises check your understanding of file loading and some useful arguments for skipping or reading a certain number of rows. Keep these arguments in mind, as they come in handy when files have multi-line titles that are not part of the data. Complete these exercises to practice CSV loading. Also, try to load some files from your computer.

Page Parsing a Vector

When R encounters numbers in different formats (for example, with different grouping and decimal marks, as in "150.300,00" vs. "150,300.00"), dates, etc., it tries to make the best guess and parse the inputs into a corresponding R representation. Here, you will learn how this is done through a series of vector examples.
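The page likely uses readr's parsing functions (for example, parse_number() with a locale that sets the grouping and decimal marks). As a dependency-free comparison, here is a base-R sketch that handles the same two number formats and a non-default date format by hand.

```r
# "150,300.00": comma groups thousands, point marks decimals
as.numeric(gsub(",", "", "150,300.00"))          # 150300

# "150.300,00": point groups thousands, comma marks decimals
eu <- "150.300,00"
eu_num <- as.numeric(sub(",", ".", gsub(".", "", eu, fixed = TRUE), fixed = TRUE))
eu_num                                           # 150300

# Dates: ISO format is parsed by default; others need a format string
as.Date("2024-03-15")
as.Date("15/03/2024", format = "%d/%m/%Y")
```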

Page Practice: Parsing a Vector

These exercises provide real-life examples of issues we can encounter when loading a file with different formats for the dates or decimal points. Complete these exercises to prepare yourself for those situations. This exercise does not count toward your grade. It is just for practice!

Page Parsing a File

Here we generalize our knowledge of parsing to parse a whole file. After loading your data, you can check the type of columns in different ways, such as by unfolding the object saved in the Environment and applying the functions str or summary.
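Checking column types after loading, sketched on the built-in airquality data in place of a freshly loaded file.

```r
str(airquality)                  # type of every column
sapply(airquality, class)        # column classes as a named vector
summary(airquality$Ozone)        # summary, including the number of NA values
colSums(is.na(airquality))       # missing values per column
```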

Page Using the readxl Package to Read Excel Files

While CSV files can be loaded with the base-R functions or functions from other packages, special packages are required for loading Excel files. There are several alternatives (including the packages readxl, xlsx, openxlsx, and XLConnect), but we consider only readxl here because it belongs to the popular tidyverse group of packages and returns the already familiar tibble structure.

Book Loading Files From Other Programs

User-contributed packages provide tools for loading data saved in many other formats into R. Often, several packages can load the same file format; you can find them by searching the internet.

3.3: Data Export and Reusing R Data Page Saving and Reloading Data in R Format

Now you will learn how to save data from your R session. This is useful for sharing results with a friend who also uses R or for preserving data for later reuse in R. Note that the assignment operator is not used when an R image file is loaded: load() restores the saved objects under their original names.
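A sketch of both R-native options, writing to temporary files. save()/load() restore objects under their saved names, whereas saveRDS()/readRDS() store a single object that you assign to a name of your choice.

```r
x <- 1:5
df <- data.frame(a = 1:3, b = c("u", "v", "w"))

# Several objects in one .RData file
path_rdata <- tempfile(fileext = ".RData")
save(x, df, file = path_rdata)
rm(x, df)
load(path_rdata)              # no assignment: x and df come back by name
is.data.frame(df)             # TRUE

# One object in an .rds file, assigned on reading
path_rds <- tempfile(fileext = ".rds")
saveRDS(df, path_rds)
df_copy <- readRDS(path_rds)
identical(df, df_copy)        # TRUE
```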

Page Practice: Export and Reuse

Here is a short exercise to practice exporting and reusing data. This exercise does not count toward your grade. It is just for practice!

Page Base R: Writing to a CSV File

For long-term preservation of data and broader sharing (not just with the R users), it is better to save the data in a plain format like CSV. 

Here are the base-R functions to do that. You might find the option row.names = FALSE handy.
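A minimal sketch of write.csv() with row.names = FALSE, which omits the extra column of row numbers, followed by reading the file back. The data frame is hypothetical.

```r
out <- data.frame(site = c("A", "B"), count = c(12, 7))
path <- tempfile(fileext = ".csv")

write.csv(out, path, row.names = FALSE)
readLines(path)          # "\"site\",\"count\"" "\"A\",12" "\"B\",7"

back <- read.csv(path)
names(back)              # "site" "count"
```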

Page Tidyverse: Writing to a CSV File

The tidyverse also offers options for saving such files. Now you should be familiar with both options (base-R and tidyverse).

Page Practice: Export to a CSV File

Try this short exercise to practice exporting data in CSV and Excel format. This exercise does not count toward your grade. It is just for practice!

Page Practice: Data Manipulation in a Project

This exercise provides short but complete code for the cycle of loading a dataset, saving it, and reloading it in an R project that contains the folders "dataraw" and "dataderived". This exercise does not count toward your grade. It is just for practice!
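A sketch of such a cycle, not the exercise's own code. In a real R project, the working directory is the project folder, so relative paths such as "dataraw/airquality.csv" would be enough. Here a hypothetical project folder is created in a temporary directory so the code runs anywhere, and airquality stands in for the raw data.

```r
# Hypothetical project folder with the two data subfolders
proj <- file.path(tempdir(), "myproject")
dir.create(file.path(proj, "dataraw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "dataderived"), showWarnings = FALSE)

# Pretend the raw data arrived as a CSV in dataraw
write.csv(airquality, file.path(proj, "dataraw", "airquality.csv"), row.names = FALSE)

# Load, clean (drop rows with missing Ozone), and save the derived data
raw <- read.csv(file.path(proj, "dataraw", "airquality.csv"))
clean <- raw[!is.na(raw$Ozone), ]
saveRDS(clean, file.path(proj, "dataderived", "airquality_clean.rds"))

# Reload the derived data later
clean2 <- readRDS(file.path(proj, "dataderived", "airquality_clean.rds"))
nrow(clean2)    # rows with Ozone observed
```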

4.1: Base-R and ggplot2 Graphics Book Base-R Graphics

This section introduces base-R graphics. Reading the materials will familiarize you with different options and commands used for plotting. Start coding with a high-level function such as plot, then incrementally modify and add code to change the plot's appearance, and use the function par to fine-tune the margins, etc. You will also learn about the R graphics devices used to save plots for publications (do not use the point-and-click interface to save plots from RStudio); these device commands also apply to ggplot2 output.
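The workflow above in a short sketch: open a file device, set margins with par(), draw with a high-level function, add a low-level element, and close the device. pdf() is used here because it is available on every R installation; png() works the same way. The file goes to a temporary path.

```r
path <- tempfile(fileext = ".pdf")
pdf(path, width = 6, height = 4)                 # open a file graphics device
par(mar = c(4, 4, 1, 1))                         # margins in lines: bottom, left, top, right
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     pch = 19, col = "steelblue")                # high-level plot
abline(lm(mpg ~ wt, data = mtcars), lty = 2)     # low-level addition: a fitted line
dev.off()                                        # close the device to write the file
file.exists(path)                                # TRUE
```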

Page Practice: Base-R Plots

In this short practice exercise, you will implement the high-level function plot. It is convenient for fast checks and does not require installing additional packages. This exercise does not count toward your grade. It is just for practice!

Book Introduction to ggplot

This section introduces ggplot2 graphics. You will see how different its syntax is from base-R graphics. You can think of ggplot2 as creating graphs by combining layers with the "+" sign. The default gray background of ggplot2 plots is not ideal for printed publications and can be replaced by adding a theme layer, for example, + theme_minimal().

Key Points

  • Use ggplot2 to create plots.

  • Think about graphics in layers: aesthetics, geometry, statistics, scale transformation, and grouping.

Page Practice: ggplot

In this exercise, you can practice the implementation of ggplot and compare it to the base-R graphics. This exercise does not count toward your grade. It is just for practice!

4.2: Creating Histograms Page Introduction to Histograms

This video shows an interactive approach to creating histograms in base R, developing your code, and addressing the error messages. You will see more details on the available options in the next sections.

Book Histograms and Density Plots in ggplot2

Now you will learn the ggplot2 syntax for building and customizing histograms.

Page Histograms and Density Plots in base R

Here you will see more examples of how to build histograms in base R. Note that when the total counts for two or more samples are different, we can convert the vertical axis to density so the distributions can be easily compared on the same plot.
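A sketch of comparing two samples of different sizes on the density scale. The samples are simulated, and shared breaks keep the bins identical. With freq = FALSE, the bar areas of each histogram sum to 1, so the unequal counts no longer dominate the comparison.

```r
set.seed(10)
a <- rnorm(200, mean = 0)                  # larger sample
b <- rnorm(50, mean = 1)                   # smaller sample
brks <- pretty(range(c(a, b)), n = 20)     # shared bins covering both samples

ha <- hist(a, breaks = brks, plot = FALSE)
hb <- hist(b, breaks = brks, plot = FALSE)

plot(ha, freq = FALSE, col = rgb(0, 0, 1, 0.4), main = "", xlab = "Value",
     ylim = c(0, max(ha$density, hb$density)))
plot(hb, freq = FALSE, col = rgb(1, 0, 0, 0.4), add = TRUE)
lines(density(a), col = "blue")            # smoothed density curves
lines(density(b), col = "red")
```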

Page Practice: Histograms

In this exercise, you will practice plotting a histogram for a publication. This exercise does not count toward your grade. It is just for practice!

4.3: Creating Scatterplots Page Introduction to Scatterplots

This video demonstrates the steps to create and tailor a scatterplot in the base-R plotting system. Notice the incremental development of the code, adding elements to the plot and checking its view in the plot window. Finally, the code in the video also uses the png command to export the resulting plot for publication.

Book Scatterplots in Base R

Here we introduce scatterplots in base R. The code is simple, but you should also remember the options that make plots more informative, such as colors, legends, and error bars.
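A sketch combining colors, a legend, and error bars, using the built-in ToothGrowth data (our choice). It plots group means with plus or minus one standard error. Error bars are drawn with arrows() with flat heads (angle = 90), which is one common base-R approach.

```r
grp <- list(ToothGrowth$dose, ToothGrowth$supp)
means <- tapply(ToothGrowth$len, grp, mean)                           # dose x supplement
ses <- tapply(ToothGrowth$len, grp, function(v) sd(v) / sqrt(length(v)))
dose <- as.numeric(rownames(means))
cols <- c(OJ = "darkorange", VC = "purple")

plot(NA, xlim = range(dose), ylim = range(means - ses, means + ses),
     xlab = "Dose (mg/day)", ylab = "Tooth length")
for (s in colnames(means)) {
  points(dose, means[, s], pch = 19, col = cols[s])
  arrows(dose, means[, s] - ses[, s], dose, means[, s] + ses[, s],
         angle = 90, code = 3, length = 0.05, col = cols[s])          # error bars
}
legend("topleft", legend = names(cols), col = cols, pch = 19, bty = "n")
```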

Book Scatterplots in ggplot2

You will learn the layered syntax of ggplot2 for scatterplots in this section. It also demonstrates how regression lines can be added (compared with the base-R syntax shown in the introductory video).

Page Practice: Scatterplots

In this exercise, you practice producing scatterplots for a publication. This exercise does not count toward your grade. It is just for practice!

4.4: Creating Boxplots Page Introduction to Boxplots

This video shows how a boxplot is built. You should understand what the box, the line inside it, and the whiskers represent so you can interpret the boxplot.

Book Boxplots in Base R

This section introduces the functionality of the base-R function boxplot. Note that for some data formats, plot(x, y) with x being a factor variable also draws boxplots.
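A sketch with the built-in ToothGrowth data. boxplot.stats() returns the five numbers each box is drawn from (lower whisker, lower hinge, median, upper hinge, upper whisker) and any points beyond the whiskers.

```r
boxplot(len ~ supp, data = ToothGrowth,
        xlab = "Supplement", ylab = "Tooth length")

vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
st <- boxplot.stats(vc)
st$stats    # whisker, hinge, median, hinge, whisker
st$out      # points beyond the whiskers (potential outliers)

# plot() with a factor on the x-axis also produces boxplots
plot(ToothGrowth$supp, ToothGrowth$len)
```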

Book Boxplots in ggplot2

In this section, you will learn the ggplot2 codes for producing boxplots. While the syntax and default appearance may differ, these plots aim to compare distributions and identify outliers. If you need, you can add a few lines of code to make the base-R and ggplot2 graphs look the same. The choice of which plotting system to use is yours now.

Page Practice: Boxplots

This quick practice exercise asks you to produce boxplots for a publication. This exercise does not count toward your grade. It is just for practice!

4.5: Creating Time Series Plots Page Time Series Plots in Base R

This section is a short introduction to time series plots in R. You can think of a time series plot as a scatterplot in which the horizontal axis is time.

Page The ts Format

If you save the data in the special format ts, the plotting function plot.ts can produce a better-looking x-axis automatically. The ts format adds attributes to your data, such as the beginning and end times and frequency. This section shows how you can convert a usual vector to the ts format, then plot it.
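A sketch of converting a plain vector to ts and plotting it. The three years of monthly values are synthetic.

```r
set.seed(3)
values <- 20 + 5 * sin(2 * pi * (1:36) / 12) + rnorm(36)   # synthetic monthly data
y <- ts(values, start = c(2021, 1), frequency = 12)

start(y)        # 2021 1
end(y)          # 2023 12
frequency(y)    # 12
plot.ts(y, ylab = "Value")   # the x-axis is labeled in years automatically
```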

Book Time Series Plots Using ggplot2

Of course, ggplot2 can also visualize time series. This section introduces the relevant ggplot2 syntax.

Page Practice: Time Series Plots

In this exercise, you practice plotting a time series for a publication. This exercise does not count toward your grade. It is just for practice!

5.1: Single-Sample Summaries Page Basic Summary Statistics

After plotting the data, the mean and variance are among the basic summaries we want to know. R has built-in functions to calculate them: mean, sd (standard deviation), var (variance), and median. This video demonstrates the calculations and uses a plot to show how the results relate to the sample data.
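The functions applied to a tiny made-up sample, so the results can be checked by hand. var() and sd() use the sample denominator n - 1.

```r
x <- c(2, 4, 6, 8)
mean(x)       # 5
median(x)     # 5
var(x)        # 6.666667 = sum of squared deviations (20) / (n - 1)
sd(x)         # 2.581989 = sqrt(var(x))
summary(x)    # minimum, quartiles, median, mean, maximum
```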

Page Examining the Distribution of a Dataset

Even one variable can tell a story. For example, sample data on personal incomes might show distinct clusters of high- and low-paid workers, and time series of average temperatures may show trends and seasonal cycles. Here you will learn R tools for working with such data by combining your experience with plots and simple statistical summaries.

Page Alternatives and Extensions

As you already know, for each base-R operation, there are user-contributed alternatives. This video demonstrates the function describe from the package psych, which outputs more statistics than the standard function summary. (You already know how to install and load the package to your R environment.) Be careful, as user-contributed packages might use the same names for their functions. For example, the package Hmisc also has a function describe that produces a different output.

Page Practice: Statistical Summary

Functions for individual quantities like mean or median are convenient when we want to use that specific number in further analysis or visualizations, but the function summary and its alternatives are great for exploratory analysis. In the exercise, you can practice both approaches. This exercise does not count toward your grade. It is just for practice!

Page Tables

Finally, the function table counts the number of observations per group. It is most useful when applied to factors, integers, logical values, or strings. It allows you to study group counts and proportions and to identify outliers. This section demonstrates the application of this function and how it can be applied to more than one variable.
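A sketch with the built-in mtcars data: one-way counts, proportions, and a two-way table.

```r
table(mtcars$cyl)                          # counts per number of cylinders
prop.table(table(mtcars$cyl))              # proportions
table(mtcars$cyl, mtcars$am)               # two-way: cylinders by transmission
addmargins(table(mtcars$cyl, mtcars$am))   # with row and column totals
```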

5.2: The t-test Page One- and Two-Sample t-tests

The t-test is quite simple, and the base-R functionality will likely be sufficient for all your related calculations. This section introduces the plots and testing functions that help us to conduct the inference based on the t-test and its nonparametric alternative, the Wilcoxon (or Mann-Whitney) test.
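A sketch of the one-sample test, the two-sample test, and the Wilcoxon alternative, using the built-in ToothGrowth data (our choice). The reference value 20 is arbitrary. exact = FALSE is set because the data contain ties, for which an exact Wilcoxon p-value cannot be computed.

```r
# One-sample test: is the mean tooth length different from 20?
t.test(ToothGrowth$len, mu = 20)

# Two-sample test with a formula: length by supplement type
tt <- t.test(len ~ supp, data = ToothGrowth)
tt$p.value

# Nonparametric alternative (Wilcoxon rank-sum / Mann-Whitney)
wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)
```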

Page Applying the t-test

Fortunately, the t-test calculations can be modified for cases when the assumption of equal variances across groups is violated: Welch's version of the t-test accounts for unequal variances (in R, t.test uses Welch's version by default, var.equal = FALSE). This video demonstrates how to apply the test in R and the relevant options for implementing it.
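A sketch comparing Welch's test with the classic pooled-variance (Student) test on ToothGrowth. Welch's version adjusts the degrees of freedom, which therefore differ between the two.

```r
welch   <- t.test(len ~ supp, data = ToothGrowth)                    # default: Welch
student <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)  # pooled variance

welch$method
student$method
c(welch = welch$parameter, student = student$parameter)   # degrees of freedom
```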

Page The Power of the t-test

The greater the difference between the compared quantities and the more observations we have, the more confident we can be (based on the t-test) that the observed differences are not just due to random chance but are true, statistically significant differences. Even if the means of two populations are different, the t-test might not detect it if the difference or the sample is small. The probability that the t-test detects the difference under the given sample size and variability is the power of the test, and it can be calculated in R. We prefer high power and often use the desired power, significance level, and expected variability to identify the required sample size.
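A sketch with base R's power.t.test(). The difference of 1 and standard deviation of 2 are assumed values for illustration. Leave out exactly one of n, delta, power, or sd, and the function solves for it.

```r
# Sample size per group for a two-sample, two-sided test
pw <- power.t.test(delta = 1, sd = 2, sig.level = 0.05, power = 0.80)
pw$n            # roughly 64 per group
ceiling(pw$n)   # round up to whole observations

# Power achieved with 30 observations per group instead
power.t.test(n = 30, delta = 1, sd = 2, sig.level = 0.05)$power
```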

Page Practice: t-test

In this exercise, you will use the t-test and Wilcoxon test to compare the Examination rates across the two groups. This exercise does not count toward your grade. It is just for practice!

5.3: One-Way ANOVA Page The Basics of One-Way ANOVA

This section introduces base-R functionality for one-way ANOVA. "One-way" means that only one factor variable is used, such as in the case of BloodPressure ~ ExerciseLevel. Be aware that when two or more factors are used, contributed functions such as car::Anova are preferred because they can apply different types of sums of squares for the F-tests and conduct inference that does not depend on the order in which the factors are introduced in the R formula.
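A minimal one-way ANOVA with base R's aov(). It uses the built-in PlantGrowth data (plant weight under a control and two treatments) so as not to give away the iris practice exercise.

```r
fit_aov <- aov(weight ~ group, data = PlantGrowth)
summary(fit_aov)     # F-test of whether the group means differ
TukeyHSD(fit_aov)    # pairwise differences with family-wise adjustment
```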

Page ANOVA in afex and car

This video shows the implementation of ANOVA in the packages car and afex. It probably makes sense to start using one of these packages for ANOVA instead of the base-R functions aov and anova, even if you have only one factor variable to start with. The video also covers a range of post-hoc tests used to find between which groups the statistically significant differences occur. These tests are useful, and the analyst does not need to run the global ANOVA test before using them; just remember to use an adjustment for multiple testing.
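The video's post-hoc tools come from car and afex. For comparison, here is a dependency-free base-R sketch of pairwise tests with a multiple-testing adjustment. The Holm method is one choice among several supported by p.adjust().

```r
# Pairwise t-tests between groups, Holm-adjusted
pt <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                      p.adjust.method = "holm")
pt$p.value

# What the adjustment does to a set of raw p-values
p.adjust(c(0.01, 0.02, 0.04), method = "holm")   # 0.03 0.04 0.04
```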

Page Practice: ANOVA

In this practice exercise, you will use the built-in dataset iris to test whether the Sepal.Length differs by Species. This exercise does not count toward your grade. It is just for practice!

5.4: Linear Regression Page Model Basics

Models are simplified representations of reality based on available observations. Both the observations and our assumptions about the form of the existing relationships affect the model we get as an outcome of the analysis. Here you will learn the general approach for specifying and estimating a linear model in statistics.

Page Practice: Model Basics

While R makes the model fitting process extremely easy, several steps or implicit decisions go into it. For example, one may choose to keep or remove extreme observations (outliers) and select the optimization algorithm. This exercise demonstrates the effects of these decisions on the modeling outcomes.

Page Visualizing Models

One of the best ways to check the quality of a model is to plot it. This section shows how to visualize modeling results and the unmodeled remainder (residuals) to diagnose the model. Remember that residuals should not have any remaining pattern and should look randomly scattered. If there is a remaining pattern, try to include it in your model (that is, respecify the model), then reestimate the model and visualize the new residuals.
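A base-R sketch of the two views described above, the fitted line over the data cloud and residuals against fitted values, using the built-in mtcars data. The quadratic respecification at the end is only an example of reacting to a curved residual pattern.

```r
fit <- lm(mpg ~ wt, data = mtcars)

op <- par(mfrow = c(1, 2))
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "mpg")
abline(fit)                                   # fitted line over the data cloud
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)                        # residuals should scatter randomly around 0
par(op)

# plot(fit) draws R's standard set of diagnostic plots

# If the residuals curve, one option is to add a quadratic term
fit2 <- lm(mpg ~ wt + I(wt^2), data = mtcars)
```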

Page Practice: Visual Model Checks

You should get used to checking model quality visually. Look for inconsistencies between the data cloud and the fitted lines, for patterns in the residuals, and for outlying observations. These exercises give examples and suggest R functions you can use for these tasks. These exercises do not count toward your grade. They are just for practice!

Book Formulas and Model Families

Formulas are the R versions of statistical equations passed to the R functions for estimation. We use formulas to specify the models, such as what terms the model will have and their transformations. This section introduces various options for specifying a model using formulas. Pay attention to specifying the intercept and interactions of variables.
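One way to see which terms a formula creates is to look at the columns of the design matrix it produces. The sketch below uses mtcars variables for illustration.

```r
colnames(model.matrix(mpg ~ wt + hp, data = mtcars))      # "(Intercept)" "wt" "hp"
colnames(model.matrix(mpg ~ wt * hp, data = mtcars))      # adds the interaction "wt:hp"
colnames(model.matrix(mpg ~ wt + hp - 1, data = mtcars))  # "- 1" removes the intercept
colnames(model.matrix(mpg ~ log(hp), data = mtcars))      # transformation in the formula
colnames(model.matrix(mpg ~ I(wt^2), data = mtcars))      # I() keeps ^ as arithmetic
```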

Page Practice: Formulas

We often keep the intercept in the model even if it is not statistically significant because our main focus is usually on the effects of other variables, expressed in their coefficients. However, there are cases when we need to remove the intercept to obtain the so-called "regression through the origin". Also, we might need to model the combined effect of two factors using an interaction term (for example, to model how light and water conditions affect plant growth). These exercises let you practice these cases and suggest that you compare alternative models. They do not count toward your grade. They are just for practice!
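A sketch of both cases on built-in data (cars and mtcars, our choice rather than the plant-growth example). It also compares the nested additive and interaction models with an F-test and AIC. Keep in mind that R-squared for a model without an intercept is computed differently and is not comparable with R-squared from a model that has one.

```r
# Regression through the origin: "+ 0" (or "- 1") drops the intercept
m_origin <- lm(dist ~ speed + 0, data = cars)
m_icpt   <- lm(dist ~ speed, data = cars)
coef(m_origin)
coef(m_icpt)

# Interaction between a numeric predictor and a factor (transmission type)
m_add <- lm(mpg ~ wt + factor(am), data = mtcars)
m_ia  <- lm(mpg ~ wt * factor(am), data = mtcars)
anova(m_add, m_ia)   # F-test comparing the nested models
AIC(m_add, m_ia)     # lower AIC indicates the preferred model
```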

Course Feedback Survey URL Course Feedback Survey