Topic  Name  Description 

Course Syllabus  Course Syllabus  
1.1: R and Coding Environments  Overview of R  Read about R, its history, connections to other languages, and alternatives for statistical computing. You will also learn about various interfaces that can be used to edit and run R code, such as RStudio. 
Introduction to R and RStudio  Here is a short introduction to the basics of the RStudio layout. By now, you should have a general understanding of what R and RStudio are, how R links to other languages, and how its functionality can be extended via user-contributed packages of functions. The next section will teach you how to install and set up this software. 

1.2: Installing and Setting Up R and RStudio  Installing R and RStudio  Follow the steps in the video to download and install R, then download and install RStudio Desktop. Both programs are completely free.
The first one is the "core," so it is better if you install it first. Then the interface (RStudio), while being installed, can easily find the R installation on your computer and connect to it. When you start RStudio, it shows the version of R on your computer it has connected to in the console. 
Setting up RStudio  This video will introduce you to the RStudio interface and its main settings. Implement those settings on your computer because they help you to foster good coding habits. By now, you should be comfortable setting up the R and RStudio coding environment on your computer. 

Updating Software  If needed, you can update R itself by downloading and installing a new version from the R Project website (by default, after you restart RStudio, it will start using the newest version of R installed on your computer). RStudio can be updated, for example, by clicking Help > Check for Updates. 

1.3: Command Line and Script  Using R as a Calculator  All complex analyses comprise sequences of multiple simple operations. To get acquainted with R and "build trust," please read about the operations you can run in the R (RStudio) console and immediately see the results. Run the code on your computer. Do you get the same outputs? Try changing some of the code to see how the outputs change. You should be able to guess what kind of output you can get from any of these simple operations. 
Practice: Calculator  To start getting familiar with R, go ahead and try the examples on your machine. Do you always get the output you expect? Create some variables (see the section on Variable Assignment) and check that they appear in the RStudio Environment (panel in the top right). This exercise does not count toward your grade. It is just for practice! 

1.4: Functions and Packages  Functions  You have used a few functions already. Here is a slightly more formal introduction. You should be able to understand the inputs (arguments) you specify when calling a function and the output it returns. For most functions, you can get help by executing 
Practice: Functions  Here you will practice applying the function 

Packages  R functions come in packages like tools come in different toolboxes. You will use the function install.packages("PackageName") to get the needed package and library(PackageName) to load it in your R environment. After the package is installed, comment out the installation code so your script doesn't reinstall the package each time, but keep the library() call active because it is needed each time you start a new R session. It is a good idea to keep all library() statements at the top of your script so you can easily see all packages needed and do not duplicate the library calls. 
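The install-once, load-every-session pattern can be sketched as follows. This is only an illustration: ggplot2 stands in for whatever package you need, and the two packages loaded below ship with R itself, so the sketch runs without installing anything.

```r
# Installation is needed only once per computer, so the call is commented out
# after the first run (ggplot2 is just an example package name).
# install.packages("ggplot2")

# Loading is needed in every new R session, so library() calls stay active
# at the top of the script. These two packages are part of every R installation.
library(stats)
library(utils)

# Check that the packages are attached to the current session
loaded <- c("stats", "utils") %in% .packages()
loaded
```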

Updating R and Its Packages  New versions of R are released every year. The best way to stay up to date with R is to check the CRAN website periodically. The website is updated for each new release and makes the release available for download. You'll have to install the new release, and the process is the same as for the initial installation. 

Practice: Functions and Packages  As your code gets bigger, it might be hard to spot an error (a "bug" is what programmers call it politely). These examples let you practice debugging (finding and fixing errors in the code) for the code to run smoothly and correctly. Install R packages if needed for the examples to run. This exercise does not count toward your grade. It is just for practice! 

1.5: Management of Code and Other Files  R Projects and Files in a Project  First, watch the video, then read about the projects and R working directories. The video demonstrates one of the ways you can efficiently manage your files in a project. The discussed file structure will work in many cases but may need to be revised when large data are used and it is impossible or impractical to move the data to the local data folders. Also, the video assumes that each script file (for data loading, cleaning, plotting, etc.) is relatively large; hence it makes sense to keep the code in separate files so it is more manageable. Suppose each file (for data loading, cleaning, visualizing, and statistical analysis) contains just a few lines. In that case, it might be more practical to keep the code together in a single script – you are free to decide based on the needs and size of your project. 
Practice: R Projects  Try this exercise to practice creating and using R projects. This exercise does not count toward your grade. It is just for practice! 

Best Practices for Writing R Code  It is helpful to keep your work tidy so you or other users can reuse the code later. Add plenty of comments to your code to explain what you are doing and why, and follow the other recommendations in this section. 

2.1: Data Types  Basic Data Types and Data Structures in R  These are the data types we encounter in everyday work in R. You should learn about their differences and how to access their basic attributes. Most often, we need to know whether the data are in the correct format (such as numeric instead of character) and the size of the R object (use functions length(x) and dim(x) for that). 
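As a quick illustration of checking the type and size of an object, here is a minimal sketch. The object names are made up for the example.

```r
x <- c(1.5, 2, 3.25)          # numeric vector
s <- c("a", "b")              # character vector
m <- matrix(1:6, nrow = 2)    # integer matrix with 2 rows and 3 columns

class(x)     # "numeric"
class(s)     # "character"
length(x)    # 3
dim(m)       # 2 3
dim(x)       # NULL, because a plain vector has no dim attribute

# A number stored as text must be converted before doing arithmetic
n <- as.numeric("42")
n + 1        # 43
```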
Practice: Data Types  Now let's try a quick practice exercise creating variables. This exercise does not count toward your grade. It is just for practice! 

Strings  Although not all of us are linguists or text analysts, R functions for
operating with text strings are still useful. They will come in handy
when you need to match records in the data or select a portion of a
textual record (for example, only the first name and not the surname).
This section covers the basics of these operations. 

Practice: Strings  This practice aims to make you more familiar with the string format and operations we often need to match or subset strings while preparing data for analysis. Try to solve these string operations tasks. This exercise does not count toward your grade. It is just for practice! 

Factors  Factors are the way categorical variables are stored in R. For example,
treatment levels in ANOVA (analysis of variance) are considered factors;
months or quarters of the year can be represented as factors for
modeling seasonality. You should learn how to create factors and how to
rename and reorder factor levels, both for convenience and for correct
analysis (for example, the control treatment usually should be the first
level of a factor because, by default, other levels are compared to the
first one in linear models). 
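The operations mentioned above (creating a factor, reordering its levels, and renaming them) might look like this in base R. The treatment labels are hypothetical and serve only as an example.

```r
treatment <- c("drugA", "control", "drugB", "control", "drugA")

# By default, the levels are sorted alphabetically
f <- factor(treatment)
levels(f)

# The order of levels can be set explicitly when the factor is created
f2 <- factor(treatment, levels = c("drugB", "drugA", "control"))

# relevel() moves the chosen reference level to the first position
f3 <- relevel(f2, ref = "control")
levels(f3)                      # "control" "drugB" "drugA"

# Rename the levels (the new names follow the current order of levels)
levels(f3) <- c("Control", "Drug B", "Drug A")
table(f3)                       # counts of observations per level
```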

Practice: Factors  In these exercises, you will practice operations with factors needed for implementing the analysis of variance (ANOVA) or drawing a boxplot. Also, try applying the function table(X) to some factor X in your R environment – it is the function that quickly counts the number of occurrences of each element in X. These exercises do not count toward your grade. They are just for practice! 

2.2: Vectors  Vectors and Simple Manipulations  This section introduces the basic operations on vectors, most of which are done elementwise. Please pay attention to the recycling of vectors (usually, recycling doesn't generate an error or a warning, so it is easy to miss if it was unintended), missing values (NA), and logical vectors often used for data subsetting. 
Vectors and Type Coercion  The type of your data in R can be changed. Sometimes some other function you apply automatically changes the type internally, while the data object you supplied remains unaffected. For example, if x is a character object, lm(y ~ x) will treat x as a factor; x will remain the type character in the R environment. In other cases, to count the total or proportion of certain instances using a logical vector LV, you can apply sum(LV) or mean(LV) knowing that the logical values TRUE and FALSE will be treated as 1 and 0 by these functions. Please pay attention to these coercion rules. 
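A small sketch of silent recycling, missing values, and coercion of logical values follows. The vectors are invented for illustration.

```r
a <- 1:6
b <- c(10, 20)
a + b                    # b is recycled without any warning: 11 22 13 24 15 26

x <- c(3, NA, 8, 1)
mean(x)                  # NA, because one value is missing
mean(x, na.rm = TRUE)    # 4

LV <- x > 2              # logical vector: TRUE NA TRUE FALSE
sum(LV, na.rm = TRUE)    # 2, the count of TRUE values
mean(LV, na.rm = TRUE)   # proportion of TRUE among the non-missing values

# Mixing types in one vector coerces everything to the most general type
c(1, "a", TRUE)          # all elements become character
```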

Practice: Vectors  This exercise shows how easy it is to work with vectors in R and modify and reorganize the data in vectors. In the exercise, you will create and manipulate a vector, then save elements 5-10 of your vector as a new (separate) vector. This exercise does not count toward your grade. It is just for practice! 

2.3: Arrays and Matrices  What is the Difference Between Arrays and Matrices?  The video demonstrates the differences between matrices and arrays and how these objects' elements can be accessed or subsetted. 
Arrays in R  An array can be considered as a multiply subscripted collection of data entries. This section provides details on the construction and manipulation of arrays. 

Matrices in R  This section provides details on the construction and manipulation of these objects, including matrix facilities that are different from typical elementwise operations. 

Practice: Arrays and Matrices  These exercises test your knowledge of creating, accessing, and manipulating arrays and matrices. These exercises do not count toward your grade. They are just for practice! 

2.4: Lists and Data Frames  Lists and Data Frames  Lists are used to hold elements of different sizes and types, such as outputs of a regression model fit or results of a statistical test. However, if we restrict list elements to vectors of the same length, we can get a data.frame. The data.frame structure is in between a matrix (a data.frame has columns and rows and can be indexed as a matrix) and a list (each column in a data.frame is a list element and can be indexed accordingly). The data.frame structure is convenient for holding typical spreadsheet data, where each column can be of a different type, for example, Date (Date class), Location (character type), and Temperature (numeric type). 
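The two ways of indexing a data.frame can be sketched with a small made-up table that uses the Date, Location, and Temperature columns from the description. The values and the object names are hypothetical.

```r
# A list can hold elements of different types and lengths
res <- list(name = "fit1", coef = c(1.2, -0.5), converged = TRUE)
res$coef[2]                # -0.5

# A data.frame is a list of equal-length columns
weather <- data.frame(
  Date        = as.Date(c("2024-01-01", "2024-01-02", "2024-01-03")),
  Location    = c("A", "B", "A"),
  Temperature = c(2.5, -1.0, 4.2)
)

weather[2, 3]              # matrix-style indexing (row 2, column 3): -1
weather$Temperature        # list-style indexing by column name
weather[["Location"]]      # another list-style way to extract a column
sapply(weather, class)     # each column keeps its own type
is.list(weather)           # TRUE
```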
Practice: Base-R Lists and Data Frames  We'll practice with the data frame format, which is the usual format for storing information on different variables. We'll practice the extensions of this format later. Use the cats <- data.frame(coat = c("calico", "black", "tabby"), 

The Tibble Format  The tibble format belongs to the family of packages "tidyverse" and attempts to make operations with data.frame-like structures more user-friendly. The tidyverse conveniently aggregates several popular packages, such as ggplot2 for plotting and dplyr for data manipulation. You can convert a data.frame to tibble and back if needed. 

Practice: Tibbles  Try working with tibbles in these exercises. Remember, tibbles are just another form of the data.frame format. Do you find tibbles more convenient to use? This exercise does not count toward your grade. It is just for practice! 

The data.table Format  The data.table format also helps shorten code when working with data.frame structures. Most importantly, data.table handles big data very efficiently. You can convert a data.frame to data.table and back if needed. 

Practice: Data Tables  This is the final practice of data frame formats. Now you should be familiar with the three main ones: data.frame, data.table, and tibble. Repeat the practice tasks for tibbles now using the data.table format instead. This exercise does not count toward your grade. It is just for practice! 

3.1: Data Input via Keyboard or Number Generation  Entering Data  It can be a good idea to put down a few values directly in your code to create an object to try things on. First, you can use this new "synthetic" dataset to write more code while waiting for the real data. Second, you can use this dataset to debug your code (find the source of an error and fix it). When you complete this section, you will know several ways of creating data objects manually. 
Data Sets in Base R  R already has a collection of datasets available to you. You can save some time by using these datasets instead of inputting example data manually. You will also notice that many example applications of R functions (given in the section Examples of the R function's help page or online such as on the StackExchange website) use these datasets for demonstrations. Moreover, some R packages supply additional datasets. 

Practice: Built-in Datasets  In this short practice exercise, you will try using a dataset already loaded in R. It is convenient when you want to try things out on some data (of a certain structure) but do not have your data ready yet. 

Pseudo-Random Number Generation  The tools of random number generation are used for creating entirely new "synthetic" datasets and for permutation, subsampling, and bootstrapping (resampling with replacement) of existing data. You will learn how to use built-in R functions to generate random samples from different probability distributions (more distributions are available from user-contributed packages, such as the package gamlss.dist). 

Practice: Random Number Generation  Here you will use functions for randomizing and subsampling things. The exercises also touch on the reproducibility of these random manipulations. Run the code from the following example on your computer. Were you able to obtain the same "random" numbers after calling set.seed? 

Reproducible Simulations  This video demonstrates the value and power of setting the seed for random number generation. Set seeds to make the results of sampling, bootstrapping, etc., reproducible in your research. However, do not overuse this option (or at least be sure to use different seeds). Otherwise, there will be no randomness. 
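A minimal sketch of how setting the seed makes random draws repeatable is given below. The seed values 123 and 2024 are arbitrary choices for the example.

```r
set.seed(123)
a <- rnorm(5)

set.seed(123)            # resetting the same seed replays the same sequence
b <- rnorm(5)
identical(a, b)          # TRUE

d <- rnorm(5)            # continuing without resetting gives new numbers
identical(a, d)          # FALSE

# The same idea applies to resampling, for example one bootstrap sample
set.seed(2024)
boot <- sample(1:10, size = 10, replace = TRUE)
boot
```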

3.2: Loading External Files  Data Loading and Viewing  This video shows the general approach to loading files: the content (the result of the loading) is assigned to some object, then you can view it in the RStudio viewer. Note that sorting the data in the viewer does not change the sorting order in the R object. 
Base R: Reading Plain-Text Files  Loading plain-text files is a simple task. Files of this type are the best for sharing (for example, as a supplement to a publication) and long-term archiving of information. Pay attention to base-R options for skipping the lines, reading only a certain number of lines, and formatting strings – these options are often used by other packages too. 
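The options for skipping lines and limiting the number of rows can be tried on a small temporary file. The file content below is invented for illustration; skip and nrows are arguments of the base-R function read.csv.

```r
# Write a small CSV file with two title lines that are not part of the data
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Survey export",
  "Generated for illustration",
  "id,name,score",
  "1,Ann,90.5",
  "2,Bob,78",
  "3,Cat,88"
), tmp)

# Skip the two title lines, then read the header and only the first two data rows
d <- read.csv(tmp, skip = 2, nrows = 2, stringsAsFactors = FALSE)
str(d)
```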

Tidyverse: Reading Plain-Text Files  Base-R functions are great, but if you prefer to use tidyverse packages and get a tibble upon loading the data, you might want to start with using the readr functions (readr is one of the packages in the tidyverse collection). Remember that these functions (and functions from the package data.table) are also faster than the base-R functions. 

Practice: read_csv  These exercises check your understanding of file loading and some useful arguments for skipping or reading a certain number of rows. Keep these arguments in mind, as they come in handy when files have multi-line titles that are not part of the data. Complete these exercises to practice CSV loading. Also, try to load some files from your computer. 

Parsing a Vector  When R encounters different formats of numbers (for example, numbers grouped by thousand like "150.300,00" vs. "150,300.00"), dates, etc., it tries to make the best guess and parse the inputs into a corresponding R representation. Here, you will learn how it is done in a series of vector examples. 

Practice: Parsing a Vector  These exercises provide real-life examples of issues we can encounter when loading a file with different formats for the dates or decimal points. Complete these exercises to prepare yourself for those situations. This exercise does not count toward your grade. It is just for practice! 

Parsing a File  Here we generalize our knowledge of parsing to parse a whole file. After loading your data, you can check the type of columns in different ways, such as by unfolding the object saved in the Environment and applying the functions str or summary. 

Using the readxl Package to Read Excel Files  While CSV files can be loaded with the base-R functions or functions from other packages, special packages are required for loading Excel files. There are several alternatives (including the packages readxl, xlsx, openxlsx, and XLConnect), but we consider only readxl here because it belongs to the popular tidyverse group of packages and returns the already familiar tibble structure. 

Loading Files From Other Programs  User-contributed packages provide tools for loading data saved in many other formats into R. Often several packages can load the same file format – you can find them by searching on the internet. 

3.3: Data Export and Reusing R Data  Saving and Reloading Data in R Format  Now you will learn how to save the data from your R session. This works for sharing the results with a friend who also uses R or for preserving the data for later reuse in R. Note the assignment operator is not used when an R image file is loaded. 
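The note about the assignment operator can be illustrated with a short sketch. The objects and temporary file names are made up. The saveRDS/readRDS pair is not mentioned in the description above and is shown only as a contrast, because it does require an assignment.

```r
x <- 1:5
scores <- data.frame(id = 1:3, score = c(90, 78, 88))

# Save several objects to one R data file, remove them, and load them back
f_rda <- tempfile(fileext = ".RData")
save(x, scores, file = f_rda)
rm(x, scores)
load(f_rda)                 # no assignment: objects return under their original names
exists("scores")            # TRUE

# For contrast: a single object saved with saveRDS() is read back with an assignment
f_rds <- tempfile(fileext = ".rds")
saveRDS(scores, f_rds)
scores2 <- readRDS(f_rds)
identical(scores, scores2)  # TRUE
```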
Practice: Export and Reuse  Here is a short exercise to practice exporting and reusing data. This exercise does not count toward your grade. It is just for practice! 

Base R: Writing to a CSV File  For long-term preservation of data and broader sharing (not just with R users), it is better to save the data in a plain format like CSV. Here are the base-R functions to do that. You might find the option row.names = FALSE handy. 

Tidyverse: Writing to a CSV File  The tidyverse also offers options for saving such files. Now you should be familiar with both options (base-R and tidyverse). 

Practice: Export to a CSV File  Try this short exercise to practice exporting data in CSV and Excel format. This exercise does not count toward your grade. It is just for practice! 

Practice: Data Manipulation in a Project  This exercise provides a short but complete code for the cycle of loading a dataset, saving, and reloading it in the R project environment that contains the folders "dataraw" and "dataderived". This exercise does not count toward your grade. It is just for practice! 

4.1: Base-R and ggplot2 Graphics  Base-R Graphics  This section introduces the base-R graphics. Reading the materials will familiarize you with different options and commands used for plotting. You should start coding by implementing a high-level function like plot, then incrementally modify and add code to change the plot appearance and add the function par to fine-tune the margins, etc. You will also learn about the R graphics devices used to save plots for publications (do not use the point-and-click interface to save plots from RStudio); these device commands are also applicable to outputs of ggplot2. 
Practice: Base-R Plots  In this short practice exercise, you will implement the high-level function plot. It is convenient for fast checks and does not require installing additional packages. This exercise does not count toward your grade. It is just for practice! 

Introduction to ggplot  This section introduces the Key Points


Practice: ggplot  In this exercise, you can practice the implementation of ggplot and compare it to the base-R graphics. This exercise does not count toward your grade. It is just for practice! 

4.2: Creating Histograms  Introduction to Histograms  This video shows an interactive approach to creating histograms in base R, developing your code, and addressing the error messages. You will see more details on the available options in the next sections. 
Histograms and Density Plots in ggplot2  Now you will learn the ggplot2 syntax for building and customizing histograms. 

Histograms and Density Plots in base R  Here you will see more examples of how to build histograms in base R. Note that when the total counts for two or more samples are different, we can convert the vertical axis to density so the distributions can be easily compared on the same plot. 
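The point about the density scale can be sketched with two simulated samples of different sizes. The sample sizes, means, and colors are arbitrary choices for the example.

```r
set.seed(1)
small <- rnorm(50, mean = 0)     # 50 observations
large <- rnorm(500, mean = 1)    # 500 observations

# With plot = FALSE, hist() only returns the computed bins
h_small <- hist(small, plot = FALSE)

# On the density scale, the bar areas of each histogram sum to 1,
# regardless of the sample size
area_small <- sum(h_small$density * diff(h_small$breaks))
area_small

# freq = FALSE puts both samples on the density scale in one plot
hist(large, freq = FALSE, col = rgb(0, 0, 1, 0.4),
     main = "Two samples on the density scale", xlab = "Value")
hist(small, freq = FALSE, col = rgb(1, 0, 0, 0.4), add = TRUE)
lines(density(large))            # kernel density estimate for the larger sample
```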

Practice: Histograms  In this exercise, you will practice plotting a histogram for a publication. This exercise does not count toward your grade. It is just for practice! 

4.3: Creating Scatterplots  Introduction to Scatterplots  This video demonstrates the steps to create and tailor a scatterplot in the base-R plotting system. Notice the incremental development of the code, adding elements to the plot and checking its view in the plot window. Finally, the code in the video also uses the png command to export the resulting plot for publication. 
Scatterplots in Base R  Here we introduce scatterplots in base R. The code is simple, but you should also remember the options that make the plots more informative, like adding colors, legends, and error bars. 

Scatterplots in ggplot2  You will learn the layered syntax of ggplot2 for scatterplots in this section. It also demonstrates how regression lines can be added (compared with the base-R syntax shown in the introductory video). 

Practice: Scatterplots  In this exercise, you practice producing scatterplots for a publication. This exercise does not count toward your grade. It is just for practice! 

4.4: Creating Boxplots  Introduction to Boxplots  This video shows how a boxplot is built. You should understand what each of the bars and whiskers means so you can interpret the boxplot. 
Boxplots in Base R  This section introduces the functionality of the base-R function boxplot. Note that for some data formats, the plot function with x being a factor variable will also work. 

Boxplots in ggplot2  In this section, you will learn the ggplot2 code for producing boxplots. While the syntax and default appearance may differ, these plots aim to compare distributions and identify outliers. If you need, you can add a few lines of code to make the base-R and ggplot2 graphs look the same. The choice of which plotting system to use is yours now. 

Practice: Boxplots  This quick practice exercise asks you to produce boxplots for a publication. This exercise does not count toward your grade. It is just for practice! 

4.5: Creating Time Series Plots  Time Series Plots in Base R  This section is a short introduction to time series plots in R. You can use the analogy with the scatterplots where the horizontal axis is time. 
The ts Format  If you save the data in the special format ts, the plotting function plot.ts can produce a better-looking x-axis automatically. The ts format adds attributes to your data, such as the beginning and end times and frequency. This section shows how you can convert a usual vector to the ts format, then plot it. 
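Converting a plain vector to the ts format might look like this. The series is simulated, and the start date (January 2020) and monthly frequency are assumptions made for the example.

```r
set.seed(42)
x <- cumsum(rnorm(36))          # 36 simulated values, treated as monthly data

# Attach time attributes: start in January 2020, 12 observations per year
x_ts <- ts(x, start = c(2020, 1), frequency = 12)

start(x_ts)       # 2020 1
end(x_ts)         # 2022 12
frequency(x_ts)   # 12

# plot() dispatches to plot.ts and labels the x-axis with years
plot(x_ts, ylab = "Value")
```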

Time Series Plots Using ggplot2  Of course, ggplot2 can also visualize time series. This section introduces the relevant ggplot2 syntax. 

Practice: Time Series Plots  In this exercise, you practice plotting a time series for a publication. This exercise does not count toward your grade. It is just for practice! 

5.1: Single-Sample Summaries  Basic Summary Statistics  After plotting the data, mean and variance are some of the basic summaries that we want to know. R has built-in functions to calculate mean, sd, var, and median. This video demonstrates the calculations and, using the plot, shows how the results relate to the sample data. 
Examining the Distribution of a Dataset  Even one variable can tell a story. For example, sample data on personal incomes might show distinct clusters of high- and low-paid workers, and time series of average temperatures may show trends and seasonal cycles. Here you will learn R tools for working with such data by combining your experience with plots and simple statistical summaries. 

Alternatives and Extensions  As you already know, for each base-R operation, there are user-contributed alternatives. This video demonstrates the function describe from the package psych, which outputs more statistics than the standard function summary. (You already know how to install and load the package to your R environment.) Be careful, as user-contributed packages might use the same names for their functions. For example, the package Hmisc also has a function describe that produces a different output. 

Practice: Statistical Summary  Functions for individual quantities like mean or median are convenient when we want to use that specific number in further analysis or visualizations, but the function summary and its alternatives are great for exploratory analysis. In the exercise, you can practice both approaches. This exercise does not count toward your grade. It is just for practice! 

Tables  Finally, the function table can count the number of observations per group. It is most useful when applied to factors, integers, logical values, or strings. It allows you to study group counts, proportions, and identify outliers. This section demonstrates the application of this function and how it can be applied to more than one variable. 

5.2: The t-test  One- and Two-Sample t-tests  The t-test is quite simple, and the base-R functionality will likely be sufficient for all your related calculations. This section introduces the plots and testing functions that help us to conduct the inference based on the t-test and its nonparametric alternative, the Wilcoxon (or Mann-Whitney) test. 
Applying the t-test  Fortunately, the t-test calculations can be modified for the cases when the assumption of equal variances across groups is violated. In other words, Welch's version of the t-test accounts for unequal variances. This video demonstrates the test application in R and the relevant options for implementing it. 
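A minimal sketch of the two versions of the test, using simulated groups with clearly different variances, is shown below. The group sizes, means, and standard deviations are arbitrary. In base R, t.test applies Welch's version by default, and var.equal = TRUE requests the classical pooled-variance test.

```r
set.seed(10)
g1 <- rnorm(30, mean = 10, sd = 1)   # low-variance group
g2 <- rnorm(30, mean = 11, sd = 4)   # high-variance group

welch  <- t.test(g1, g2)                     # Welch's test (default, var.equal = FALSE)
pooled <- t.test(g1, g2, var.equal = TRUE)   # classical test assuming equal variances

welch$method        # "Welch Two Sample t-test"
welch$parameter     # adjusted (non-integer) degrees of freedom, below 58
pooled$parameter    # 58 degrees of freedom (30 + 30 - 2)
welch$p.value
```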

The Power of the t-test  The greater the difference between compared quantities and the more observations we have, the more confident we (and the t-test) can be that the observed differences are not just due to random chance but are true, statistically significant differences. Even if the means of two populations are different, the t-test might not detect it if the difference or the sample is small. The probability with which the t-test would detect the difference under the given sample size and variability is the power of the test and can be calculated in R. We prefer high power and often use the desired power, confidence level, and expected variability to identify the required sample size. 
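Both directions of the power calculation can be sketched with the base-R function power.t.test. The difference of 1 unit, the standard deviation of 2, and the target power of 80% are assumptions chosen for the example.

```r
# Power of a two-sample t-test with 20 observations per group
p1 <- power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.05)
p1$power

# Sample size per group needed to reach 80% power for the same difference
p2 <- power.t.test(delta = 1, sd = 2, sig.level = 0.05, power = 0.80)
ceiling(p2$n)       # 64 observations per group
```

Exactly one of n, delta, sd, sig.level, and power is left unspecified, and the function solves for it.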

Practice: t-test  In this exercise, you will use the t-test and Wilcoxon test to compare the Examination rates across the two groups. This exercise does not count toward your grade. It is just for practice! 

5.3: One-Way ANOVA  The Basics of One-Way ANOVA  This section introduces base-R functionality for the one-way ANOVA. "One-way" means that only one factor variable is used, such as in the case of BloodPressure ~ ExerciseLevel. Be aware that when two or more factors are used, contributed functions like car::Anova are preferred because they have the option to apply different types of the F-test and conduct inference without depending on the order in which the factors are introduced in the R formula. 
ANOVA in afex and car  This video shows the implementation of ANOVA in the packages car and afex. It probably makes sense to start using one of these packages for ANOVA instead of the base-R functions aov and anova, even if you have only one factor variable to start with. The video also covers a range of post-hoc tests used to find which groups the statistically significant differences occur between. These tests are useful, but the global ANOVA test is not needed for the analyst to start using these tests – just remember to use an adjustment for multiple testing. 
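A base-R one-way ANOVA can be sketched with the built-in dataset PlantGrowth (plant weight under a control and two treatments). The choice of dataset and of the Holm adjustment is an assumption made for illustration.

```r
# The first level of the factor is the control group
levels(PlantGrowth$group)        # "ctrl" "trt1" "trt2"

# One-way ANOVA: one response, one factor
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)                     # F-test with 2 and 27 degrees of freedom

# Pairwise comparisons with an adjustment for multiple testing
pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                p.adjust.method = "holm")
```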

Practice: ANOVA  In this practice exercise, you will use the built-in dataset iris to test whether the 

5.4: Linear Regression  Model Basics  Models are simplified representations of reality based on available observations. Both the observations and our assumptions about the form of the existing relationships affect the model we get as an outcome of the analysis. Here you will learn the general approach for specifying and estimating a linear model in statistics. 
Practice: Model Basics  While R makes the model fitting process extremely easy, several steps or implicit decisions go into it. For example, one may choose to keep or remove extreme observations (outliers) and select the optimization algorithm. This exercise demonstrates the effects of these decisions on the modeling outcomes. 

Visualizing Models  One of the best tools to check the quality of a model is to plot things. This section shows how to visualize modeling results and the unmodeled remainder (residuals) to diagnose the model. Remember that residuals should not have any remaining pattern and should look randomly scattered. If there is a remaining pattern, try to include it in your model (that is, re-specify the model), then re-estimate the model and visualize the new residuals. 

Practice: Visual Model Checks  You should get used to checking model quality visually. Look for inconsistencies between the data cloud pattern and the fitted lines, for patterns in residuals, and for outlying observations. These exercises give examples and suggest R functions you can use for these tasks. This exercise does not count toward your grade. It is just for practice! 

Formulas and Model Families  Formulas are the R versions of statistical equations passed to the R functions for estimation. We use formulas to specify the models, such as what terms the model will have and their transformations. This section introduces various options for specifying a model using formulas. Pay attention to specifying the intercept and interactions of variables. 
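A few common formula forms can be sketched with the built-in dataset mtcars. The choice of variables is arbitrary and serves only to show the syntax.

```r
# Intercept is included by default
fit1 <- lm(mpg ~ wt, data = mtcars)

# Removing the intercept (regression through the origin); mpg ~ 0 + wt is equivalent
fit0 <- lm(mpg ~ wt - 1, data = mtcars)

# wt * hp expands to wt + hp + wt:hp (main effects plus interaction)
fit_int <- lm(mpg ~ wt * hp, data = mtcars)

# Transformations: log() on the response, I() protects arithmetic inside a formula
fit_tr <- lm(log(mpg) ~ wt + I(wt^2), data = mtcars)

names(coef(fit1))      # "(Intercept)" "wt"
names(coef(fit0))      # "wt"
names(coef(fit_int))   # "(Intercept)" "wt" "hp" "wt:hp"
```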

Practice: Formulas  We often keep the intercept in the model even if it is not statistically significant because our main focus is usually on the effect of other variables expressed in their coefficients. However, there are cases when we need to remove the intercept to obtain the so-called "regression through the origin". Also, we might need to model the combined effect of two factors using the interaction term (for example, to model how light and water conditions affect plant growth). These exercises let you practice these cases and suggest you compare alternative models. It does not count toward your grade and is just practice! 

Course Feedback Survey  Course Feedback Survey 