Topic outline

  • Unit 1: Introduction to R and RStudio

    R is a language for statistical computing, distributed as free, open-source software provided by The R Foundation for Statistical Computing. The software comes with a free R editor, an interface for accessing R functionality and for writing and executing R code. However, many other code editors and integrated development environments (IDEs), both free and commercial, extend the standard editor's functionality and are often more convenient. In this unit, we start exploring R and the RStudio IDE and introduce basic practices for coding and organizing your files.

    Completing this unit should take you approximately 2 hours.

    • Upon successful completion of this unit, you will be able to:

      • describe R and RStudio, their structure, and potential applications;
      • set up the R and RStudio coding environment;
      • save and execute R code using different approaches;
      • describe common practices for coding in R and for managing files and data; and
      • find and set up additional R packages to expand R functionality.
    • 1.1: R and Coding Environments

      The first step is to understand what R is and how to interact with the language once it is installed on your computer. In this section, you will also learn about the history of R and its potential applications.

      • Read about R, its history, connections to other languages, and alternatives for statistical computing. You will also learn about various interfaces that can be used to edit and run R code, such as RStudio.

      • Here is a short introduction to the basics of the RStudio layout. Now you should have a general understanding of what R and RStudio are, how R links to other languages, and how user-contributed packages of functions can extend its functionality. The next section will teach you how to install and set up this software.

    • 1.2: Installing and Setting Up R and RStudio

      The best way to master coding is through practice. This section explains how to install the R software and the RStudio integrated development environment, which provides a convenient interface for interacting with R and a variety of developer tools.

      • Follow the steps in the video to download and install R, then download and install RStudio Desktop. Both programs are completely free.

        R is the "core," so it is better to install it first. RStudio, the interface, can then easily find the R installation on your computer during its own installation and connect to it. When you start RStudio, the console shows the version of R it has connected to.
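
        The version check can also be done by hand. The lines below are a minimal sketch that uses only functions shipped with base R; typing them in the console reports which R installation the current session is using.

        ```r
        # Print the version of R that this session is running
        R.version.string

        # The same information as a version object that can be compared
        getRversion()
        getRversion() >= "3.0.0"   # TRUE when the running R is at least version 3.0.0
        ```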

      • This video will introduce you to the RStudio interface and its main settings. Apply those settings on your computer because they help you build good coding habits. By now, you should be comfortable setting up the R and RStudio coding environment on your computer.

      • If needed, you can update R itself by downloading and installing a new version from the R Project website. By default, after you restart RStudio, it will start using the newest version of R installed on your computer. RStudio can be updated, for example, by clicking Help > Check for Updates.

    • 1.3: Command Line and Script

      The two most common ways to execute a command in R are: 1) type it directly in the R console, or 2) write it in a plain-text file (called a "script"; R scripts usually have the file extension .R) that can be saved, modified, and executed. Working directly in the console is appropriate when you do not need to save or modify your steps. For example, you may quickly type and run ?functionName in the console to open the help file for a function, or dim(Object) to check the dimensions of an Object in your R environment before you apply a function to it. However, you should save most other steps you execute in R in a script so that you can improve, share, and reproduce your results.
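
      The two approaches can be sketched as follows. This is an illustration only: the built-in mtcars data set stands in for your own object, and the file name analysis.R is a hypothetical script written to a temporary folder so the example does not touch your files.

      ```r
      # Quick checks that are typical of console work (mtcars ships with R)
      dim(mtcars)   # number of rows and columns
      # ?dim        # in an interactive session, this opens the help page for dim()

      # Steps worth keeping go into a script file that can be re-run as a whole.
      # "analysis.R" is a hypothetical file name used only for illustration.
      script_path <- file.path(tempdir(), "analysis.R")
      writeLines(c("x <- 1:10", "total <- sum(x)"), script_path)

      source(script_path)   # executes every line of the script
      total                 # the object created by the script is now available
      ```

      In RStudio, you would normally write the script in the editor panel and run it from there; source() does the same from the console.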

      • All complex analyses comprise sequences of simple operations. To get acquainted with R and "build trust," please read about the operations you can run in the R (RStudio) console and immediately see the results. Run the code on your computer. Do you get the same outputs? Try changing some of the code to see how the outputs change. You should be able to guess what kind of output you will get from any of these simple operations.

      • To start getting familiar with R, go ahead and try the examples on your machine. Do you always get the output you expect? Create some variables (see the section on Variable Assignment) and check that they appear in the RStudio Environment (panel in the top right). This exercise does not count toward your grade. It is just for practice!
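
        As a warm-up before the exercise, here is a small sketch of the kind of operations meant above. The variable names (radius, area, heights_cm, heights_m) are made up for illustration.

        ```r
        # R works as a calculator
        2 + 3 * 4   # multiplication is done before addition
        10 / 4

        # Assign values to names with <-
        radius <- 2
        area <- pi * radius^2

        # Vectors hold several values; arithmetic is applied element by element
        heights_cm <- c(150, 165, 180)
        heights_m <- heights_cm / 100
        mean(heights_m)

        # List the objects created so far; the same names show up in the
        # RStudio Environment panel
        ls()
        ```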

    • 1.4: Functions and Packages

      Functions are the workhorses of your analysis, data manipulation, and visualization in R. You are going to use R functions all the time. Here you will learn about function arguments, how to get help for a specific function, and how to load more functions in the form of a package.

      • You have used a few functions already. Here is a slightly more formal introduction. You should be able to understand the inputs (arguments) you specify when calling a function and the output it returns. For most functions, you can get help by executing ?function_name in the console.
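
        To make the idea of arguments and outputs concrete, here is a short sketch using two base-R functions, round() and mean(). The object names values and avg are chosen only for this example.

        ```r
        # Show the arguments that round() accepts
        args(round)

        round(3.14159)               # digits is left at its default of 0
        round(3.14159, digits = 2)   # argument matched by name
        round(3.14159, 2)            # same call, argument matched by position

        # The output of a function can be stored and reused
        values <- c(4, 8, NA, 15)
        avg <- mean(values, na.rm = TRUE)   # na.rm = TRUE drops the missing value
        avg
        ```

        Naming arguments, as in digits = 2, makes a call easier to read and protects against mixing up the order.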

      • Here you will practice applying the function lm, one of the most frequently used base-R functions. Please open the help file for lm by executing ?lm, using the help syntax (?function_name) you learned previously in this section. Read the help file to understand which arguments are required for the function to run and which are optional. Execute the examples given at the end of the help file. We will return to this function when introducing statistical models later in the course.
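
        If you want something to try before working through the help file, the sketch below fits a simple linear model. It is not taken from the lm help page; it assumes the built-in mtcars data set and regresses fuel economy (mpg) on car weight (wt).

        ```r
        # Fit a simple linear regression of fuel economy on car weight.
        # The formula says what to model; data tells lm() where to find the variables.
        fit <- lm(mpg ~ wt, data = mtcars)

        coef(fit)      # estimated intercept and slope
        summary(fit)   # fuller report: standard errors, R-squared, and more
        ```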

      • R functions come in packages, like tools come in different toolboxes. You will use the function install.packages("PackageName") to get the needed package and library(PackageName) to load it in your R environment. After the package is installed, comment out the installation line like this

        # install.packages("PackageName")

        so your script doesn't reinstall the package each time it runs, but keep the line

        library(PackageName)

        because a package must be loaded each time you start a new R session. It is a good idea to keep all library() statements at the top of your script so you can easily see all the packages needed and do not duplicate the library calls.
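
        An alternative to commenting out the installation line is to install only when the package is missing. The sketch below is one common pattern, not the only one. It uses "stats", which ships with R, as a stand-in so that the example never triggers a download; replace it with the package you need.

        ```r
        # Install a package only when it is not already available, then load it.
        # "stats" is a stand-in that is always installed; substitute your package name.
        pkg <- "stats"

        if (!requireNamespace(pkg, quietly = TRUE)) {
          install.packages(pkg)
        }

        # character.only = TRUE is needed because pkg holds the name as a string
        library(pkg, character.only = TRUE)
        ```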

      • New versions of R are released every year. The best way to stay up to date with R is to check the CRAN website periodically. The website is updated for each new release and makes the release available for download. You will have to install the new release yourself; the process is the same as for the original installation.

      • As your code gets bigger, it might be hard to spot an error (a "bug" is what programmers call it politely). These examples let you practice debugging (finding and fixing errors in the code) so that the code runs smoothly and correctly. Install R packages if needed for the examples to run. This exercise does not count toward your grade. It is just for practice!
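
        Here is a small made-up example of what a bug can look like. It is not one of the exercise examples; the function names average_buggy and average_fixed are hypothetical. The code runs without an error message, yet it returns the wrong answer, which is why checking results against a known value is useful.

        ```r
        # A function with a bug: it divides by a hard-coded 2 instead of the vector length
        average_buggy <- function(x) {
          sum(x) / 2
        }
        average_buggy(c(2, 4, 6))   # returns 6, but the mean of 2, 4, 6 is 4

        # Fixed version: divide by the number of elements
        average_fixed <- function(x) {
          sum(x) / length(x)
        }
        average_fixed(c(2, 4, 6))   # returns 4

        # stopifnot() raises an error when a condition is FALSE,
        # so it can guard against the bug coming back
        stopifnot(average_fixed(c(2, 4, 6)) == 4)
        ```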

    • 1.5: Management of Code and Other Files

      This section suggests ways to optimize your work on a project. You will enjoy coding more after you develop habits of saving your analysis as R scripts, adding sections and sufficient comments to your code, and organizing project files for efficient work with the data, analysis outputs, code files, and other related documents.

      • First, watch the video, then read about projects and R working directories. The video demonstrates one way to manage the files in a project efficiently. The file structure discussed will work in many cases but may need to be revised when the data are large and it is impossible or impractical to move them to local data folders. Also, the video assumes that each script file (for data loading, cleaning, plotting, etc.) is relatively large, so it makes sense to keep the code in separate files where it is more manageable. If each file (for data loading, cleaning, visualizing, and statistical analysis) contains just a few lines, it might be more practical to keep the code together in a single script. You are free to decide based on the needs and size of your project.
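
        The sketch below shows how the working directory and relative paths fit together. The folder names my_project and data and the file name cars.csv are assumptions for illustration, and the folder is created under tempdir() so the example does not write into your own directories. Inside a real RStudio project, you would build the path relative to the project folder instead, for example file.path("data", "cars.csv").

        ```r
        # Show the current working directory
        # (in an RStudio project, this is the project folder)
        getwd()

        # Create a hypothetical project data folder in a temporary location
        data_dir <- file.path(tempdir(), "my_project", "data")
        dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

        # Build the file path from its parts instead of typing it out by hand
        csv_path <- file.path(data_dir, "cars.csv")

        # Save a small data set and read it back using the constructed path
        write.csv(head(mtcars), csv_path)
        cars <- read.csv(csv_path, row.names = 1)
        dim(cars)
        ```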

      • Try this exercise to practice creating and using R projects. This exercise does not count toward your grade. It is just for practice!

      • It is helpful to keep your work tidy so that you or other users can reuse the code later. Add plenty of comments to your code to explain what you are doing and why, and follow the other recommendations in this section.
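
        As an illustration of commented, sectioned code, here is a short made-up script. It assumes the built-in mtcars data set in place of a project file. Comment lines that end with four or more dashes are treated by RStudio as section headers, which makes longer scripts easier to navigate.

        ```r
        # Load data ----
        # The built-in mtcars data set stands in for a project file here
        cars <- mtcars

        # Clean data ----
        # Keep only the columns used below so later steps are easier to follow
        cars <- cars[, c("mpg", "wt")]

        # Summarize ----
        # Report average fuel economy, rounded to one decimal for readability
        mean_mpg <- round(mean(cars$mpg), 1)
        mean_mpg
        ```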

    • Unit 1 Assessment

      • Take this assessment to see how well you understood this unit.

        • This assessment does not count toward your grade. It is just for practice!
        • You will see the correct answers after you submit yours. Use this to help you study for the final exam!
        • You can take this assessment as many times as you want, whenever you want.