Overview of R

Read about R, its history, connections to other languages, and alternatives for statistical computing. You will also learn about various interfaces that can be used to edit and run R code, such as RStudio.

Features

Data Processing

R's data structures include vectors, arrays, lists, and data frames. Vectors are ordered collections of values and can be mapped to arrays of one or more dimensions in column-major order. That is, given an ordered collection of dimensions, values are filled in along the first dimension first, then one-dimensional arrays are filled in across the second dimension, and so on. R supports array arithmetic and, in this regard, is like languages such as APL and MATLAB. The special case of an array with two dimensions is called a matrix. Lists are collections of objects that need not share the same data type. Data frames contain a list of vectors of the same length plus a unique set of row names. R has no scalar data type; instead, a scalar is represented as a length-one vector.
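The following is a minimal sketch of the structures described above. It assumes only a standard R installation, and the variable names and values are illustrative. It shows column-major filling in `matrix()` and `array()`, element-wise array arithmetic, a "scalar" as a length-one vector, a list with mixed element types, and a data frame with automatic row names.

```r
# Column-major order: 1:6 fills column 1 (1, 2), then column 2 (3, 4), then column 3 (5, 6).
m <- matrix(1:6, nrow = 2)
print(m)

# The same rule extends to more dimensions: the first index varies fastest.
a <- array(1:24, dim = c(2, 3, 4))
a[1, 1, 2]   # 7, because one full 2 x 3 slice (6 values) comes before it

# Array arithmetic is element-wise.
doubled <- m * 2L

# No scalar type: a single number is a length-one vector.
x <- 42
length(x)    # 1
is.vector(x) # TRUE

# A list can hold elements of different types.
lst <- list(label = "a", count = 3L, flags = c(TRUE, FALSE))
sapply(lst, class)

# A data frame is a list of equal-length columns plus row names.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
rownames(df)  # automatic row names "1" "2" "3"
str(df)
```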

R and its libraries implement various statistical techniques, including linear and nonlinear modeling, classical statistical tests, spatial and time-series analysis, classification, clustering, and others. C, C++, and Fortran code can be linked and called at run time for computationally intensive tasks. Another of R's strengths is static graphics; it can produce publication-quality graphs that include mathematical symbols.
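Here is a sketch of the modeling and graphics features. It uses `cars`, a small data set of speed versus stopping distance that ships with base R. The code fits a linear model, runs a classical correlation test, and writes a plot whose title uses R's plotmath notation for mathematical symbols. The temporary PDF output is an illustrative choice. Linking compiled C, C++, or Fortran code is not shown, because it needs a compiler toolchain.

```r
# Linear modeling: regress stopping distance on speed.
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))
summary(fit)

# A classical statistical test: Pearson correlation between speed and distance.
ct <- cor.test(cars$speed, cars$dist)
print(ct)

# Static graphics with mathematical notation, written to a temporary PDF file.
out <- tempfile(fileext = ".pdf")
pdf(out, width = 5, height = 4)
plot(dist ~ speed, data = cars, xlab = "speed", ylab = "stopping distance")
abline(fit)  # add the fitted regression line
title(main = expression(hat(y) == beta[0] + beta[1] * x))
invisible(dev.off())
```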

Programming

R is an interpreted language; users can access it through a command-line interpreter. If a user types 2+2 at the R command prompt and presses Enter, R replies with 4, displayed as [1] 4, where [1] is the index of the first element of the result vector.

R supports procedural programming with functions and, for some functions, object-oriented programming with generic functions. Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages. Extending R is made easier by its lexical scoping rules, which are derived from Scheme. R uses S-expressions to represent both data and code. R's extensible object system includes objects for, among others, regression models, time series, and geospatial coordinates. Advanced users can write C, C++, Java, .NET, or Python code to manipulate R objects directly.
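Two ideas from this paragraph are easiest to see in code: lexical scoping and code represented as data. The sketch below assumes only base R, and `make_counter` is a hypothetical helper. The counter's state lives in the environment where the inner function was defined. Meanwhile, `quote()` returns an unevaluated call that can be inspected, modified, and evaluated like any other object.

```r
# Lexical scoping: the inner function looks up `count` in the environment
# where it was created, so state survives between calls.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # `<<-` assigns in the enclosing environment
    count
  }
}
counter <- make_counter()
first <- counter()   # 1
second <- counter()  # 2

# Code as data: quote() captures an unevaluated call.
expr <- quote(x + y * 2)
class(expr)           # "call"
as.list(expr)         # the function `+` followed by its two arguments
expr[[1]] <- as.name("-")                  # rewrite the call to x - y * 2
result <- eval(expr, list(x = 10, y = 3))  # 10 - 3 * 2 = 4
```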

Functions are first-class objects and can be manipulated in the same way as data objects, facilitating metaprogramming that allows multiple dispatch. Function arguments are passed by value and are lazy: they are evaluated only when they are used, not when the function is called. A generic function acts differently depending on the classes of the arguments passed to it; in other words, it dispatches to the method implementation specific to the object's class. For example, R has a generic print function that can print almost every class of object in R with print(objectname). Many of R's standard functions are written in R, making it easy for users to follow the algorithmic choices. R is highly extensible through packages that provide specific functions and support specific applications.
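A minimal base-R sketch of these points follows. The names `ops`, `bump`, `lazy_demo`, the `temperature` class, and the `describe` generic are hypothetical and exist only for illustration. The code does the following:

- stores functions in a list;
- shows that modifying an argument inside a function leaves the caller's copy unchanged;
- passes an argument that would raise an error but is never evaluated;
- uses R's S3 system (`UseMethod()` and methods named `generic.class`) to dispatch `print` and a custom generic on an object's class attribute.

```r
# First-class functions: store them in a list and pass them to lapply().
ops <- list(square = function(v) v^2, negate = function(v) -v)
results <- lapply(ops, function(f) f(1:3))

# Pass-by-value semantics: modifying `v` inside the function does not
# change the caller's vector.
bump <- function(v) {
  v[1] <- 99
  v
}
orig <- c(1, 2, 3)
bumped <- bump(orig)  # c(99, 2, 3); `orig` is still c(1, 2, 3)

# Lazy evaluation: `y` is never used, so the stop() call is never evaluated.
lazy_demo <- function(x, y) x * 10
lazy_value <- lazy_demo(4, stop("never evaluated"))  # 40, no error

# S3 generic dispatch: print() looks for a method named print.<class>.
new_temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}
print.temperature <- function(x, ...) {
  cat("Temperature:", x$celsius, "degrees C\n")
  invisible(x)
}
t1 <- new_temperature(21.5)
print(t1)  # dispatches to print.temperature

# A user-defined generic with a fallback default method.
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "some object"
describe.temperature <- function(obj, ...) paste(obj$celsius, "C")

# Many standard functions are ordinary R closures whose source can be read.
print(sd)
```

S3 is used here because it needs no setup. R's S4 system, with `setClass()` and `setGeneric()`, is a more formal alternative.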