The data.table Format

The data.table format also helps shorten code when working with data.frame structures. Most importantly, data.table handles large data sets efficiently, in terms of both speed and memory use. You can convert a data.frame to a data.table and back if needed.
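A minimal sketch of the round trip, using the first rows of the built-in mtcars data purely for illustration. as.data.table() and as.data.frame() return converted copies, while setDT() and setDF() convert the object in place by reference, which avoids a copy when the data is large.

```r
library(data.table)

# A plain data.frame; the first rows of the built-in mtcars stand in for real data
df <- head(mtcars)

# Copying conversion: df itself is left untouched
dt <- as.data.table(df, keep.rownames = TRUE)  # row names kept in a column "rn"
class(dt)
# [1] "data.table" "data.frame"

# Copying conversion back to a plain data.frame
df2 <- as.data.frame(dt)
class(df2)
# [1] "data.frame"

# In-place conversion by reference, without copying the data
setDT(df)
class(df)
# [1] "data.table" "data.frame"
setDF(df)
class(df)
# [1] "data.frame"
```

Because a data.table still inherits from data.frame, most functions that expect a data.frame also accept one.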

Data analysis using data.table

Data manipulation operations such as subset, group, update and join are all inherently related. Keeping these related operations together allows for:

  • concise and consistent syntax irrespective of the set of operations you would like to perform to achieve your end goal.

  • performing analysis fluidly, without the cognitive burden of first mapping each operation to a particular function from a potentially huge set of available functions.

  • automatically optimising operations internally, and very effectively, by knowing precisely the data required for each operation, leading to very fast and memory-efficient code.
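To make the idea concrete, here is a minimal sketch of data.table's general form DT[i, j, by] on a small hypothetical table shaped like a few columns of the flights data: i subsets rows, j computes on columns, and by groups the computation, all within one call. The last line updates a column in place with :=.

```r
library(data.table)

# Hypothetical toy data, shaped like a few columns of the flights data
DT <- data.table(
  origin    = c("JFK", "JFK", "LGA", "LGA", "EWR"),
  month     = c(1L, 1L, 1L, 2L, 2L),
  dep_delay = c(14L, -3L, 2L, -8L, 5L)
)

# DT[i, j, by]: keep January rows (i), then compute the mean departure
# delay and the number of flights (j) for each origin airport (by)
res <- DT[month == 1L, .(mean_delay = mean(dep_delay), n = .N), by = origin]
res

# Update by reference with := : negative delays are set to 0 without copying DT
DT[dep_delay < 0L, dep_delay := 0L]
```

Subsetting, grouping and updating all use the same bracket syntax, so no separate function has to be looked up for each step.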

Briefly, if you are interested in reducing programming and compute time tremendously, then this package is for you. The philosophy that data.table adheres to makes this possible. Our goal is to illustrate it through this series of vignettes.


Data

In this vignette, we will use NYC-flights14 data obtained with the flights package (available on GitHub only). It contains On-Time flight data from the Bureau of Transportation Statistics for all flights that departed from New York City airports in 2014 (inspired by nycflights13). The data is available only for January to October 2014.

We can use data.table's fast-and-friendly file reader fread to load flights directly as follows:

library(data.table)

# Use a local copy of the file if one exists; otherwise read it from GitHub
input <- if (file.exists("flights14.csv")) {
  "flights14.csv"
} else {
  "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
flights <- fread(input)
flights
#         year month day dep_delay arr_delay carrier origin dest air_time distance hour
#      1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9
#      2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11
#      3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19
#      4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7
#      5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13
#     ---
# 253312: 2014    10  31         1       -30      UA    LGA  IAH      201     1416   14
# 253313: 2014    10  31        -5       -14      UA    EWR  IAH      189     1400    8
# 253314: 2014    10  31        -8        16      MQ    LGA  RDU       83      431   11
# 253315: 2014    10  31        -4        15      MQ    LGA  DTW       75      502   11
# 253316: 2014    10  31        -5         1      MQ    LGA  SDF      110      659    8
dim(flights)
# [1] 253316     11

Aside: fread accepts http and https URLs directly, as well as the output of operating system commands such as sed and awk (through its cmd argument). See ?fread for examples.
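A minimal sketch of the command route, assuming a Unix-like system with awk on the PATH. The small CSV written to a temporary file is toy data, not the flights data; awk filters the rows before fread parses them, so only the matching lines are ever read into R.

```r
library(data.table)

# Write a small toy CSV to a temporary file
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(origin = c("JFK", "LGA", "JFK"), dep_delay = c(14L, -3L, 2L)), tmp)

# Let awk keep the header (NR == 1) and the rows whose first field is JFK,
# then parse awk's output with fread. Assumes awk is available in the shell.
jfk <- fread(cmd = paste("awk -F, 'NR == 1 || $1 == \"JFK\"'", shQuote(tmp)))
jfk
```

Filtering in the shell like this can help when a file is much larger than the subset actually needed.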


Source: M. Dowle, https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.