The data.table Format
The data.table format also helps shorten code when working with data.frame structures. Most importantly, data.table handles large datasets efficiently in terms of both speed and memory use. You can convert a data.frame to a data.table and back again when needed.
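Here is a minimal sketch of that round trip. The column names and values are made up for illustration. as.data.table() and as.data.frame() return converted copies. setDT() and setDF() convert an existing object in place, by reference:

```r
library(data.table)

# A small data.frame with made-up values (illustrative only).
df <- data.frame(carrier = c("AA", "UA", "AA"),
                 arr_delay = c(13, -14, 9))

# as.data.table() returns a converted copy; df itself is unchanged.
dt <- as.data.table(df)
class(dt)   # "data.table" "data.frame"

# as.data.frame() converts a data.table back to a plain data.frame.
df2 <- as.data.frame(dt)
class(df2)  # "data.frame"

# setDT() and setDF() modify the object in place, without copying.
setDT(df)
class(df)   # "data.table" "data.frame"
setDF(df)
class(df)   # "data.frame"
```

The in-place variants avoid a full copy, which matters when the table is large.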
Data analysis using data.table
Data manipulation operations such as subset, group, update and join are all inherently related. Keeping these related operations together allows for:
- concise and consistent syntax, irrespective of the set of operations you need to perform to achieve your end goal.
- performing analysis fluidly, without the cognitive burden of mapping each operation to a particular function from a potentially huge set of available functions before you can start the analysis.
- automatically optimising operations internally, and very effectively, by knowing precisely which data each operation requires, leading to very fast and memory-efficient code.
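The first point can be sketched on a small toy table. This is hypothetical data, not the flights data. Subsetting, grouping, updating and joining are all written inside the same DT[i, j, by] brackets. The table contents and the carrier lookup table are illustrative assumptions:

```r
library(data.table)

# Hypothetical toy data in the spirit of the flights data used later.
DT <- data.table(
  origin    = c("JFK", "JFK", "LGA", "JFK", "EWR"),
  carrier   = c("AA",  "AA",  "AA",  "UA",  "UA"),
  arr_delay = c(13,    -3,    -26,   20,    -14)
)

# Subset (i), compute (j) and group (by) in a single call:
# mean arrival delay per carrier for flights that departed JFK.
jfk_delay <- DT[origin == "JFK", .(mean_delay = mean(arr_delay)), by = carrier]
print(jfk_delay)

# Update by reference with := inside the same brackets:
# flag flights that arrived late.
DT[, late := arr_delay > 0]

# Join: look up carrier names from a second, hypothetical table.
carriers <- data.table(carrier = c("AA", "UA"),
                       name    = c("American Airlines", "United Airlines"))
joined <- DT[carriers, on = "carrier"]
print(joined)
```

Each operation reuses the same bracket syntax. The only change is which of i, j, by or on is filled in.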
Briefly, if you want to greatly reduce both programming time and compute time, then this package is for you. The philosophy that data.table adheres to makes this possible. Our goal is to illustrate it through this series of vignettes.
Data
In this vignette, we will use the NYC-flights14 data obtained from the flights package (available on GitHub only). It contains On-Time flight data from the Bureau of Transportation Statistics for all flights that departed from New York City airports in 2014 (inspired by nycflights13). The data covers only January to October 2014.
We can use data.table's fast-and-friendly file reader fread to load flights directly as follows:
library(data.table)

# Use a local copy of flights14.csv if present, otherwise read it from GitHub
input <- if (file.exists("flights14.csv")) {
  "flights14.csv"
} else {
  "https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv"
}
flights <- fread(input)
flights
#         year month day dep_delay arr_delay carrier origin dest air_time distance hour
#      1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9
#      2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11
#      3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19
#      4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7
#      5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13
#     ---
# 253312: 2014    10  31         1       -30      UA    LGA  IAH      201     1416   14
# 253313: 2014    10  31        -5       -14      UA    EWR  IAH      189     1400    8
# 253314: 2014    10  31        -8        16      MQ    LGA  RDU       83      431   11
# 253315: 2014    10  31        -4        15      MQ    LGA  DTW       75      502   11
# 253316: 2014    10  31        -5         1      MQ    LGA  SDF      110      659    8
dim(flights)
# [1] 253316     11
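If neither the local file nor the network is available, you can still try the same reader on inline text. This sketch passes the first five rows of the printed output above to fread through its text argument. The variable names csv_text and sample_flights are illustrative:

```r
library(data.table)

# The first five rows shown in the printed output, as inline CSV text.
csv_text <- "year,month,day,dep_delay,arr_delay,carrier,origin,dest,air_time,distance,hour
2014,1,1,14,13,AA,JFK,LAX,359,2475,9
2014,1,1,-3,13,AA,JFK,LAX,363,2475,11
2014,1,1,2,9,AA,JFK,LAX,351,2475,19
2014,1,1,-8,-26,AA,LGA,PBI,157,1035,7
2014,1,1,2,1,AA,JFK,LAX,350,2475,13"

# fread() parses the string via its `text` argument and detects
# column types (integer, character) automatically.
sample_flights <- fread(text = csv_text)
dim(sample_flights)  # 5 rows, 11 columns
str(sample_flights)
```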
Aside: fread accepts http and https URLs directly, as well as the output of operating system commands such as sed and awk. See ?fread for examples.
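Here is a hedged sketch of the shell-command case. It writes a tiny made-up CSV to a temporary file and passes an awk filter to fread through its cmd argument (present in recent data.table versions). It assumes a Unix-like system with awk on the PATH and skips that step otherwise:

```r
library(data.table)

# Write a small made-up CSV to a temporary file (illustrative data only).
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(carrier   = c("AA", "UA", "AA"),
                  origin    = c("JFK", "EWR", "LGA"),
                  arr_delay = c(13L, -14L, 9L)), tmp)

# A file path (or an http/https URL) is passed to fread() directly.
all_rows <- fread(tmp)

# With `cmd`, fread() parses the output of a shell command instead.
# This awk filter keeps the header line plus rows whose first field is "AA".
# Assumes a Unix-like shell with awk available.
if (.Platform$OS.type == "unix" && nzchar(Sys.which("awk"))) {
  aa_rows <- fread(cmd = sprintf("awk -F, 'NR == 1 || $1 == \"AA\"' %s",
                                 shQuote(tmp)))
  print(aa_rows)
}
```

Pre-filtering with a shell tool like this means fread only has to parse the rows you keep.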
Source: M. Dowle, https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.