Loading Files From Other Programs

Site: Saylor Academy
Course: PRDV420: Introduction to R Programming
Book: Loading Files From Other Programs
Printed by: Guest user
Date: Friday, May 17, 2024, 2:04 AM

Description

User-contributed packages provide tools for loading into R data saved in many other formats. Often several packages can load the same file format – you can find them by searching on the internet.

Loading Files from Other Programs

You should follow the same advice I gave you for Excel files whenever you wish to work with file formats native to other programs: open the file in the original program and export the data as a plain-text file, usually a CSV. This will ensure the most faithful transcription of the data in the file, and it will usually give you the most options for customizing how the data is transcribed.

Sometimes, however, you may acquire a file but not the program it came from. As a result, you won't be able to open the file in its native program and export it as a text file. In this case, you can use one of the functions in Table D.4 to open the file. These functions mostly come in R's foreign package. Each attempts to read in a different file format with as few hiccups as possible.

Table D.4: A number of functions will attempt to read the file types of other data-analysis programs

File format Function Library
ERSI ArcGIS read.shapefile shapefiles
Matlab readMat R.matlab
minitab read.mtp foreign
SAS (permanent data set) read.ssd foreign
SAS (XPORT format) read.xport foreign
SPSS read.spss foreign
Stata read.dta foreign
Systat read.systat foreign


Connecting to Databases

You can also use R to connect to a database and read in data.

Use the RODBC package to connect to databases through an ODBC connection.

Use the DBI package to connect to databases through individual drivers. The DBI package provides a common syntax for working with different databases. You will have to download a database-specific package to use in conjunction with DBI. These packages provide the API for the native drivers of different database programs. For MySQL use RMySQL, for SQLite use RSQLite, for Oracle use ROracle, for PostgreSQL use RPostgreSQL, and for databases that use drivers based on the Java Database Connectivity (JDBC) API use RJDBC. Once you have loaded the appropriate driver package, you can use the commands provided by DBI to access your database.


Source: G. Grolemund, https://rstudio-education.github.io/hopr/dataio.html
Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

Other types of data

To get other types of data into R, we recommend starting with the tidyverse packages listed below. They're certainly not perfect, but they are a good place to start. For rectangular data:

  • haven reads SPSS, Stata, and SAS files.

  • readxl reads excel files (both .xls and .xlsx).

  • DBI, along with a database specific backend (e.g. RMySQL, RSQLite, RPostgreSQL etc) allows you to run SQL queries against a database and return a data frame.

For hierarchical data: use jsonlite (by Jeroen Ooms) for json, and xml2 for XML. Jenny Bryan has some excellent worked examples at https://jennybc.github.io/purrr-tutorial/.

For other file types, try the R data import/export manual and the rio package.


Source: H. Wickham and G. Grolemund, https://r4ds.had.co.nz/data-import.html
Creative Commons License This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.