Table of Contents
Importing data into R
Importing data from spreadsheet (*.txt, *.csv, Excel file)
This is a common situation - you have data in some spreadsheet software (eg Excel in Windows, or Numbers in macOS), and you want to upload the data into R. There are several ways how to do it. The strongly recommended way is to save the datasheet from Excel as *.txt (preferably) or *.csv file (less preferably, since the format of CSV files depends on the language setting of your system), store the file in a chosen folder, and load the data into R from this file. You can also copy the data spreadsheet from Excel to the clipboard and load it to R via the clipboard, however, this is not a reproducible way, since you cannot code it (and next time you or others do, it may simply load a different chunk of data). Or, you may upload data directly from the Excel file stored on your computer (this may or may not work, depending on the version of Excel and R you are using). If data are available online (e.g. shared as a Dropbox/One Drive link, or stored on some website), you can import them directly from the online source, or (if data can be made public) create their online copy on GitHub eg as Gist (check options here for how to make it).
When you load the file from your computer using some of the functions below, there are two ways how to do it:
- together with the name of the file, include also the absolute path to the file on your computer, using either forward slash (
/), or double backslash (
\\) to separate the folder names (e.g.
- change the R working directory to the folder in which the file is stored by
setwd (“c:/path/to/data/folder)function, and then, in the function used to load the data, use only the name of the file without an absolute path (e.g.
Example data. The data we will use here as an example (
data-for-import.* files) look like this (note that the first column are row names, and the first row is header = column names):
Using *.txt (tab delimited) format
Use the file data-for-import.txt, which is plain text, with cells separated by tabs (Tab keys, tabulators, coded as
\t). Save the file into a folder, and then specify the address to the folder in the
file argument, for example (if the file is saved to folder
imported_data_1 <- read.delim (file = "c:/path/to/data/folder/data-for-import.txt", row.names = 1)
imported_data_1 is a data frame. Note that the function
read.delim is derived from more general function
read.table by setting some of the
read.table arguments to different default values (
sep = '\t', header = TRUE, meaning that the function expects the data have values separated by tab characters (
\t) and the first row of the data will become header = names of variables). If the first column of the data should become row names (as in this case), include the argument
row.names = 1.
file can also be a URL link to the location of the file on the Internet:
imported_data_1 <- read.delim ("https://raw.githubusercontent.com/zdealveindy/recol/main/data/data-for-import.txt", row.names = 1)
Alternative option is to use
read_delim function from the package
readr (part of
imported_data_1a <- readr::read_delim ("https://raw.githubusercontent.com/zdealveindy/recol/main/data/data-for-import.txt", row.names = 1)
Note that there is no default setting for the argument
read_delim function, you need to always specify (here tabulator,
delim = “\t”). Using
readr package has it's specialities: imported data are not
data.frame, but they are
tibbles (a special data frame format used within tidiverse). There are number of differences, one important is that by default
tibble does not have row names. The
tibble can be converted into standard data frames using
as.data.frame function, but if the first column of the
tibble is row name, it needs to be converted into
imported_data_1b <- as.data.frame (imported_data_1a[,-1], row.names = imported_data_1a[1,])
as.data.frame is applied on the tibble with the first column removed, and the first column is supplied into the argument
Note: different countries may have different specifications of what decimal separator is used; for English (and Chinese) setting the decimal separator is a dot (
.), while for some other countries (Germany, France, French, among others) it may be a comma (
,). This behaviour can be changed in the Regional Setting of your computer. If your setting is using comma as decimal separator, then *.txt file saved from Excel will most likely also contain commas. However, the
read.delim (and also
readr::read_delim) function by default expects the separator to be dot (default setting of argument
dec = ”.“). A simple solution is to use
read.delim or other functions with argument
dec = ”,“; in this way, the comma will be interpreted as decimal separator. Values with decimal commas imported into R without changing
dec into comma will be imported as factors (you can easily check that by applying
summary function on the imported data frame: if the column is numeric, the summary will return the basic statistics (mean, median, min and max), while if the column is a factor, it will return a list of individual factor values)
Using *.csv (comma separated values) format
Use the file data-for-import.csv, in which cells are separated by commas (,) and decimal signs are represented by dots (.). Note that *.csv format is a tricky one, since it follows different standards in different countries and on different platforms - so while in Taiwan, western Europe and elsewhere the cells are delimited by commas (,) and decimals are separated by dots (.), in France, Czech and elsewhere the *.csv format has cells delimited by semicolons (;) and decimals by commas (,).
Alternatively, you can use general function
read.table with setting appropriate arguments:
head = T means that the first row of data is taken as header (column names),
sep = ”,“ sets the delimiter to be semicolon, and
dec = ”.“ sets decimal separator to be a dot (this is default of
read.table function, not necessary to change).
If your *.csv file is following “semicolon/comma” standard instead of “comma/dot” one, you may use the function
read.csv2, which expects separators to be semicolon (;) and decimals commas (,).
readr::read_delim function with argument
delim = ”,“ can be used (check Using *.txt (tab delimited) format for the specialities around
read_delim function and
readr package in general).
Use the file data-for-import.xlsx, the sheet pig, and copy the table into the clipboard (take care to copy really just cells with data, not empty cells around - use of CTRL+A is thus not recommended). Then use:
imported-data_3 <- read.delim (file = 'clipboard', row.names = 1)
read.delim expects that the input (in this case the one stored in clipboard) is a plain text delimited by tabulators.
Using the clipboard is a quick and dirty way of importing data into R; it is relatively fast, but a considerable drawback is that it is not reproducible - you cannot code it, instead, you need to always manually open the file, copy the table to the clipboard and upload into R (this is prone to errors if you want somebody to replicate the same thing).
Import directly from Excel (*.xls or *.xlsx) file
Importing data directly from Excel used to be quite complicated (see e.g. this website), but the package
readxl made it much easier. Still, I do not suggest you use this option, since it may not be replicable on every platform, and it may change with a newer version of Excel. Alternatively, if you need to do this often (manipulate data manually in a spreadsheet and upload them into R, or even export from R back to the spreadsheet), consider using Google Sheets solution instead (see e.g. here).
Use of the package
readxl is pretty straightforward. There is a suite of functions for reading Excel file, like read_excel, read_xls and read_xlsx. Important arguments are
sheet, first for the name of the file (optinally including the path to the folder) and the second the name of the sheet which should be imported. We may try it on the Excel file data-for-import.xlsx - save it somewhere on your computer, and use:
# install.packages ('readxl') library (readxl) imported_data_4 <- read_excel ('c:/path/to/data/folder/data-for-import.xlsx', sheet = 'pig')
imported_data_4 created in R is not data.frame, but
tibble (alternative to data.frame in the
tidyverse packages). If you don't like that (or you don't know how to use it), you can simply convert it into data.frame using
as.data.frame function. If the first column of the data are in fact row names, they may need to be assigned as such:
imported_data_4a <- as.data.frame (imported_data_4[,-1], row.names = imported_data_4[,1])
Note that the functions from package
readxl cannot read Excel files directly from internet, unlike e.g.
read.table. But there is a workaround - first, download the Excel file into R as a temporary file, and the read it using
# install.packages ('readxl') library (readxl) url <- 'https://github.com/zdealveindy/recol/blob/main/data/data-for-import.xlsx?raw=true' destfile <- tempfile () download.file (url, destfile, mode = 'wb') imported_data_5 <- read_excel(destfile, sheet = 'pig')
Note that in the function
download.file it is important to specify the argument
mode = 'wb' (on Windows, if the argument
mode is not set up, the type of the file will be determined from the file extension; in case of “xls” and “xlsx”, it would attempt to download these files as plain text, but in fact, these files need to be downloaded as binaries!).
readxl is a part of the
tidyverse packages, and as such it does not use standard
data.frame format for data frames, but unique
class (imported_data_5)). One feature of
tibble is that it does not have rownames, and rownames are therefore imported as the first column of the data frame. This may or may not be handy for future analysis. To convert
tibble into standard
data.frame, use the function
as.data.frame. Then, still, you need to move first column into the rownames of the newly created data frame. Alternatively, there are functions like
column_to_rownames in the package
tibbles which can help you with that.
imported_data_5_df <- as.data.frame (imported_data_5) rownames (imported_data_5_df) <- imported_data_5_df[,1] imported_data_5_df <- imported_data_5_df[,-1]
Import *.RData file
Binary data storing the R object can be loaded into R using function
load. Download data data-for-import.rdata to your computer and use:
This should create the variable
data_for_import in your Global environment (it will appear among variables). You do not need to assign the result to a new variable (nothing like
imported_values <- load ('data-for-import.RData') will work).
load function can read directly data from a website, if the URL link is wrapped by the function
And what are the data in the example data-for-import file?
It's actually really a pig!
load (url ('https://github.com/zdealveindy/recol/blob/main/data/data-for-import.rdata?raw=true')) plot (data_for_import)
The pig! (note that the color I used to fill the polygon here is called Pig Pink :)
plot (data_for_import, type = 'o', pch = 16, col = 'red', frame.plot = F, axes = F) polygon (data_for_import, pch = 16, col = '#FCD7DE99', border = NA)