Load, Save, and .rda files

A couple of weeks ago I stumbled across a feature in R that I had never heard of before: the functions save() and load(), and the R file type .rda.

The .rda files allow a user to save their R data structures such as vectors, matrices, and data frames. The file is automatically compressed, with user options for additional compression. Let’s take a look.

First, we will grab one of the built-in R datasets. We can view these by calling data(). Let’s use the “Orange” dataset.

# get the Orange data
Orange
   Tree  age circumference
1     1  118            30
2     1  484            58
3     1  664            87
4     1 1004           115
5     1 1231           120
6     1 1372           142
7     1 1582           145
8     2  118            33
9     2  484            69
10    2  664           111
11    2 1004           156
12    2 1231           172
13    2 1372           203
14    2 1582           203
15    3  118            30
16    3  484            51
17    3  664            75
18    3 1004           108
19    3 1231           115
20    3 1372           139
21    3 1582           140
22    4  118            32
23    4  484            62
24    4  664           112
25    4 1004           167
26    4 1231           179
27    4 1372           209
28    4 1582           214
29    5  118            30
30    5  484            49
31    5  664            81
32    5 1004           125
33    5 1231           142
34    5 1372           174
35    5 1582           177

Next, let’s store each column in its own variable.

# store each column of the Orange data in its own variable
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference

Now if we look at our variables in the RStudio environment, we can see count, age, and circumference saved there.
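If you are working outside RStudio, the same check can be done from the console. This is only a minimal sketch: it recreates the three variables so that it runs on its own, then lists and summarises them with ls() and str().

```r
# Recreate the three variables from the built-in Orange dataset
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference

# List the objects currently defined in the workspace
print(ls())

# Compact summary of each object (type, length, first few values)
str(count)
str(age)
str(circumference)
```

One thing str() makes visible is that Orange$Tree is an ordered factor rather than a plain numeric vector, while age and circumference are numeric.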

Next, let’s set our R working directory, so the .rda file will save in the correct location. First we’ll use getwd() to find our current working directory, then we’ll adjust it (if needed) using setwd(). I set my working directory to a folder on the D drive.

# get and set working directory
getwd()
[1] "D:/Users"
setwd("D:/r-temp")
getwd()
[1] "D:/r-temp"

Finally, let’s use the save() command to save our 3 vectors to an .rda file. The file argument gives the name of the new .rda file.

#save to rda file
save(count, age, circumference, file = "mydata.rda")
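As a hedged variation on the call above, here is a sketch that avoids changing the working directory by building an explicit path, and that passes the object names as a character vector through save()'s list argument. The temporary directory and the rda_path name are my own choices for illustration, not part of the original workflow.

```r
# Recreate the three variables so the example runs on its own
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference

# Assumed location: a file inside R's per-session temporary directory
rda_path <- file.path(tempdir(), "mydata.rda")

# Same effect as save(count, age, circumference, file = ...),
# but with the object names supplied as strings
save(list = c("count", "age", "circumference"), file = rda_path)

# Confirm the file was written and check its size in bytes
print(file.exists(rda_path))
print(file.size(rda_path))
```

The list form is handy when the names of the objects to save are themselves computed, for example from a call to ls().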

Next we will remove our R environment variables using the command rm().

#remove variables
rm(age, circumference, count)

Now we can see that we no longer have saved variables in our R workspace.
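Without the RStudio environment pane, one way to confirm the removal is to compare the names against ls(). This is just a sketch of that check and recreates the variables first so it is self-contained.

```r
# Recreate the variables, then remove them again
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference
rm(age, circumference, count)

# Each entry is TRUE if an object with that name is still in the workspace
still_present <- c("age", "circumference", "count") %in% ls()
print(still_present)
```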

Now we can check that our .rda file (mydata.rda) does in fact store our data by using the load() command.
Note: If we had not set our working directory properly, we would have needed to provide the full path to the .rda file. For example, “C:/Users/Documents/R files/mydata.rda” rather than just “mydata.rda”.

#load the rda file
load(file = "mydata.rda")

Great, now we can see that our variables are back in the R environment for use once more.
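To check the round trip in code rather than by eye, here is a minimal sketch. It relies on the fact that load() invisibly returns the names of the objects it restored. The temporary path and the sandbox environment are assumptions I added for illustration.

```r
# Set up: save the three variables to a temporary file, then remove them
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference
rda_path <- file.path(tempdir(), "mydata.rda")
save(count, age, circumference, file = rda_path)
rm(count, age, circumference)

# load() restores the objects under their original names and
# invisibly returns those names as a character vector
restored <- load(rda_path)
print(restored)

# The restored data should match the original column
print(identical(age, Orange$age))

# Optional: load into a separate environment so that nothing in the
# current workspace can be overwritten by accident
sandbox <- new.env()
load(rda_path, envir = sandbox)
print(ls(sandbox))
```

Loading into a separate environment is a cautious choice when you are not sure what an .rda file contains.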


Saving and loading data in R can be very useful when you’re working with large datasets that you want to clear from memory but keep for later. It can also help in long, complex R workflows and scripts. You can control the compression of the file with the save() arguments compress and compression_level.
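Here is a rough sketch of how the compression arguments of save() might be compared. The repeated vector x and the temporary file names are made up for illustration, and the sizes you see will depend on your data.

```r
# Illustrative data: a long, repetitive vector so compression has an effect
x <- rep(Orange$circumference, times = 1000)

# Assumed output locations: temporary files
path_none <- tempfile(fileext = ".rda")
path_gzip <- tempfile(fileext = ".rda")
path_xz <- tempfile(fileext = ".rda")

# No compression
save(x, file = path_none, compress = FALSE)

# gzip compression at level 6
save(x, file = path_gzip, compress = "gzip", compression_level = 6)

# xz compression at level 9 (usually smaller files, slower to write)
save(x, file = path_xz, compress = "xz", compression_level = 9)

# Compare the resulting file sizes in bytes
sizes <- file.size(c(path_none, path_gzip, path_xz))
names(sizes) <- c("none", "gzip", "xz")
print(sizes)
```

Whichever method was used when saving, load() reads the file the same way, so the choice only affects file size and write time.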

That’s all for now!

2 thoughts on “Load, Save, and .rda files”

  1. You might prefer to use `saveRDS` and `readRDS`. They come with the same compression options, but work more like standard R – without side effects. `load` is *weird* in that the object(s) pop up in your workspace without assignment, with the same names that they had before. `readRDS` works like other data import functions in that you assign the result. This is nice because you might want a new name, `yesterday_data <- readRDS(…)` (and you probably don't want to overwrite any objects that happen to have the same names as the objects in your `rda` file). The only weakness of `saveRDS` compared to `save` is that it only works on one object.


  2. I second Gregor Thomas’ views. RDS is a genuine proprietary R solution to save and read data fast and using little space. The weakness he describes I consider a strength, namely that it only works on one object.
    Little known, one can attach a comment to a data.frame with comment(), for example
    comment(DF) <- "mmddyy\nfreed of all missing values"
    After reading the data comment(DF) will spit out the content.
    Should you also use Python, or if time is critical, you might have a look at the {feather} package by Hadley Wickham.
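The two suggestions in the comments above can be sketched together as follows. This is only an illustration: the data frame DF, the comment text, and the temporary file path are all hypothetical.

```r
# Hypothetical data frame built from two columns of the Orange dataset
DF <- data.frame(age = Orange$age, circumference = Orange$circumference)

# Attach a free-text note to the object
comment(DF) <- "tree growth data\nno missing values"

# saveRDS() stores exactly one object and does not record its name
rds_path <- tempfile(fileext = ".rds")
saveRDS(DF, file = rds_path)

# readRDS() returns the object, so you pick the name when you import it
yesterday_data <- readRDS(rds_path)

# The comment attribute is saved and restored along with the data
cat(comment(yesterday_data), "\n")
print(identical(yesterday_data, DF))
```

Because readRDS() requires an explicit assignment, it cannot silently overwrite an existing object the way load() can.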
