A couple of weeks ago I stumbled across a feature in R that I had never heard of before: the functions save() and load(), and the R file type .rda.
An .rda file lets you save R data structures such as vectors, matrices, and data frames. The file is compressed by default, and save() has options for controlling the compression. Let’s take a look.
First, we will grab one of the built-in R datasets. We can view these by calling data(). Let’s use the “Orange” dataset.
# get the Orange data
Orange
   Tree  age circumference
1     1  118            30
2     1  484            58
3     1  664            87
4     1 1004           115
5     1 1231           120
6     1 1372           142
7     1 1582           145
8     2  118            33
9     2  484            69
10    2  664           111
11    2 1004           156
12    2 1231           172
13    2 1372           203
14    2 1582           203
15    3  118            30
16    3  484            51
17    3  664            75
18    3 1004           108
19    3 1231           115
20    3 1372           139
21    3 1582           140
22    4  118            32
23    4  484            62
24    4  664           112
25    4 1004           167
26    4 1231           179
27    4 1372           209
28    4 1582           214
29    5  118            30
30    5  484            49
31    5  664            81
32    5 1004           125
33    5 1231           142
34    5 1372           174
35    5 1582           177
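If you would rather not print all 35 rows, a quick structural check works too. This is just a small sketch using base R functions; the helper variable names (n_rows, col_names) are mine and only there for illustration.

```r
# Show the column types and a preview instead of the full dataset
str(Orange)
head(Orange, 3)

# Keep the dimensions and column names around for a quick sanity check
n_rows <- nrow(Orange)
col_names <- names(Orange)
n_rows
col_names
```

Note that str() reports Tree as an ordered factor rather than a plain number, which is worth knowing before pulling the column out on its own.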
Next, let’s store each column in its own variable.
# store the Orange columns as separate vectors
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference
Now if we look at our variables in the RStudio environment, we can see count, age, and circumference saved there.
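If you are not working in RStudio, there is no Environment pane to look at, but the same check can be done from the console. Here is a minimal sketch using ls() and exists(); the vars_present name is just something I made up for the example.

```r
# Recreate the three vectors from the built-in Orange dataset
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference

# List everything currently defined in the environment
ls()

# Check each of the three names individually
vars_present <- sapply(c("count", "age", "circumference"), exists)
vars_present
```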
Next, let’s set our R working directory, so the .rda file will save in the correct location. First we’ll use getwd() to find our current working directory, then we’ll adjust it (if needed) using setwd(). I set my working directory to a folder on the D drive.
# get and set the working directory
getwd()
[1] "D:/Users"
setwd("D:/r-temp")
getwd()
[1] "D:/r-temp"
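One thing to watch: setwd() fails if the folder does not exist yet. The sketch below creates the folder first and then switches to it. The target_dir path is a hypothetical stand-in built under tempdir() so the example runs on any machine; swap in your own folder.

```r
# Hypothetical target folder (under tempdir() so this runs anywhere)
target_dir <- file.path(tempdir(), "r-temp")

# Create the folder if it is missing, since setwd() errors on a nonexistent path
if (!dir.exists(target_dir)) {
  dir.create(target_dir, recursive = TRUE)
}

# setwd() invisibly returns the previous working directory, so keep it
old_wd <- setwd(target_dir)
new_wd <- getwd()
new_wd

# Switch back when finished
setwd(old_wd)
```

Keeping old_wd around makes it easy to return to where you started, which is handy inside longer scripts.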
Finally, let’s use the save() function to write our three vectors to an .rda file. The “file” argument is the name of the new .rda file.
# save to .rda file
save(count, age, circumference, file = "mydata.rda")
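To confirm the file was actually written, we can ask R directly. This is a small sketch, assuming we write to tempdir() instead of the working directory so it runs anywhere; rda_path is a name I picked for the example.

```r
# Build a full path to the output file (tempdir() is an assumption for portability)
rda_path <- file.path(tempdir(), "mydata.rda")

# Recreate the vectors and save them
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference
save(count, age, circumference, file = rda_path)

# Confirm the file exists and check its size in bytes
file.exists(rda_path)
file.size(rda_path)
```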
Next we will remove the variables from our R environment using the rm() function.
# remove variables
rm(age, circumference, count)
Now we can see that we no longer have saved variables in our R workspace.
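The same thing can be verified in code rather than by eye. A minimal sketch, using exists() with inherits = FALSE so that only the current environment is searched; still_there is a made-up name for the example.

```r
# Create the vectors, then remove them again
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference
rm(age, circumference, count)

# Check whether any of the three names is still defined here
still_there <- sapply(
  c("count", "age", "circumference"),
  exists,
  inherits = FALSE
)
still_there
```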
Now, we can check that our .rda file (mydata.rda) does in fact store our data by using the load() function.
Note: If we had not properly set our working directory, then we would have needed to provide a full path to the .rda file. For example, “C:/Users/Documents/R files/mydata.rda” rather than just “mydata.rda”.
#load the rda file
load(file = "mydata.rda")
Great, now we can see that our variables are back in the R environment for use once more.
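Two details of load() are worth knowing: it invisibly returns the names of the objects it restored, and it can load into a separate environment so it does not overwrite objects that share a name. Here is a sketch of both, assuming the file lives in tempdir(); rda_path, restored, and e are names I chose for the example.

```r
# Save the three vectors to a file in tempdir() (an assumption for portability)
rda_path <- file.path(tempdir(), "mydata.rda")
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference
save(count, age, circumference, file = rda_path)
rm(count, age, circumference)

# load() invisibly returns a character vector of the restored object names
restored <- load(rda_path)
restored

# Load into a fresh environment to avoid clobbering same-named objects
e <- new.env()
load(rda_path, envir = e)
ls(e)
```

Loading into its own environment is a cautious default when you are not sure what a given .rda file contains.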
Saving and loading data this way can be very useful when you’re working with large datasets that you want to clear from memory but keep for later. It can also help with long, complex R workflows and scripts. You can control the compression of the file using the ‘compress’ and ‘compression_level’ arguments of save().
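As a rough sketch of those compression options, the example below saves the same two vectors with gzip at the highest level and with xz, then compares file sizes. The file names and the tempdir() location are my assumptions, and which file comes out smaller depends on the data, so no result is claimed here.

```r
# Data to save
age <- Orange$age
circumference <- Orange$circumference

# Hypothetical output paths under tempdir()
gz_path <- file.path(tempdir(), "mydata_gzip.rda")
xz_path <- file.path(tempdir(), "mydata_xz.rda")

# gzip with the highest compression level
save(age, circumference, file = gz_path,
     compress = "gzip", compression_level = 9)

# xz compression with its default level
save(age, circumference, file = xz_path, compress = "xz")

# Compare the resulting sizes in bytes
sizes <- file.size(c(gz_path, xz_path))
sizes
```

load() detects the compression type on its own, so nothing changes on the reading side.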
That’s all for now!