A couple of weeks ago I stumbled across a feature of R that I had never heard of before: the functions save() and load(), and the .rda file type.
.rda files let you save R data structures such as vectors, matrices, and data frames. The file is compressed automatically, and you can choose other compression methods and levels. Let's take a look.
First, we will grab one of the built-in R datasets. We can view these by calling data(). Let’s use the “Orange” dataset.
# get the Orange data
Orange
Tree age circumference
1 1 118 30
2 1 484 58
3 1 664 87
4 1 1004 115
5 1 1231 120
6 1 1372 142
7 1 1582 145
8 2 118 33
9 2 484 69
10 2 664 111
11 2 1004 156
12 2 1231 172
13 2 1372 203
14 2 1582 203
15 3 118 30
16 3 484 51
17 3 664 75
18 3 1004 108
19 3 1231 115
20 3 1372 139
21 3 1582 140
22 4 118 32
23 4 484 62
24 4 664 112
25 4 1004 167
26 4 1231 179
27 4 1372 209
28 4 1582 214
29 5 118 30
30 5 484 49
31 5 664 81
32 5 1004 125
33 5 1231 142
34 5 1372 174
35 5 1582 177
Next, let's store each column in its own vector.
# store the Orange columns as separate vectors
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference
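Before saving anything, it can help to check what each vector actually holds. The sketch below uses only base R's class() and length(). It shows that Orange$Tree is an ordered factor of tree IDs rather than a numeric count, so the name count is a label of convenience here.

```r
# Recreate the three vectors from the built-in Orange dataset
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference

# Tree is an ordered factor identifying each tree; age and circumference are numeric
class(count)          # "ordered" "factor"
class(age)            # "numeric"
class(circumference)  # "numeric"

# One entry per row of Orange (5 trees x 7 measurements)
length(age)           # 35
```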
Now if we look at the Environment pane in RStudio, we can see count, age, and circumference listed there.
Next, let’s set our R working directory, so the .rda file will save in the correct location. First we’ll use getwd() to find our current working directory, then we’ll adjust it (if needed) using setwd(). I set my working directory to a folder on the D drive.
# get and set working directory
getwd()
[1] "D:/Users"
setwd("D:/r-temp")
getwd()
[1] "D:/r-temp"
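If you would rather not change the working directory at all, save() and load() also accept a full path, which you can build with file.path(). In this sketch tempdir() is only a stand-in folder so the code runs on any machine; rda_path is a hypothetical variable name, and you would swap in your own folder such as "D:/r-temp".

```r
# Build the vectors used in the post
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference

# Assumption: tempdir() stands in for your real project folder
rda_path <- file.path(tempdir(), "mydata.rda")

# Save to the full path; the working directory is left untouched
save(count, age, circumference, file = rda_path)
file.exists(rda_path)  # TRUE
```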
Now let's use save() to write our three vectors to an .rda file. The file argument gives the name of the new .rda file.
# save to an .rda file
save(count, age, circumference, file = "mydata.rda")
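save() can also take the object names as a character vector through its list argument, which is handy when the names are computed in code. This is a sketch of that alternative; vars_to_save and rda_path are hypothetical names, and tempdir() is again just a stand-in location.

```r
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference

rda_path <- file.path(tempdir(), "mydata.rda")

# Same result as save(count, age, circumference, file = ...),
# but the objects are named in a character vector via `list`
vars_to_save <- c("count", "age", "circumference")
save(list = vars_to_save, file = rda_path)
```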
Next, we remove the variables from our R environment with rm().
# remove variables
rm(age, circumference, count)
Now our R workspace no longer contains those variables.
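Instead of looking at the RStudio pane, you can confirm the removal from code with exists(). Setting inherits = FALSE limits the check to the current environment, so an object with the same name in an attached package would not give a false positive. The save/rm steps are repeated here only to make the sketch self-contained, with tempdir() as a stand-in folder.

```r
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference
rda_path <- file.path(tempdir(), "mydata.rda")
save(count, age, circumference, file = rda_path)

rm(age, circumference, count)

# Each check looks only in the current environment
exists("count", inherits = FALSE)          # FALSE
exists("age", inherits = FALSE)            # FALSE
exists("circumference", inherits = FALSE)  # FALSE
```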
Now we can check that our .rda file (mydata.rda) really does store our data by loading it with load().
Note: If we had not set our working directory, we would need to give load() the full path to the .rda file, for example "C:/Users/Documents/R files/mydata.rda" rather than just "mydata.rda".
# load the .rda file
load(file = "mydata.rda")
Great, now we can see that our variables are back in the R environment for use once more.
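load() also invisibly returns the names of the objects it restored, and its envir argument lets you load them into a separate environment instead of the workspace. This sketch uses that to avoid overwriting existing objects with the same names; restored and loaded_names are hypothetical names, and tempdir() is a stand-in folder.

```r
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference
rda_path <- file.path(tempdir(), "mydata.rda")
save(count, age, circumference, file = rda_path)

# Load into a fresh environment rather than the global workspace
restored <- new.env()
loaded_names <- load(rda_path, envir = restored)

loaded_names        # the names of the restored objects
restored$age[1:3]   # 118 484 664
```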
Saving and loading data this way can be very useful when you're working with large datasets that you want to clear from memory but keep for later. It can also help in long, complex R workflows and scripts, where you can save intermediate results and reload them instead of recomputing them. You can control the file's compression with the compress and compression_level arguments of save().
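Here is a sketch comparing an uncompressed file with an xz-compressed one using those two arguments. compress accepts TRUE/FALSE or a method name ("gzip", "bzip2", "xz"); the file names are hypothetical and tempdir() is a stand-in folder. The actual sizes depend on your data, so none are claimed here.

```r
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference

plain_path <- file.path(tempdir(), "mydata_plain.rda")
xz_path <- file.path(tempdir(), "mydata_xz.rda")

# No compression at all
save(count, age, circumference, file = plain_path, compress = FALSE)

# xz compression at a high level (slower to write, usually smaller)
save(count, age, circumference, file = xz_path,
     compress = "xz", compression_level = 9)

# Compare sizes on disk, in bytes
file.size(plain_path)
file.size(xz_path)
```

Both files load the same way with load(); the compression only affects size on disk and read/write speed.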
That’s all for now!
You might prefer to use `saveRDS` and `readRDS`. They come with similar compression options, but work more like standard R, without side effects. `load` is *weird* in that the object(s) pop up in your workspace without assignment, with the same names that they had before. `readRDS` works like other data import functions in that you assign the result. This is nice because you might want a new name, `yesterday_data <- readRDS(…)` (and you probably don't want to overwrite any objects that happen to have the same names as the objects in your `rda` file). The only weakness of `saveRDS` compared to `save` is that it only works on one object.
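A minimal sketch of that workflow: saveRDS() writes one object without storing its name, and readRDS() returns it so you pick the name on assignment. orange_df and the file name are hypothetical, and tempdir() is a stand-in folder.

```r
# A plain data frame built from the Orange columns
orange_df <- data.frame(tree = Orange$Tree,
                        age = Orange$age,
                        circumference = Orange$circumference)

rds_path <- file.path(tempdir(), "orange.rds")

# Write a single object; no name is stored with it
saveRDS(orange_df, file = rds_path)

# Read it back under whatever name we like
yesterday_data <- readRDS(rds_path)
identical(yesterday_data, orange_df)  # TRUE
```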
I second Gregor Thomas' views. RDS is a genuinely native R solution for saving and reading data quickly while using little space. The weakness he describes I consider a strength, namely that it only works on one object.
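If you do need several objects in one .rds file, one common workaround is to wrap them in a named list. This is a sketch under that assumption; orange_parts and the file name are hypothetical, and tempdir() is a stand-in folder.

```r
count <- Orange$Tree
age <- Orange$age
circumference <- Orange$circumference
rds_path <- file.path(tempdir(), "orange_vectors.rds")

# saveRDS() takes one object, so bundle the vectors into a named list
saveRDS(list(count = count, age = age, circumference = circumference),
        file = rds_path)

# Read the list back under any name and pull out the pieces explicitly
orange_parts <- readRDS(rds_path)
names(orange_parts)  # "count" "age" "circumference"
```

Nothing lands in the workspace until you assign it, e.g. `age2 <- orange_parts$age`.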
A little-known feature: you can attach a comment to a data.frame with comment(), for example
comment(DF) <- "mmddyy\nfreed of all missing values"
After reading the data back in, comment(DF) will return the note.
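A sketch of that round trip with saveRDS() and readRDS(): the note is stored as the "comment" attribute, which is saved along with the data but not shown when the data frame is printed. DF is built here from two Orange columns purely for illustration, and tempdir() is a stand-in folder.

```r
DF <- data.frame(age = Orange$age, circumference = Orange$circumference)

# Attach a free-text note as the "comment" attribute
comment(DF) <- "mmddyy\nfreed of all missing values"

rds_path <- file.path(tempdir(), "DF.rds")
saveRDS(DF, file = rds_path)

# The comment travels with the object
DF_back <- readRDS(rds_path)
comment(DF_back)
```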
If you also use Python, or if speed is critical, you might have a look at the {feather} package by Hadley Wickham.