Importing Data into R, part II

I recently downloaded the latest version of RStudio and noticed that its import dataset functionality had changed significantly. I had previously written about importing data HERE and wanted to provide an update for the current version of RStudio.

When you go to import data using RStudio, you get a menu like this.

rstudio-old-import

If you’re using the latest version of RStudio, when you click “From CSV” you’ll get a popup about installing a new package, ‘readr’.

readr

Once that has completed, you’ll see the new import data window (shown below).

new-import-screen

Okay, so first let’s make a simple comma delimited data file so we can test out the new import dataset process. I have made a simple file called “x-y-data.txt” as shown below. If you make this same file (no spaces, just a comma to separate the x column from the y column) then we can do this exercise together.

x-y-data
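If you would rather create the file from the R console than from a text editor, a minimal sketch is below. It assumes the file should land in the current working directory; the five rows match the values used in the rest of this post.

```r
# Build the comma-delimited example file: a header row plus five data rows.
lines <- c("x,y", "1,2", "2,4", "3,6", "4,8", "5,10")

# Write it to the working directory under the name used in this post.
writeLines(lines, "x-y-data.txt")

# Read the raw text back to confirm what was written.
readLines("x-y-data.txt")
```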

Now, let’s use the RStudio import to bring in the file “x-y-data.txt”. Here’s a screen grab of the import screen with my x-y dataset.

import-data

We can see that RStudio has used the first row as names, has recognized that it is a comma delimited file, and has read both x and y values as integers. Everything looks good, so I click “import”.
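The same import can be done from the console without the dialog. This is a rough sketch, assuming the ‘readr’ package is installed; the code RStudio generates for you may differ in its details, and the explicit col_types argument is my own addition to pin both columns to integers. The file is rewritten first so the snippet runs on its own.

```r
library(readr)

# Recreate the example file so this snippet is self-contained.
writeLines(c("x,y", "1,2", "2,4", "3,6", "4,8", "5,10"), "x-y-data.txt")

# Read the file: first row as column names, comma as the delimiter,
# and both columns explicitly parsed as integers.
x_y_data <- read_csv(
  "x-y-data.txt",
  col_types = cols(x = col_integer(), y = col_integer())
)

# Print the imported object to check the column names and types.
x_y_data
```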

It was after this import that I tried running some of my standard functions, such as making an empirical CDF (cumulative distribution function), and ran into problems. So let’s check the type of data we have imported.

# get the data structure
typeof(x_y_data)
#[1] "list"
class(x_y_data)
#[1] "tbl_df"     "tbl"        "data.frame"

My older code expected a matrix, but this import produces a data frame. More precisely, it is a “tbl_df”, or tibble, a variant of the data frame defined in the ‘tibble’ package. This comes from the ‘readr’ package rather than from RStudio itself: the new import dialog calls ‘readr’, and anything you read with ‘readr’ is returned as a “tbl_df”.
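Because a “tbl_df” still inherits from “data.frame”, base R’s as.data.frame can strip the extra classes if a plain data frame is all you need. A small sketch, assuming the ‘tibble’ package is installed; the object is built by hand here as a stand-in for the imported data.

```r
library(tibble)

# Stand-in for the imported object (built by hand, not read from disk).
x_y_data <- tibble(x = 1:5, y = c(2L, 4L, 6L, 8L, 10L))

# A tibble still counts as a data frame for inheritance checks.
is.data.frame(x_y_data)
inherits(x_y_data, "tbl_df")

# Drop the tibble classes to get a plain base R data frame.
plain <- as.data.frame(x_y_data)
class(plain)
```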

Now this isn’t necessarily a bad thing; in fact, it seems like there is some nice functionality gained by using the “tbl_df” format. This change just broke some of my previously written code, and it’s good to know what RStudio is doing by default.
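One difference that commonly trips up older code is single-bracket column selection. The sketch below is only an illustration of that behavior, not necessarily the exact thing that broke for me; it assumes the ‘tibble’ package is installed and uses hand-built data.

```r
library(tibble)

df <- data.frame(x = 1:5, y = c(2L, 4L, 6L, 8L, 10L))  # plain data frame
tb <- as_tibble(df)                                     # same data as a tibble

# Single-bracket selection of one column drops to a vector for a data frame...
is.numeric(df[, "y"])

# ...but stays a one-column tibble for a tbl_df, so it is not numeric.
is.numeric(tb[, "y"])

# [[ ]] (or $) returns the underlying vector in both cases, which makes it
# a safer input for functions such as ecdf().
Fn <- ecdf(tb[["y"]])
Fn(4)  # proportion of y values less than or equal to 4
```

Writing column access as tb[["y"]] or tb$y works the same way for plain data frames and tibbles, so code written that way does not depend on which importer produced the data.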

If we want to get back to the matrix format, we can do this with a simple call to the as.matrix function. From there we can verify that it was converted using the typeof and class functions.

# convert to a matrix and print it
data <- as.matrix(x_y_data)
data
#     x  y
#[1,] 1  2
#[2,] 2  4
#[3,] 3  6
#[4,] 4  8
#[5,] 5 10

typeof(data)
#[1] "integer"
class(data)
#[1] "matrix"
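Two caveats are worth checking when you run the conversion yourself. In R 4.0.0 and later, class() on a matrix returns both “matrix” and “array”, so is.matrix is a more stable test than comparing class names. Also, a matrix holds a single type, so a data frame with any text column converts to a character matrix. A base R sketch using hand-built data (the “label” column is made up for illustration):

```r
# All-integer data frame, standing in for the imported x-y data.
df <- data.frame(x = 1:5, y = c(2L, 4L, 6L, 8L, 10L))
m <- as.matrix(df)

is.matrix(m)  # version-independent check that the conversion worked
typeof(m)     # storage type of the matrix elements
dim(m)        # rows and columns

# A hypothetical data frame with one text column: the whole matrix
# is coerced to character, including the numbers.
mixed <- data.frame(x = 1:2, label = c("a", "b"))
typeof(as.matrix(mixed))
```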

You can read more about the new Tibble structure at these websites:

https://blog.rstudio.org/2016/03/24/tibble-1-0-0/

http://www.sthda.com/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data

Enjoy!

3 thoughts on “Importing Data into R, part II”

  1. You describe RStudio as doing things that are in fact done by packages. It is important to know the difference and can be confusing to people. It isn’t RStudio that read the CSV file as tbl-df rather than data frame, it was the package readr.
