cbind part 2

In an earlier post we discussed creating data with the functions seq and rep, and then combining the results with cbind. **You can see that post here**. In this post we’re going to go into more depth on the limitations of cbind.

If we create some data, say temperature (temp) and precipitation (precip), we can combine them using cbind:

temp <- c(52.5, 53.7, 55.7, 57.2, 55.9, 57.3, 60.3)
precip <- c(0.11, 0.22, 0.05, 0.0, 0.0, 0.0, 0.0)
data <- cbind(temp, precip)
data
##      temp precip
## [1,] 52.5   0.11
## [2,] 53.7   0.22
## [3,] 55.7   0.05
## [4,] 57.2   0.00
## [5,] 55.9   0.00
## [6,] 57.3   0.00
## [7,] 60.3   0.00

This worked just fine. The temperature and precipitation data are now stored together in a matrix. We can confirm that it’s a matrix by querying the object with the functions typeof and class. “double” means the data are numeric values stored in double precision, and “matrix” means the object’s class is matrix. (In R 4.0.0 and later, class reports both “matrix” and “array” for a matrix.)

typeof(data)
## [1] "double"
class(data)
## [1] "matrix"
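Because every value in this matrix is a double, arithmetic and summary functions work on the whole object. Below is a minimal sketch that rebuilds the same vectors so it runs on its own; the name doubled is made up for this example, and the column names come from the vectors passed to cbind.

```r
# Rebuild the temperature and precipitation matrix from above
temp <- c(52.5, 53.7, 55.7, 57.2, 55.9, 57.3, 60.3)
precip <- c(0.11, 0.22, 0.05, 0.0, 0.0, 0.0, 0.0)
data <- cbind(temp, precip)

# Quick structural checks
is.matrix(data)   # TRUE
is.numeric(data)  # TRUE
dim(data)         # 7 2 (rows, columns)

# Arithmetic is applied element by element to the whole matrix
doubled <- data * 2

# Columns can be selected by name and summarised
colMeans(data)
max(data[, "temp"])
```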

cbind works great when you are working with data of the same type. It does not work well when the data types are different. For instance, if vector “a” is numeric and vector “b” is text, cbind gives unexpected results. See for yourself:

a <- c(1, 2, 3, 4, 5)
b <- c("a", "b", "c", "d", "e")
class(a)
## [1] "numeric"
class(b)
## [1] "character"
data2 <- cbind(a, b)
data2
##      a   b  
## [1,] "1" "a"
## [2,] "2" "b"
## [3,] "3" "c"
## [4,] "4" "d"
## [5,] "5" "e"
typeof(data2)
## [1] "character"
class(data2)
## [1] "matrix"

In the results shown above we can see that when we combine numeric data with text data using cbind, all of the data are converted to text (character), because a matrix can hold values of only one type. This is not what we want! If we try to do arithmetic on the data, it fails. Let’s try multiplying the matrix by 2:

2 * data2
## Error in 2 * data2 : non-numeric argument to binary operator
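The conversion follows a fixed rule: according to the cbind documentation, the type of a matrix result is the highest type among the inputs in the hierarchy raw < logical < integer < double < complex < character < list. The sketch below checks a few of those pairings and then shows one way to recover the numbers from data2 with as.numeric. The names m1, m2, m3 and a_again are made up for this example.

```r
# A matrix holds a single type, so cbind coerces every input
# to the highest type among them
m1 <- cbind(c(TRUE, FALSE), c(1L, 2L))  # logical + integer
m2 <- cbind(c(1L, 2L), c(1.5, 2.5))     # integer + double
m3 <- cbind(c(1.5, 2.5), c("x", "y"))   # double + character

typeof(m1)  # "integer"
typeof(m2)  # "double"
typeof(m3)  # "character"

# Recovering the numbers from the character matrix built earlier
a <- c(1, 2, 3, 4, 5)
b <- c("a", "b", "c", "d", "e")
data2 <- cbind(a, b)
a_again <- as.numeric(data2[, "a"])  # "1", "2", ... converted back to 1, 2, ...
2 * a_again
```

Converting back works here only because every entry in column “a” is a valid number written as text; as.numeric turns anything else into NA with a warning.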

This is a feature of cbind to be aware of. So what’s the workaround, you ask? Data frames. Using the same data as above, we’ll store vectors “a” and “b” together in a data frame and preserve their types. The function is easy to remember: it’s data.frame.

newdata <- data.frame(a, b)
newdata
##   a b
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
typeof(newdata)
## [1] "list"
class(newdata)
## [1] "data.frame"
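Two related points are sketched below. First, sapply with class reports the class of every column in one call. Second, if at least one argument to cbind is a data frame, R uses the data frame method of cbind and the result is a data frame rather than a matrix, so each column keeps its own type. The name combined is made up for this example.

```r
a <- c(1, 2, 3, 4, 5)
b <- c("a", "b", "c", "d", "e")
newdata <- data.frame(a, b)

# Class of each column in one call
sapply(newdata, class)

# Passing a data frame to cbind returns a data frame, not a matrix,
# so column "a" stays numeric
combined <- cbind(data.frame(a), b)
class(combined)    # "data.frame"
class(combined$a)  # "numeric"
2 * combined$a
```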

More importantly, though, if we query each column of the data frame, we’ll see that column “a” is numeric and column “b” holds text. Remember that we can reference rows and columns within a matrix or data frame using square brackets after the object’s name (data[row, column]). Leaving the row position blank, as in data[, 2], selects all of the rows in the second column.

typeof(newdata[, 1])
## [1] "double"
class(newdata[, 1])
## [1] "numeric"
typeof(newdata[, 2])
## [1] "integer"
class(newdata[, 2])
## [1] "factor"

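We can see from the results that column 1 (“a”) is numeric and column 2 (“b”) is a factor, R’s type for categorical values, created here from our character vector. A factor is stored internally as integer codes, which is why typeof reports “integer”. We’ll get more into factors another time, but it’s safe to say that our data are intact. (If you run this in R 4.0.0 or later, column “b” will be character rather than factor, because data.frame no longer converts strings to factors by default; pass stringsAsFactors = TRUE to reproduce the output above.) We can make certain by operating on each column of our new data frame “newdata”.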

2 * newdata[, 1]
## [1]  2  4  6  8 10

We can operate on column 1 (“a”), which is what we would expect, since it holds numeric values.

2 * newdata[, 2]
## Warning in Ops.factor(2, newdata[, 2]): '*' not meaningful for factors
## [1] NA NA NA NA NA

We cannot operate on column 2 (“b”), which we would also expect, since it holds text values. For a factor, R returns NA values with a warning rather than stopping with an error.
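To compare the two possible versions of column “b” side by side, the sketch below sets the stringsAsFactors argument of data.frame explicitly, so the result does not depend on the R version’s default. It also shows $ and [[ ]] as alternatives to data[, 2] for pulling out a single column. The names as_factor and as_text are made up for this example.

```r
a <- c(1, 2, 3, 4, 5)
b <- c("a", "b", "c", "d", "e")

# Set stringsAsFactors explicitly so the result is the same on any R version
as_factor <- data.frame(a, b, stringsAsFactors = TRUE)
as_text   <- data.frame(a, b, stringsAsFactors = FALSE)

class(as_factor[, 2])  # "factor"
class(as_text[, 2])    # "character"

# Three equivalent ways to select column 2 ("b")
identical(as_text[, 2], as_text$b)    # TRUE
identical(as_text$b, as_text[["b"]])  # TRUE

# A factor can be turned back into plain text when needed
as.character(as_factor$b)

# Arithmetic on a character column stops with an error;
# tryCatch returns the error message instead of halting the script
tryCatch(2 * as_text$b, error = function(e) conditionMessage(e))
```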

