Naming columns and rows

It is often convenient to name the columns and rows within a dataset to keep things clearly organized. This is very easy to do in R; here’s how. First, let’s make some data.

a <- c(62.3, 55.3, 65.3, 59.3, 67.3)
b <- c(2.2, 5.4, 1.3, 2.8, 5.4)
c <- c(0.1, 1.5, 1.6, 2.1, 0.3)
data <- cbind(a, b, c)
data
##         a   b   c
## [1,] 62.3 2.2 0.1
## [2,] 55.3 5.4 1.5
## [3,] 65.3 1.3 1.6
## [4,] 59.3 2.8 2.1
## [5,] 67.3 5.4 0.3

We can see that the dataset “data” already has the column names “a”, “b”, and “c”. The function cbind stored them for us, taking them from the names of the vectors we combined. ***For more information on cbind and creating data, check out my other post here***.

Now let’s update the column names to something meaningful using the function colnames. For this function we have to specify the data we are working on and then supply the names of the columns. The general form is:
colnames(data name here) <- c("column name 1", "column name 2", "etc")

colnames(data) <- c("temp_F", "wind_m/s", "precip_in")
data
##      temp_F wind_m/s precip_in
## [1,]   62.3      2.2       0.1
## [2,]   55.3      5.4       1.5
## [3,]   65.3      1.3       1.6
## [4,]   59.3      2.8       2.1
## [5,]   67.3      5.4       0.3
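Once the columns have names, we can use those names to look things up. Here is a minimal sketch that rebuilds the same matrix so it runs on its own. The object name `weather` and the replacement name `wind_ms` are only illustrations, not something from the example above.

```r
# Rebuild the example matrix so this block runs on its own
weather <- cbind(c(62.3, 55.3, 65.3, 59.3, 67.3),
                 c(2.2, 5.4, 1.3, 2.8, 5.4),
                 c(0.1, 1.5, 1.6, 2.1, 0.3))
colnames(weather) <- c("temp_F", "wind_m/s", "precip_in")

# Calling colnames without an assignment returns the current names
colnames(weather)

# A column can be pulled out by name instead of by position
weather[, "temp_F"]

# A single name can be changed by indexing into colnames
colnames(weather)[2] <- "wind_ms"
colnames(weather)
```

Renaming just one column this way saves retyping the whole vector of names.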

Now let’s say that we want to update the row names to be meaningful as well. The function is, you guessed it, rownames, and it works the same way as colnames. The general form is:
rownames(data name here) <- c("row name 1", "row name 2", "etc")

rownames(data) <- c("Site 1", "Site 2", "Site 3", "Site 4", "Site 5")
data
##        temp_F wind_m/s precip_in
## Site 1   62.3      2.2       0.1
## Site 2   55.3      5.4       1.5
## Site 3   65.3      1.3       1.6
## Site 4   59.3      2.8       2.1
## Site 5   67.3      5.4       0.3
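With both row and column names in place, we can reference a value by name rather than by position. The sketch below builds a smaller two-column version of the matrix under the made-up name `sites`, and uses paste as a shortcut for typing out the site names.

```r
# A two-column version of the example; matrix() fills column by column
sites <- matrix(c(62.3, 55.3, 65.3, 59.3, 67.3,
                  2.2, 5.4, 1.3, 2.8, 5.4),
                ncol = 2)
colnames(sites) <- c("temp_F", "wind_m/s")

# paste builds "Site 1" through "Site 5" without typing each one
rownames(sites) <- paste("Site", 1:5)

# Look up one value by row name and column name
sites["Site 3", "temp_F"]

# Leave the column blank to get the whole row
sites["Site 3", ]

# dimnames returns the row and column names together as a list
dimnames(sites)
```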

Let’s try to break rownames quickly by supplying too many row names (6 instead of 5). R will get angry and let us know with a semi-cryptic error message, which is saying that the number of names does not match the number of rows.

rownames(data) <- c("Site 1", "Site 2", "Site 3", "Site 4", "Site 5", "Site 6")
## Error in `rownames<-`(`*tmp*`, value = c("Site 1", "Site 2", "Site 3",  : 
## length of 'dimnames' [1] not equal to array extent
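One way to avoid this error is to compare the number of names against the number of rows before assigning. This is only a sketch; the objects `m` and `site_names` are placeholders made up for the example.

```r
# A small 5-row matrix and a vector with one name too many
m <- matrix(1:10, nrow = 5)
site_names <- c("Site 1", "Site 2", "Site 3", "Site 4", "Site 5", "Site 6")

# Only assign the names when the counts agree
if (length(site_names) == nrow(m)) {
  rownames(m) <- site_names
} else {
  message("Have ", length(site_names), " names for ", nrow(m),
          " rows; names not assigned")
}
```

Here the counts differ, so the matrix is left without row names and we get a readable message instead of an error.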

A data frame picks up its column names automatically as well. Remember from last time that we can make a data frame using the function data.frame. Here’s an example:

numbers <- c(1, 2, 3, 4, 5)
letters <- c("a", "b", "c", "d", "e")
symbols <- c("!", "@", "#", "$", "%")
data2 <- data.frame(numbers, letters, symbols)
data2
##   numbers letters symbols
## 1       1       a       !
## 2       2       b       @
## 3       3       c       #
## 4       4       d       $
## 5       5       e       %

We can see that our new data frame already has the column names “numbers”, “letters”, and “symbols” from the way we built our dataset.
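The same naming functions work on data frames too. As a small sketch (the object `df` and its column names are invented for illustration), we can set names when building the data frame, read them back, and change them afterwards:

```r
# Column names can be given directly as argument names
df <- data.frame(id = c(1, 2, 3), label = c("a", "b", "c"))

# For a data frame, names and colnames return the same thing
names(df)
colnames(df)

# Rename the columns, then pull one out with the $ shortcut
colnames(df) <- c("number", "letter")
df$number

# Row names can be set the same way as on a matrix
rownames(df) <- c("first", "second", "third")
df
```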

cbind part 2

In an earlier post we discussed creating data using the functions seq, rep, and then merging them together with cbind. **You can see that post here**. In this post we’re going to go more in depth on the limitations of cbind.

If we make some data for, say, temp and precip, we can combine them using cbind:

temp <- c(52.5, 53.7, 55.7, 57.2, 55.9, 57.3, 60.3)
precip <- c(0.11, 0.22, 0.05, 0.0, 0.0, 0.0, 0.0)
data <- cbind(temp, precip)
data
##      temp precip
## [1,] 52.5   0.11
## [2,] 53.7   0.22
## [3,] 55.7   0.05
## [4,] 57.2   0.00
## [5,] 55.9   0.00
## [6,] 57.3   0.00
## [7,] 60.3   0.00

This worked just fine. The temperature and precipitation data are now stored together in a matrix. We can confirm that by querying the object with the functions typeof and class. The “double” means the data is numeric with double precision, and the “matrix” means that the data class is a matrix. (In R 4.0.0 and later, class returns both “matrix” and “array” for a matrix.)

typeof(data)
class(data)
## [1] "double"
## [1] "matrix"
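If all we want is a yes or no answer, the is.* functions are a handy alternative to reading the output of class. A short sketch, using a made-up three-row matrix `m`:

```r
# A shortened version of the temp and precip data
temp <- c(52.5, 53.7, 55.7)
precip <- c(0.11, 0.22, 0.05)
m <- cbind(temp, precip)

# Each of these returns TRUE or FALSE
is.matrix(m)
is.numeric(m)

# dim gives the number of rows, then the number of columns
dim(m)
```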

cbind works great when you are working with the same data types. It does not work well when the data types are different. For instance, if data “a” is numeric and data “b” is text, cbind has unexpected results. See for yourself:

a <- c(1, 2, 3, 4, 5)
b <- c("a", "b", "c", "d", "e")
class(a)
class(b)
## [1] "numeric"
## [1] "character"
data2 <- cbind(a, b)
data2
##      a   b  
## [1,] "1" "a"
## [2,] "2" "b"
## [3,] "3" "c"
## [4,] "4" "d"
## [5,] "5" "e"
typeof(data2)
class(data2)
## [1] "character"
## [1] "matrix"

In the results shown above we can see that when we put numeric data together with text data using cbind, the function turns all of the data into text (character). This is not what we want! If we try to operate on the data, it doesn’t work. We’ll try multiplying the matrix by 2.

2*data2
## Error in 2 * data2 : non-numeric argument to binary operator
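If you are already stuck with a character matrix like this, the numbers can still be recovered by converting the column back with as.numeric. A minimal sketch, where the names `mixed` and `nums` are made up for the example:

```r
a <- c(1, 2, 3, 4, 5)
b <- c("a", "b", "c", "d", "e")

# cbind turns everything into text, as shown above
mixed <- cbind(a, b)

# Pull out column "a" and convert the text back to numbers
nums <- as.numeric(mixed[, "a"])

# Now arithmetic works again
2 * nums
```

This only rescues one column at a time, so it is a patch and not a real fix.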

This is a feature of cbind to be aware of. So what’s the workaround, you ask? Data frames. Using the same data as above, we’ll store vectors “a” and “b” together in a data frame and preserve their integrity. The function is easy to remember: it’s data.frame.

newdata <- data.frame(a, b)
newdata
##   a b
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
typeof(newdata)
class(newdata)
## [1] "list"
## [1] "data.frame"
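To check every column at once, rather than one at a time, sapply and str are useful. This is a sketch using a made-up data frame `df` built from the same vectors. The class reported for the text column depends on your R version, so no output is shown here.

```r
a <- c(1, 2, 3, 4, 5)
b <- c("a", "b", "c", "d", "e")
df <- data.frame(a, b)

# Apply class to each column and collect the results
sapply(df, class)

# str prints a compact summary of every column
str(df)
```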

More importantly though, if we query the data within the data frame, we’ll see that column “a” is numeric and column “b” is text. Remember we can reference columns and rows within a matrix and data frame using square brackets after the name (data[row, column]). Using data[,2] references all of the rows of data within the second column.

typeof(newdata[,1])
class(newdata[,1])
## [1] "double"
## [1] "numeric"
typeof(newdata[,2])
class(newdata[,2])
## [1] "integer"
## [1] "factor"

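We can see from the results that column 1 (“a”) is numeric and column 2 (“b”) is a factor, which is how data.frame stored text by default in versions of R before 4.0.0. In R 4.0.0 and later the default is to keep text as character, so both calls on column 2 return “character”, and the multiplication test on that column below stops with an error instead of a warning. We’ll get more into factors at another time, but it’s safe to say that our data is intact. We can test this to be certain by operating on each column in our new data frame “newdata”.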

2*newdata[,1]
## [1]  2  4  6  8 10

We can operate on column 1 (“a”), which we would expect since its values are numeric.

2*newdata[,2]
## Warning in Ops.factor(2, newdata[, 2]): '*' not meaningful for factors
## [1] NA NA NA NA NA

We cannot operate on column 2 (“b”), which we also would expect, since its values are text.
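Whether a text column ends up as a factor or as plain character can be controlled with the stringsAsFactors argument of data.frame. The sketch below sets it both ways; the names `as_text` and `as_factor` are made up for the example.

```r
a <- c(1, 2, 3, 4, 5)
b <- c("a", "b", "c", "d", "e")

# Keep the text column as plain character strings
as_text <- data.frame(a, b, stringsAsFactors = FALSE)
class(as_text$b)

# Store the text column as a factor
as_factor <- data.frame(a, b, stringsAsFactors = TRUE)
class(as_factor$b)

# A factor can be turned back into plain text when needed
as.character(as_factor$b)
```

Setting the argument explicitly means the result does not depend on which version of R is running the code.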