Load, Save, and .rda files

A couple of weeks ago I stumbled across a feature in R that I had never heard of before: the functions save() and load(), and the R file type .rda.

The .rda files allow a user to save their R data structures such as vectors, matrices, and data frames. The file is automatically compressed, with user options for additional compression. Let’s take a look.

First, we will grab one of the built-in R datasets. We can view these by calling data(). Let’s use the “Orange” dataset.

# get the Orange data
Orange
   Tree  age circumference
1     1  118            30
2     1  484            58
3     1  664            87
4     1 1004           115
5     1 1231           120
6     1 1372           142
7     1 1582           145
8     2  118            33
9     2  484            69
10    2  664           111
11    2 1004           156
12    2 1231           172
13    2 1372           203
14    2 1582           203
15    3  118            30
16    3  484            51
17    3  664            75
18    3 1004           108
19    3 1231           115
20    3 1372           139
21    3 1582           140
22    4  118            32
23    4  484            62
24    4  664           112
25    4 1004           167
26    4 1231           179
27    4 1372           209
28    4 1582           214
29    5  118            30
30    5  484            49
31    5  664            81
32    5 1004           125
33    5 1231           142
34    5 1372           174
35    5 1582           177

Next, let’s save each column individually as vectors.

# save the Orange data as vectors
count<-Orange$Tree
age<-Orange$age
circumference<-Orange$circumference

Now if we look at our variables in the RStudio environment, we can see count, age, and circumference saved there.

Next, let’s set our R working directory, so the .rda file will save in the correct location. First we’ll use getwd() to find our current working directory, then we’ll adjust it (if needed) using setwd(). I set my working directory to a folder on the D drive.

#get and set working directory
getwd()
[1] "D:/Users"
setwd("D:/r-temp")
getwd()
[1] "D:/r-temp"

Finally, let’s use the save() command to save our 3 vectors to an .rda file. The “file” name will be the name of the new .rda file.

#save to rda file
save(count, age, circumference, file = "mydata.rda")

Next we will remove our R environment variables using the command rm().

#remove variables
rm(age, circumference, count)

Now we can see that we no longer have saved variables in our R workspace.

Now, we can check that our .rda file (mydata.rda) does in fact store our data by using the load() command.
Note: If we had not properly set our working directory, then we would have needed to provide a full path to the .rda file. For example, “C:/Users/Documents/R files/mydata.rda” rather than just “mydata.rda”.

#load the rda file
load(file = "mydata.rda")

Great, now we can see that our variables are back in the R environment for use once more.


Saving and loading data in R might be very useful when you’re working with large datasets that you want to clear from your memory, but you also would like to save for later. It also might be useful for long, complex R workflows and scripts. You can control the compression of the file using the settings ‘compress’ and ‘compression_level’.
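
As a rough sketch of those settings, the snippet below saves the same object twice, once with the default compression and once with xz compression, and then prints the two file sizes. The object name big and the temporary file names are made up for illustration, and how much space you save will depend on your data.

```r
# hypothetical example data: a long, repetitive numeric vector
big <- rep(seq(1, 100, by = 1), times = 1000)

# temporary file names (assumed for this sketch, not part of the post)
f_default <- tempfile(fileext = ".rda")
f_xz      <- tempfile(fileext = ".rda")

# default settings (gzip compression)
save(big, file = f_default)

# stronger, slower compression using the two settings mentioned above
save(big, file = f_xz, compress = "xz", compression_level = 9)

# compare the file sizes in bytes
file.size(f_default)
file.size(f_xz)

# load() returns the names of the objects it restored
original <- big
rm(big)
restored <- load(f_xz)
restored
identical(big, original)
```

Because load() returns the object names, it is easy to check what a file contained without guessing.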

That’s all for now!

Importing Data into R, part II

I recently downloaded the latest version of RStudio and noticed that its import dataset functionality had changed significantly. I had previously written about importing data in “Importing Data into R” and wanted to provide an update for the current version of RStudio.

When you go to import data using RStudio, you get a menu like this.

[Screenshot: rstudio-old-import]

If you’re using the latest version of RStudio, when you click “From CSV” you’ll get a popup about downloading a new package, ‘readr’.

[Screenshot: readr]

Once that has completed, you’ll see the new import data window (shown below).

[Screenshot: new-import-screen]

Okay, so first let’s make a simple comma delimited data file so we can test out the new import dataset process. I have made a simple file called “x-y-data.txt” as shown below. If you make this same file (no spaces, just a comma to separate the x column from the y column) then we can do this exercise together.

x,y
1,2
2,4
3,6
4,8
5,10

Now, let’s use the RStudio import to bring in the file “x-y-data.txt”. Here’s a screen grab of the import screen with my x-y dataset.

[Screenshot: import-data]

We can see that RStudio has used the first row as names, has recognized that it is a comma delimited file, and has read both x and y values as integers. Everything looks good, so I click “import”.

After this import, I tried running some of my standard functions, such as making an empirical CDF (cumulative distribution function), and ran into problems. So let’s check the type of data we have imported.

# get the data structure
typeof(x_y_data)
#[1] "list"
class(x_y_data)
#[1] "tbl_df"     "tbl"        "data.frame"

The old RStudio import used base R’s read functions, which return a plain data frame. This latest version of RStudio uses the ‘readr’ package instead, which returns a special version of a data frame called a “tbl_df” or tibble data frame. When you use the ‘readr’ package, your data is imported automatically as a “tbl_df”.

Now this isn’t necessarily a bad thing, in fact it seems like there is some nice functionality gained by using the “tbl_df” format. This change just broke some of my previously written code and it’s good to know what RStudio is doing by default.

If we want the data in matrix format, we can do this with a simple as.matrix function. From there we can verify it was converted using the typeof and class functions.

# convert to a matrix
data<-as.matrix(x_y_data)
data
#     x  y
#[1,] 1  2
#[2,] 2  4
#[3,] 3  6
#[4,] 4  8
#[5,] 5 10

typeof(data)
#[1] "integer"
class(data)
#[1] "matrix"
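
Here is a small, self-contained sketch of the same conversion. It builds a plain data frame by hand as a stand-in for the imported x_y_data, so ‘readr’ is not needed to run it, and the ecdf call is only my assumption about the kind of standard function that might need a plain vector or matrix.

```r
# stand-in for the imported data (a plain data frame, not a tibble)
x_y_data <- data.frame(x = 1:5, y = c(2L, 4L, 6L, 8L, 10L))

# convert to a matrix; the column names are kept
m <- as.matrix(x_y_data)
is.matrix(m)
dim(m)

# many functions want a plain vector, so pull out one column by name
x_values <- x_y_data$x
is.vector(x_values)

# empirical CDF of the x column
Fx <- ecdf(x_values)
Fx(3)   # proportion of x values less than or equal to 3
```

Using is.matrix() rather than comparing class() to "matrix" is a deliberate choice here, since newer versions of R report more than one class for a matrix.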

You can read more about the new Tibble structure at these websites:

https://blog.rstudio.org/2016/03/24/tibble-1-0-0/

http://www.sthda.com/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data

Enjoy!

Importing Data into R

One of the most important things we need to be able to do in R is import existing data, whether it be .txt files, .csv files, or even .xls files (Excel). If we can’t import data into R, then we can’t do anything. Okay, let’s get started.

The spirit of this blog is that whatever I do here, should also work for someone working from home. Thus, we all need to work from the same text file, so we have to build a simple text file together to make this whole process work. Since this isn’t R code, I’ll just break it down into simple steps that are easy to follow.

Step 1: open notepad
Step 2: enter data as I have shown below (no spaces, use only commas)
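month,rain_mm,flow_cms
1,128,15000
2,98,12000
3,92,11000
4,77,9800
5,68,7600
6,63,5800
7,76,5500
8,81,5700
9,84,6200
10,122,9500
11,117,15000
12,125,1700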
Step 3: save the file as ‘rain.txt’ on your Desktop

Okay, great now it’s time to get to work in R importing this data. We have two options, importing using the R Studio environment (the easy way), or importing using standard R functions.

The Easy way (Import through R Studio)

Step 1: Click the ‘Import Dataset’ button, then click ‘From Local File’

Step 2: Navigate to the ‘rain.txt’ file located on your Desktop and click ‘open’. The next dialog box we get shows the values contained within our file, and different importing options. A few things to notice, ‘Name’ at the top has been set to “rain”, which will become the variable our data is stored as in R. The ‘Heading’ radio button has already been moved to ‘yes’ because R Studio has recognized our column headers (month, rain_mm, flow_cms). Additionally, the ‘Separator’ has been adjusted to ‘comma’ as we have made a comma delimited text file. All you have to do is just click ‘Import’.


Step 3: R Studio automatically opens the ‘rain’ dataset as a table in a new tab. R Studio also provides the snippet of code it used to import the data, which is great! You can copy that code and paste it into your R script file for future use.

That’s it! You’re a pro at importing data using R Studio.

 The Hard way (Import using R functions)

There are lots of functions that can be used to import data into R: read.table, read.csv, read.csv2, read.delim, read.delim2 (among others). We’ll use read.table in this example.

To understand how this function works, let’s open up the R help by typing ?read.table.

# Get R help
?read.table

That should open up a help file through your web browser, or in the lower right ‘Help’ menu if you’re using R Studio. The main pieces of this function we need to set are ‘file’, ‘header’, and ‘sep’.
-The ‘file’ piece is the file name and file path we want to import.
-The ‘header’ piece is set to TRUE or FALSE based on whether or not there is a header within the file.
-The ‘sep’ piece describes the separator used within the file (in our case, a comma)

So, here’s how our code should look:

# Import the data
rain<-read.table("C:/Users/YOUR-NAME/Desktop/rain.txt", header = TRUE,
                   sep = ",")

Two things of note in the code above.
-The part of the path “YOUR-NAME” is based on your computer login settings. It might be something like ‘Tim’, ‘Jane’, ‘PeterC’, etc.
-I have defined this data as “rain” in R, using the rain <- bit of code. Here’s how the data looks in R.

# The rain data
#   month rain_mm flow_cms
#1      1     128    15000
#2      2      98    12000
#3      3      92    11000
#4      4      77     9800
#5      5      68     7600
#6      6      63     5800
#7      7      76     5500
#8      8      81     5700
#9      9      84     6200
#10    10     122     9500
#11    11     117    15000
#12    12     125     1700
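
If you would rather not depend on a file sitting on the Desktop, the import can be sketched in a fully self-contained way. The snippet below writes the first few rows of the rain data to a temporary file (the tempfile location is my assumption, not part of the original steps) and reads it back with read.table and with read.csv.

```r
# write a few rows of the rain data to a temporary file
# (stand-in for the rain.txt file on the Desktop)
rain_file <- tempfile(fileext = ".txt")
writeLines(c(
  "month,rain_mm,flow_cms",
  "1,128,15000",
  "2,98,12000",
  "3,92,11000"
), rain_file)

# read it back with read.table, naming the separator explicitly
rain <- read.table(rain_file, header = TRUE, sep = ",")

# read.csv uses header = TRUE and sep = "," as its defaults
rain_csv <- read.csv(rain_file)

# check what was imported
str(rain)
names(rain_csv)
```

For comma delimited files read.csv saves some typing, while read.table makes each choice visible, which is handy while learning.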

That’s it! You’re a pro at importing data into R via the hard way! Happy coding!

 

Multiplication (and R data types)

This is a basic post about multiplication operations in R. We’re considering element-wise multiplication versus matrix multiplication. First let’s make some data:

# Make some data
a = c(1,2,3)
b = c(2,4,6)
c = cbind(a,b)
x = c(2,2,2)

If we look at the output (c and x), we can see that c is a 3×2 matrix and x is a 1×3 matrix (which I will also call a vector).

# View our data
c
##      a b
## [1,] 1 2
## [2,] 2 4
## [3,] 3 6
x
## [1] 2 2 2

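In R the asterisk (*) is used for element-wise multiplication. This is where elements in matching positions are multiplied by one another. When a vector multiplies a matrix, as it does here, the vector is recycled down each column of the matrix.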

#These will give the same result
c*x
x*c

We can see that the output of c*x and x*c are the same, and the vector x doubles matrix c.

#View our element-wise multiplication output
##      a  b
## [1,] 2  4
## [2,] 4  8
## [3,] 6 12

##      a  b
## [1,] 2  4
## [2,] 4  8
## [3,] 6 12

In R percent signs combined with asterisks are used for matrix multiplication (%*%).

# This works (matrix multiplication)
x%*%c
##       a  b
## [1,] 12 24

If you dig back and remember your matrix multiplication, you’ll find that a 1×3 matrix times a 3×2 matrix gives a 1×2 matrix. It will have the same number of rows as the first matrix (x has 1 row) and the same number of columns as the second matrix (c has 2 columns). Now let’s try this with x and c reversed.

# This doesn't work. Incorrect dimensions.
c%*%x
## Error in c %*% x : non-conformable arguments

R gives us an error because you can’t multiply a 3×2 matrix by a 1×3 matrix. For the matrix multiplication to work, the number of columns in the first matrix (c has 2 columns) has to be equal to the number of rows in the second matrix (x has 1 row).
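
The dimension rule can be checked directly with dim(). This is a minimal sketch using the same values as above under the made-up names m and v (so that the built-in c() function is not masked); tryCatch is used only to capture the error message instead of stopping the script.

```r
# same values as above, under hypothetical names
m <- cbind(a = c(1, 2, 3), b = c(2, 4, 6))   # 3 x 2 matrix
v <- c(2, 2, 2)                              # plain vector of length 3

dim(m)      # rows, columns
dim(v)      # NULL: a plain vector has no dimensions

# vector first: treated as a 1 x 3 row vector, result is 1 x 2
row_result <- v %*% m
dim(row_result)

# to put the matrix first, transpose it so it is 2 x 3;
# v is then treated as a 3 x 1 column vector, result is 2 x 1
col_result <- t(m) %*% v
dim(col_result)

# capture the error from the non-conformable case
msg <- tryCatch(m %*% v, error = function(e) conditionMessage(e))
msg
```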

The previous operations were done using a matrix (c) and a plain numeric vector (x). We can confirm this using the commands class and typeof below:

# Get the data type
class(c)
typeof(c)
class(x)
typeof(x)

Here’s the output of those functions.

# The output
## [1] "matrix"
## [1] "double"
## [1] "numeric"
## [1] "double"

This shows us that our matrix c has the R class ‘matrix’, with a storage type of ‘double’, which means that it holds numbers (as opposed to something like ‘character’). This also shows us that our 1×3 matrix, or vector, has the R class ‘numeric’ and also has the storage type ‘double’.

Now, let’s say your data is in a data frame rather than a matrix. Let’s see what happens when we perform multiplication on data frames. Remember data frames in R can hold different types of data (numbers, letters, etc.), while matrices can only have one type of data.
***For more info about this, see my post titled “cbind part 2”.***
Let’s convert our matrices to data frames using the function data.frame.

c1 = data.frame(c)
x1 = data.frame(x)

Now let’s look at our data. Note that there is an extra column of numbers from 1 to 3 for both c1 and x1. This is just a feature of the data frame output in R, where it is counting the rows 1 through 3.

c1
##   a b
## 1 1 2
## 2 2 4
## 3 3 6

x1
##   x
## 1 2
## 2 2
## 3 2

And just to be thorough, let’s check the R data type, to make sure they are not matrices.

# Check the data type
class(c1)
typeof(c1)
class(x1)
typeof(x1)

Here’s the output of those data type checks. Notice that the class is now ‘data.frame’ instead of ‘matrix’ or ‘numeric’.

# The output
## [1] "data.frame"
## [1] "list"
## [1] "data.frame"
## [1] "list"

Now let’s try our simple element-wise multiplication again. You may have guessed it already, but these functions will no longer work.

# These both do not work
c1*x1
x1*c1

Here’s the output of the multiplication (i.e., the errors R provides).

## Error in Ops.data.frame(c1, x1) : 
##   ‘*’ only defined for equally-sized data frames

## Error in Ops.data.frame(c1, x1) : 
##   ‘*’ only defined for equally-sized data frames

According to the error R is providing, we can only multiply data frames of the same size. So, let’s try this out by making some new data.

# Make some data
h=c(2,2)
k=c(4,4)
j=cbind(h,k)
l=j*2

df1 = data.frame(j)
df2 = data.frame(l)

Now let’s look at the data to see what we have

# View the new data frames
df1
##   h k
## 1 2 4
## 2 2 4

df2
##   h k
## 1 4 8
## 2 4 8

Finally, let’s multiply df1*df2 and see what happens.

# Data frame multiplication
df1*df2
##   h  k
## 1 8 32
## 2 8 32

R has done element-wise multiplication on the data frames. This makes sense since we used only the (*) operator. If we try this again with the order of the data frames reversed, we will get the same answer.

# Reverse the order for multiplication
df2*df1
##   h  k
## 1 8 32
## 2 8 32
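
Going back to the failed c1*x1 example, here are two possible workarounds, offered as a sketch rather than the only way to do it. The data frames are rebuilt with the same values as before; by_vector and by_matrix are names I made up for the results.

```r
# rebuild the data frames from earlier in the post
c1 <- data.frame(a = c(1, 2, 3), b = c(2, 4, 6))
x1 <- data.frame(x = c(2, 2, 2))

# option 1: multiply by the column as a plain vector;
# the vector is recycled down each column of c1
by_vector <- c1 * x1$x
by_vector

# option 2: convert to matrices and use matrix multiplication;
# t(as.matrix(x1)) is 1 x 3 and as.matrix(c1) is 3 x 2, so the result is 1 x 2
by_matrix <- t(as.matrix(x1)) %*% as.matrix(c1)
by_matrix
```

Option 1 keeps the result as a data frame, while option 2 returns a matrix, so the choice depends on what the next step of your script expects.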

That’s all for now. Hopefully this shed more light onto the way R performs multiplication, especially based on the data type.

How to make a Function in R

This post is meant to show R users how to make their own functions. We’ll start with an easy example below.

Most of my posts provide R code that can be easily copied into R and replicated at home. This post will be a break from that process since functions require saving *.R files and calling them from other *.R files. Let’s begin.

First of all make a new R script file. This will become our function file. There is no difference between a script file and a function file in R. Both are *.R files.

We will make a simple function that multiplies a vector of data by 2. We start by defining our function using the function keyword:

#make a function
my_function<- function(x){
  x*2
  
}

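Now save this R file as “f_myfirstfunction.R” on your Desktop. Now let’s walk through the components of the function. We defined it as “my_function”. This is important as it is how we call the function. After that it’s the function keyword with the input x in parentheses, followed by the body in curly braces (x*2), whose result is what the function returns.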

Now we have to open a second R file. This will be the script file that we will use to call the function from. We’ll start this file by setting our working directory to the desktop with the functions getwd() and setwd. getwd() simply states your current working directory in R. setwd is used to change it to wherever you like.

#set the working directory
#rename "your User Name here" based on your user name
#example: owner, Emily, Bill
getwd()
setwd("C:/Users/your User Name here/Desktop")
getwd()

If you get the error, “Error in setwd(“C:/Users/your User Name here/Desktop”):
cannot change the working directory” that means you misspelled some part of your file path. Fix the error and run the code again.

Now we need to make a vector of data, so let’s use the function seq which makes a sequence of values. We’ll save our vector as “data”.

#make some data
data<- seq(from=1, to=10, by=1)
data

Next we have to import the function that we made into the R working space. This is very easy once we have set the working directory. Simply use the call source.

#import the function
source("f_myfirstfunction.R")

I should point out that you need quotations around the R file name. Also, if this file is not saved on the Desktop (the location we set the working directory to), this will give an error “Error in file (….. cannot open the connection”. If this happens move your function file “f_myfirstfunction.R” to your working directory.

Now we will use our awesome new function that we made to multiply the vector “data” by 2. Of course we could just code data*2, but that’s not the point. We’re learning how to write a function.

#call the function
my_function(data)

Awesome! You ran your first function! The R console will spit out the answer:
[1] 2 4 6 8 10 12 14 16 18 20
If we wanted to do something more useful with this output we should save it as a variable. Let’s use data2.

#call the function
data2 <- my_function(data)

This time we get no output from R, but if we type in the variable data2 we get our familiar output:
[1] 2 4 6 8 10 12 14 16 18 20

One important thing to remember when using functions in R is that it doesn’t matter what you save your function file as. When you call your function, you’re using the defined name within the function file code.

#rename the function call to 'times2'
times2<- function(x){
  x*2
}
#rename the function again
zzzzz<- function(x){
  x*2
}

This is the same function saved in file “f_myfirstfunction.R”, but the function name has been changed. Again the function name is what is called from R.
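
To try the whole workflow without touching your Desktop, here is a minimal sketch that writes the function file into R’s temporary folder and sources it from there. The use of tempdir() and writeLines is my assumption for making the example self-contained; the function itself is the one from this post.

```r
# write a function file into a temporary folder
# (stand-in for saving "f_myfirstfunction.R" on the Desktop)
fun_file <- file.path(tempdir(), "f_myfirstfunction.R")
writeLines(c(
  "my_function <- function(x){",
  "  x*2",
  "}"
), fun_file)

# source() runs the file, which defines my_function in the workspace
source(fun_file)

# the function is called by the name used inside the file,
# not by the file name
data <- seq(from = 1, to = 10, by = 1)
data2 <- my_function(data)
data2

# exists() shows which names are defined
exists("my_function")
exists("f_myfirstfunction")
```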

I’ve listed the full text of the script file “call function.R” and the function file “f_myfirstfunction.R” below.

Hope this helps! Happy function writing!

#"call function.R"
#set the working directory
getwd()
setwd("D:/D Documents/wordpress/practicalR/make a function")
getwd()

#make some data
data<- seq(from=1, to=10, by=1)
data

#import the function
source("f_myfirstfunction.R")

#call the function
my_function(data)

#call the function - save output as variable
data2 <- my_function(data)
 
#"f_myfirstfunction.R"
my_function<- function(x){
  x*2
}

Secret Santa Picker using R

Here’s a quick post on making a secret santa picker using R. The code eliminates a person from picking themselves, otherwise it’s no frills.

#set the variable for the number of people
npeople=5

fam=matrix(ncol=1, nrow=npeople, NA)
fam[1,1]="name1"
fam[2,1]="name2"
fam[3,1]="name3"
fam[4,1]="name4"
fam[5,1]="name5"


fam2=matrix(ncol=1, nrow=npeople, NA)
names=c("name1","name2","name3","name4","name5")
for (i in 1:npeople){
  #pick the first name
  if (i==1){
    xx2=sample(names, (npeople-i+1), replace=FALSE)
  } else
    xx2=sample(xx2, (npeople-i+1), replace=FALSE)
  
  if (xx2[1]!=fam[i,1]){
    fam2[i,1]=xx2[1]
  } else{
    fam2[i,1]=xx2[2]}
    
  
  #set up the new matrix with one less name
  used=which(xx2==fam2[i])
  xx2[used]="zzzzz"
  xx2=sort(xx2)[1:(npeople-i)]
}

#add "has" to the matrix
has=matrix(ncol=1,nrow=npeople, "has")

#build the final matrices
final=cbind(fam,has,fam2)	
#the final results
final

     [,1]    [,2]  [,3]   
[1,] "name1" "has" "name4"
[2,] "name2" "has" "name3"
[3,] "name3" "has" "name5"
[4,] "name4" "has" "name2"
[5,] "name5" "has" "name1"
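
One caveat with the loop above: if the last person’s own name is the only one left, there is no second name to fall back on, and that person is assigned NA. A different approach, sketched below, is to shuffle the whole list and simply try again until nobody has drawn themselves. The function name pick_santas is made up for this example, and the sketch assumes every name is unique.

```r
# hypothetical helper: shuffle the names until nobody draws themselves
pick_santas <- function(people) {
  # with fewer than two people no valid draw exists
  stopifnot(length(people) >= 2)
  repeat {
    # shuffle by position
    draw <- people[sample.int(length(people))]
    # keep this shuffle only if no one is matched with their own name
    if (all(draw != people)) break
  }
  cbind(people, "has", draw)
}

fam <- c("name1", "name2", "name3", "name4", "name5")
final <- pick_santas(fam)
final
```

Reshuffling may take a few tries, but for a family-sized list it finishes almost instantly and every name is used exactly once.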

Naming columns and rows

It is often convenient to name the columns and rows within a dataset to keep things clearly organized.  This is very easy to do in R, here’s how.  First let’s make some data.

a <- c(62.3, 55.3, 65.3, 59.3, 67.3)
b <- c(2.2, 5.4, 1.3, 2.8, 5.4)
c <- c(0.1, 1.5, 1.6, 2.1, 0.3)
data <- cbind(a, b, c)
data
##         a   b   c
## [1,] 62.3 2.2 0.1
## [2,] 55.3 5.4 1.5
## [3,] 65.3 1.3 1.6
## [4,] 59.3 2.8 2.1
## [5,] 67.3 5.4 0.3

We can see that this dataset “data” already has the column names “a”, “b”, and “c”, which were stored for me when I used the function cbind. For more information on cbind and creating data, check out my other post, “rep, seq, and cbind: Data Creation and Processing”.

Now let’s update the column names to something meaningful using the function colnames. For this function we have to define the data we are working on and then the names of the columns.
colnames(data name here) <- c("column name 1", "column name 2", "etc")

colnames(data) <- c("temp_F", "wind_m/s", "precip_in")
data
##      temp_F wind_m/s precip_in
## [1,]   62.3      2.2       0.1
## [2,]   55.3      5.4       1.5
## [3,]   65.3      1.3       1.6
## [4,]   59.3      2.8       2.1
## [5,]   67.3      5.4       0.3

Now let’s say that we want to update the row names to be meaningful as well. The function is, you guessed it, rownames, and it works the same way as colnames.
rownames(data name here) <- c("row name 1", "row name 2", "etc")

rownames(data) <- c("Site 1", "Site 2", "Site 3", "Site 4", "Site 5")
data
##        temp_F wind_m/s precip_in
## Site 1   62.3      2.2       0.1
## Site 2   55.3      5.4       1.5
## Site 3   65.3      1.3       1.6
## Site 4   59.3      2.8       2.1
## Site 5   67.3      5.4       0.3

Let’s try to break rownames quickly by adding too many row names (6 row names instead of 5). We’ll find that R will get angry and let us know with a semi-cryptic error message.

rownames(data) <- c("Site 1", "Site 2", "Site 3", "Site 4", "Site 5", "Site 6")
## Error in `rownames<-`(`*tmp*`, value = c("Site 1", "Site 2", "Site 3",  : 
## length of 'dimnames' [1] not equal to array extent

If you make a data frame, your data will already reflect the column headers as well. Remember from last time that we can make a data frame using the function data.frame. Here’s an example:

numbers <- c(1, 2, 3, 4, 5)
letters <- c("a", "b", "c", "d", "e")
symbols <- c("!", "@", "#", "$", "%")
data2 <- data.frame(numbers, letters, symbols)
data2
##   numbers letters symbols
## 1       1       a       !
## 2       2       b       @
## 3       3       c       #
## 4       4       d       $
## 5       5       e       %

We can see that our new data frame already has the column names “numbers”, “letters”, and “symbols” from the way we built our dataset.
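
The same naming functions also work on data frames. Here is a short sketch with a made-up two-row data frame called weather, showing how to rename all columns, rename just one, and then use the names for indexing.

```r
# a small data frame (hypothetical values for illustration)
weather <- data.frame(a = c(62.3, 55.3), b = c(2.2, 5.4))

# colnames() works on data frames as well as matrices
colnames(weather) <- c("temp_F", "wind_ms")

# rename a single column by position
colnames(weather)[2] <- "wind_m_per_s"

# rownames() works too
rownames(weather) <- c("Site 1", "Site 2")
weather

# named rows and columns can then be used for indexing
weather["Site 2", "temp_F"]
```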

cbind part 2

In an earlier post we discussed creating data using the functions seq and rep, and then merging them together with cbind (see “rep, seq, and cbind: Data Creation and Processing”). In this post we’re going to go more in depth on the limitations of cbind.

If we make some data of say temp and precip, we can combine using cbind:

temp <- c(52.5, 53.7, 55.7, 57.2, 55.9, 57.3, 60.3)
precip <- c(0.11, 0.22, 0.05, 0.0, 0.0, 0.0, 0.0)
data <- cbind(temp, precip)
data
##      temp precip
## [1,] 52.5   0.11
## [2,] 53.7   0.22
## [3,] 55.7   0.05
## [4,] 57.2   0.00
## [5,] 55.9   0.00
## [6,] 57.3   0.00
## [7,] 60.3   0.00

This worked just fine. The temperature and precipitation data are now stored together in a matrix. We know it’s a matrix because we can query the object type using the functions typeof and class. The “double” means the data is numeric with double precision and the “matrix” means that the data class is a matrix.

typeof(data)
class(data)
## [1] "double"
## [1] "matrix"

cbind works great when you are working with the same data types. It does not work well when the data types are different. For instance, if data “a” is numeric and data “b” is text, cbind has unexpected results. See for yourself:

a <- c(1, 2, 3, 4, 5)
b <- c("a", "b", "c", "d", "e")
class(a)
class(b)
## [1] "numeric"
## [1] "character"
data2 <-cbind(a, b)
data2
##      a   b  
## [1,] "1" "a"
## [2,] "2" "b"
## [3,] "3" "c"
## [4,] "4" "d"
## [5,] "5" "e"
typeof(data2)
class(data2)
## [1] "character"
## [1] "matrix"

In the results shown above we can see that when we put numeric data together with text data using cbind, the function turns all of the data to text (character). This is not what we want!!! If we try and operate on the data it doesn’t work. We’ll try multiplying the matrix by 2.

2*data2
## Error in 2 * data2 : non-numeric argument to binary operator

This is a feature of cbind to be aware of. So what’s the work around you ask? Data frames. Using the same data as above, we’ll store vectors “a” and “b” together in a data frame and preserve their integrity. The function is easy to remember, it’s data.frame.

newdata <- data.frame(a, b)
newdata
##   a b
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
typeof(newdata)
class(newdata)
## [1] "list"
## [1] "data.frame"

More importantly though, if we query the data within the data frame, we’ll see that column “a” is numeric and column “b” is text. Remember we can reference columns and rows within a matrix and data frame using square brackets after the name (data[row, column]). Using data[,2] references all of the rows of data within the second column.

typeof(newdata[,1])
class(newdata[,1])
## [1] "double"
## [1] "numeric"
typeof(newdata[,2])
class(newdata[,2])
## [1] "integer"
## [1] "factor"

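We can see from the results that column 1 (“a”) is numeric and column 2 (“b”) is a factor (R’s way of storing categories, here built from the text values). Note that this is the behavior of R versions before 4.0.0; from R 4.0.0 onward, data.frame keeps text as character unless you set stringsAsFactors = TRUE. We’ll get more into factors at another time, but it’s safe to say that our data is intact. We can test this to be certain by operating on each column in our new data frame “newdata”.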

2*newdata[,1]
## [1]  2  4  6  8 10

We can operate on column 1 (“a”) which we would expect since they are numeric values.

2*newdata[,2]
## Warning in Ops.factor(2, newdata[, 2]): '*' not meaningful for factors
## [1] NA NA NA NA NA

We cannot operate on column 2 (“b”), which we also would expect, since they are text values.
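
Whether the text column comes out as a factor or as character depends on your R version, so it can help to state the choice explicitly. The sketch below builds the data frame both ways using the stringsAsFactors argument of data.frame; the names with_factor and with_text are made up for this example.

```r
a <- c(1, 2, 3, 4, 5)
b <- c("a", "b", "c", "d", "e")

# ask for factors explicitly (the default before R 4.0.0)
with_factor <- data.frame(a, b, stringsAsFactors = TRUE)
class(with_factor$b)

# keep text as character (the default from R 4.0.0 onward)
with_text <- data.frame(a, b, stringsAsFactors = FALSE)
class(with_text$b)

# the numeric column behaves the same either way
2 * with_text$a

# text functions work directly on a character column
toupper(with_text$b)
```

Setting the argument yourself means the script behaves the same no matter which version of R runs it.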

rep, seq, and cbind: Data Creation and Processing

rep and seq are basic functions in R that are very powerful for a wide variety of tasks.  rep repeats values and seq creates a sequence of numbers.

The rep function has 2 main parameters, the value to repeat and the number of times to repeat it.  Here are some examples of the rep function:

Repeat the number 1 five times.

rep(1, 5)
## [1] 1 1 1 1 1

Repeat the sequence of values 1-5, three times.

rep(1:5, 3)
##  [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

Repeat the days of the week twice.

days=c("mon","tue","wed","thu","fri","sat","sun")
rep(days, 2)
##  [1] "mon" "tue" "wed" "thu" "fri" "sat" "sun" "mon" "tue" "wed" "thu"
## [12] "fri" "sat" "sun"

Now, let’s look at the seq function.  There are three main parameters: the start of the sequence, the end, and the interval, in that order.

Here’s a sequence from 1 to 5:

seq(from = 1, to = 5, by = 1)
## [1] 1 2 3 4 5

Here’s a sequence from 0 to 10 by 2:

seq(from = 0, to = 10, by = 2)
## [1]  0  2  4  6  8 10
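
Both functions have a few more options that are worth knowing about. The lines below are a small sketch of the each and length.out arguments, which are part of base R; the particular numbers are just examples.

```r
# repeat each value twice before moving to the next one
rep(1:3, each = 2)

# repeat the whole vector until the result has 7 elements
rep(1:3, length.out = 7)

# five evenly spaced values between 0 and 1
seq(from = 0, to = 1, length.out = 5)

# a decreasing sequence needs a negative interval
seq(from = 10, to = 0, by = -5)
```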

Finally, let’s merge this newly created data together using the function cbind. cbind stands for column bind, and will merge vectors together to form a matrix.

In this example we create two vectors of data (x & y) using the concatenate function, then we will merge them using the cbind function:

x = c(1:5)
y = c(-1:-5)
cbind(x, y)
##      x  y
## [1,] 1 -1
## [2,] 2 -2
## [3,] 3 -3
## [4,] 4 -4
## [5,] 5 -5

Finally, let’s add more data to the matrix we created above.

x = c(1:5)
y = c(-1:-5)
data = cbind(x, y)

a = c(1,1,1,1,1)
b = c(2,2,2,2,2)

cbind(data, a, b)
##      x  y a b
## [1,] 1 -1 1 2
## [2,] 2 -2 1 2
## [3,] 3 -3 1 2
## [4,] 4 -4 1 2
## [5,] 5 -5 1 2

For more information on the cbind function, check out my other post, “cbind part 2”.

Creating data

It might be necessary to create data on the fly using R.  This will also be important to understand for any future posts with example data sets.  First we’ll make a vector of data, using the function c for concatenate.  This code simply produces a vector of the provided sequence.  You will notice that the commas are not included in the output.  These are used as delimiters for the c function.

Create a vector:

c(1,2,3,4,5)
## [1] 1 2 3 4 5

We can also make a vector of text values by placing each value within double quotations “”.  In the next example we will make the same vector as above, but stored as text rather than numbers.  We have also made a vector with the days of the week.

Create a vector of numbers with text:

c("1", "2", "3", "4", "5")
## [1] "1" "2" "3" "4" "5"

Create a vector of days of the week:

c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
## [1] "Sunday"    "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"    "Saturday"

 

Next we will make a matrix using the function matrix.  The matrix function has three main parameters: 1. data, which I have decided to fill with NA in this example  2. nrow, which gives the number of rows in the matrix  3. ncol, which gives the number of columns in the matrix.

Create a matrix:

matrix(NA, nrow = 3 , ncol = 2)
##      [,1] [,2]
## [1,]   NA   NA
## [2,]   NA   NA
## [3,]   NA   NA

If we want to fill the matrix with actual data, we can build a vector “y” and set the matrix data parameter equal to y.  This fills the 6 matrix spots with the values 1 through 6.

Fill a matrix with data:

y = c(1,2,3,4,5,6)
matrix (data = y, nrow = 3, ncol = 2)
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6

 

By default, the matrix fills by column.  This can be changed using the parameter “byrow” within the matrix function.  The default value is byrow = FALSE, which does not need to be explicitly coded.  To have the matrix fill by row we simply change this to byrow = TRUE.

Fill a matrix BY ROW with vector y:

y = c(1,2,3,4,5,6)
matrix (data = y, nrow = 3, ncol = 2, byrow = TRUE)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6

 

Next we can give this matrix column names and row names with the functions colnames and rownames, respectively.  These two functions operate in the same manner.  Each requires one input, which is the data that will receive the headers (e.g., colnames(rain_data) will add column headers to the data set “rain_data”).  The second part needed is the actual headers.  These are provided as text in a vector using the concatenate function c.  The headers are denoted as text by placing them in double quotations “”.

Give the matrix column and row headers:

y = c(1,2,3,4,5,6)
mymatrix = matrix (data = y, nrow = 3, ncol = 2, byrow = FALSE)
colnames(mymatrix) = c("Col_1", "Col_2")
rownames(mymatrix) = c("Row-1", "Row-2", "Row-3")
mymatrix
##       Col_1 Col_2
## Row-1     1     4
## Row-2     2     5
## Row-3     3     6

Finally, we create a data set using random data from a statistical distribution.  This is a popular method used on blogs and websites like Stack Overflow.  I covered how to call statistical distributions from R in my previous post.  For this example, we will generate data using the normal distribution with a mean of 1, and a standard deviation of 2.

Get random data from the normal distribution and put it into a matrix:

randomdata = rnorm (n = 12, mean = 1, sd = 2)
matrix(data = randomdata, nrow = 2, ncol = 6)
##            [,1]      [,2]     [,3]     [,4]       [,5]       [,6]
## [1,] 0.05265589 1.9368580 2.631436 1.025770 -1.5012382  4.0731287
## [2,] 0.52010343 0.8585777 2.371716 4.123003 -0.9378838 -0.8740166
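
Because the values are random, your numbers will differ from mine each time the code runs. If you want repeatable results, one option is to set the random seed first. This is a minimal sketch; the seed value 42 is arbitrary, and the names first_draw, second_draw, and random_matrix are made up for the example.

```r
# fix the random number generator so the example is reproducible
# (the seed value 42 is arbitrary)
set.seed(42)
first_draw <- rnorm(n = 12, mean = 1, sd = 2)

# resetting the same seed gives the same numbers again
set.seed(42)
second_draw <- rnorm(n = 12, mean = 1, sd = 2)
identical(first_draw, second_draw)

# put the values into a 2 x 6 matrix, as above
random_matrix <- matrix(data = first_draw, nrow = 2, ncol = 6)
dim(random_matrix)
```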

For more information on how to make column names and row names, check out my other post, “Naming columns and rows”.