Creating data

It is sometimes necessary to create data on the fly in R, and the techniques below will also help in understanding the example data sets in future posts. First we’ll make a vector of data using the function c, which combines (concatenates) its arguments into a vector. This code simply returns a vector containing the values provided. Notice that the commas do not appear in the output. They are only used to separate the values passed to c.

Create a vector:

c(1,2,3,4,5)
## [1] 1 2 3 4 5
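In practice the vector is usually stored in a variable so it can be reused. Here is a minimal sketch; the names x, first_part, second_part and joined are illustrative choices. length and class confirm what c produced. Because c concatenates, passing it existing vectors joins them end to end into one flat vector.

```r
# Store the vector in a variable (the name x is arbitrary)
x = c(1, 2, 3, 4, 5)

length(x)  # number of elements: 5
class(x)   # "numeric": the values are stored as numbers

# c() also joins existing vectors into one flat vector
first_part = c(1, 2, 3)
second_part = c(4, 5)
joined = c(first_part, second_part)
joined     # 1 2 3 4 5, the same values as x
```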

We can also make a vector of text values by placing each value within double quotation marks (“”). In the next example we will make the same vector as above, but stored as text rather than numbers. We will also make a vector containing the days of the week.

Create a vector of numbers stored as text:

c("1", "2", "3", "4", "5")
## [1] "1" "2" "3" "4" "5"
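The two vectors look alike, but R treats them differently. The following sketch uses illustrative variable names. It checks the type of each vector with class, shows that arithmetic only works on the numeric version, and uses as.numeric to turn the text back into numbers.

```r
numbers = c(1, 2, 3, 4, 5)
numbers_as_text = c("1", "2", "3", "4", "5")

class(numbers)          # "numeric"
class(numbers_as_text)  # "character"

# Arithmetic works on numbers but not on text;
# sum(numbers_as_text) would stop with an error
sum(numbers)            # 15

# as.numeric() converts the text back into numbers
converted = as.numeric(numbers_as_text)
identical(converted, numbers)  # TRUE
```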

Create a vector of days of the week:

c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
## [1] "Sunday"    "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"
## [7] "Saturday"
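A vector can hold only one type of value. This sketch (variable names are illustrative) shows what happens when numbers and text are mixed in c: everything is converted to text, so the number 1 becomes the string "1".

```r
days = c("Sunday", "Monday", "Tuesday", "Wednesday",
         "Thursday", "Friday", "Saturday")
length(days)  # 7

# Mixing a number with text: the number is coerced to character
mixed = c(1, "Monday")
mixed         # "1" "Monday"
class(mixed)  # "character"
```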

Next we will make a matrix using the function matrix. The matrix function has three main parameters:
1. data, the values used to fill the matrix (in this example I have decided to fill it with NA)
2. nrow, the number of rows in the matrix
3. ncol, the number of columns in the matrix

Create a matrix:

matrix(NA, nrow = 3, ncol = 2)
##      [,1] [,2]
## [1,]   NA   NA
## [2,]   NA   NA
## [3,]   NA   NA
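Only a single NA was supplied, yet all six cells are filled. This is because matrix recycles the data value to fill all nrow times ncol cells. The following sketch confirms the dimensions and shows the same recycling with another value. The value 0 and the names empty and zeros are just examples.

```r
empty = matrix(NA, nrow = 3, ncol = 2)
dim(empty)          # 3 2: three rows, two columns
all(is.na(empty))   # TRUE: the single NA was recycled into every cell

# Any single value is recycled the same way
zeros = matrix(0, nrow = 3, ncol = 2)
sum(zeros)          # 0
```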

If we want to fill the matrix with actual data, we can build a vector “y” and set the matrix data parameter equal to y. This fills the six cells of the matrix with the values 1 through 6.

Fill a matrix with data:

y = c(1,2,3,4,5,6)
matrix(data = y, nrow = 3, ncol = 2)
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
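The values go down the first column before moving to the second. For that reason, the element in row r and column c of this 3-row matrix is y[(c - 1) * 3 + r]. The quick check below uses square-bracket indexing, [row, column]. The name m is illustrative.

```r
y = c(1, 2, 3, 4, 5, 6)
m = matrix(data = y, nrow = 3, ncol = 2)

m[1, 2]   # 4: row 1, column 2 holds the fourth value of y
m[, 1]    # 1 2 3: the first column is filled first

# Column-major position of row 3, column 2: (2 - 1) * 3 + 3 = 6
m[3, 2] == y[(2 - 1) * 3 + 3]  # TRUE
```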

By default, the matrix fills by column.  This can be changed using the parameter “byrow” within the matrix function.  The default value is byrow = FALSE, which does not need to be explicitly coded.  To have the matrix fill by row we simply change this to byrow = TRUE.

Fill a matrix BY ROW with vector y:

y = c(1,2,3,4,5,6)
matrix(data = y, nrow = 3, ncol = 2, byrow = TRUE)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
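One way to see what byrow = TRUE does: it gives the same result as filling a 2-by-3 matrix by column and then transposing it with t(). A short sketch, with illustrative names:

```r
y = c(1, 2, 3, 4, 5, 6)
by_row = matrix(data = y, nrow = 3, ncol = 2, byrow = TRUE)

# Fill a 2 x 3 matrix column by column, then swap rows and columns
transposed = t(matrix(data = y, nrow = 2, ncol = 3))

identical(by_row, transposed)  # TRUE
by_row[1, ]                    # 1 2: the first row takes the first two values
```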

Next we can give this matrix column names and row names with the functions colnames and rownames, respectively. These two functions work the same way. Each takes one input, the data that will receive the headers (e.g., colnames(rain_data) refers to the column headers of the data set “rain_data”). The second part needed is the headers themselves, which are assigned to that call with =. They are provided as text in a vector built with the function c, and each header is marked as text by placing it in double quotation marks (“”).

Give the matrix column and row headers:

y = c(1,2,3,4,5,6)
mymatrix = matrix(data = y, nrow = 3, ncol = 2, byrow = FALSE)
colnames(mymatrix) = c("Col_1", "Col_2")
rownames(mymatrix) = c("Row-1", "Row-2", "Row-3")
mymatrix
##       Col_1 Col_2
## Row-1     1     4
## Row-2     2     5
## Row-3     3     6
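The headers can also be supplied when the matrix is created, through the dimnames parameter of matrix. It takes a list of row names followed by column names. The sketch below reproduces mymatrix in one call and compares it with the two-step version. The name named is illustrative. Once set, the names can be used as indexes.

```r
y = c(1, 2, 3, 4, 5, 6)

# Two-step version from above
mymatrix = matrix(data = y, nrow = 3, ncol = 2)
colnames(mymatrix) = c("Col_1", "Col_2")
rownames(mymatrix) = c("Row-1", "Row-2", "Row-3")

# One-step version: dimnames = list(row names, column names)
named = matrix(data = y, nrow = 3, ncol = 2,
               dimnames = list(c("Row-1", "Row-2", "Row-3"),
                               c("Col_1", "Col_2")))

identical(mymatrix, named)  # TRUE
named["Row-2", "Col_2"]     # 5
```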

Finally, we create a data set using random data from a statistical distribution. This is a popular way of building example data on blogs and on websites like Stack Overflow. I covered how to call statistical distributions from R in my previous post. For this example, we will generate 12 values from the normal distribution with a mean of 1 and a standard deviation of 2, and arrange them in a matrix with 2 rows and 6 columns.

Get random data from the normal distribution and put it into a matrix:

randomdata = rnorm(n = 12, mean = 1, sd = 2)
matrix(data = randomdata, nrow = 2, ncol = 6)
##            [,1]      [,2]     [,3]     [,4]       [,5]       [,6]
## [1,] 0.05265589 1.9368580 2.631436 1.025770 -1.5012382  4.0731287
## [2,] 0.52010343 0.8585777 2.371716 4.123003 -0.9378838 -0.8740166
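Because rnorm draws new random numbers on every call, running the code above will give values different from those shown. When a reproducible data set is needed, for example when posting a question to Stack Overflow, set.seed fixes the starting point of the random number generator. A sketch follows; the seed 42 and the variable names are arbitrary choices.

```r
set.seed(42)
first_draw = rnorm(n = 12, mean = 1, sd = 2)

set.seed(42)
second_draw = rnorm(n = 12, mean = 1, sd = 2)

identical(first_draw, second_draw)  # TRUE: same seed, same numbers

randommatrix = matrix(data = first_draw, nrow = 2, ncol = 6)
dim(randommatrix)                   # 2 6
```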

For more information on how to make column names and row names, check out my other post here.
