Make a bar plot with ggplot

The first time I made a bar plot (column plot) with ggplot (ggplot2), I found the process was a lot harder than I wanted it to be. This post steps through building a bar plot from start to finish.

First, let’s make some data. I’m going to make a vector of months, a vector of the number of chickens and a vector of the number of eggs. That’s random enough for this purpose.

# make some data
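months <- rep(c("jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"), 2)
chickens <- c(1, 2, 3, 3, 3, 4, 5, 4, 3, 4, 2, 2)
eggs <- c(0, 8, 10, 13, 16, 20, 25, 20, 18, 16, 10, 8)
values <- c(chickens, eggs)
type <- c(rep("chickens", 12), rep("eggs", 12))
# include type so we can tell the two series apart later;
# stringsAsFactors = TRUE stores text columns as factors (the default before R 4.0)
mydata <- data.frame(months, values, type, stringsAsFactors = TRUE)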

If parts of the above code don’t make sense, take a look at my post on using the R functions seq (sequence), rep (repeat), and cbind (column bind).
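As a side note, the data above is in "long" format: one row per month and type. A minimal sketch of another way to get the same layout, assuming you start from the two separate vectors, is base R’s stack(). The names month_names and long are made up for this example.

```r
# Same numbers as in the post
chickens <- c(1, 2, 3, 3, 3, 4, 5, 4, 3, 4, 2, 2)
eggs <- c(0, 8, 10, 13, 16, 20, 25, 20, 18, 16, 10, 8)
month_names <- c("jan", "feb", "mar", "apr", "may", "jun",
                 "jul", "aug", "sep", "oct", "nov", "dec")

# stack() turns each column into rows: `values` holds the numbers and
# `ind` holds the name of the column each number came from
long <- stack(data.frame(chickens, eggs))
names(long) <- c("values", "type")

# Add the month for every row (12 months, repeated once per type)
long$months <- rep(month_names, 2)
head(long)
```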

Now let’s load the ggplot package.

library(ggplot2)

We want to make a plot with the months as the x-axis and the number of chickens and eggs as the height of the bars. To do this, we need to make sure we specify stat = "identity". Here’s the basic code for this plot, before adding that setting.

p <- ggplot(mydata, aes(months, values))
p + geom_bar()

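Notice that this produces an error along the lines of “stat_count() must not be used with a y aesthetic” (the exact wording depends on your ggplot2 version). By default geom_bar() counts the rows in each category, and we forgot to specify that we want the height of the column to equal the value for that month. So let’s do it again.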

p <- ggplot(mydata, aes(months, values))
p + geom_bar(stat = "identity")
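As an aside, recent versions of ggplot2 also provide geom_col(), which uses the y values as bar heights without needing stat = "identity". A minimal sketch, using just the first three months of the egg counts (the names egg_counts and col_plot are made up for this example):

```r
library(ggplot2)

# Small stand-in data set: first three months of the egg counts
egg_counts <- data.frame(
  month = c("jan", "feb", "mar"),
  eggs  = c(0, 8, 10)
)

# geom_col() takes the bar heights directly from the y aesthetic
col_plot <- ggplot(egg_counts, aes(month, eggs)) + geom_col()
col_plot
```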

This time we get a plot, but it looks fairly ugly, and the months are out of order. In fact the months are in alphabetical order, so let’s fix that first. If we investigate the months, we will see that the factor levels are sorted alphabetically.

mydata$months
#[1] jan feb mar apr may jun jul aug sep oct nov dec jan feb mar apr may
#[18] jun jul aug sep oct nov dec
#Levels: apr aug dec feb jan jul jun mar may nov oct sep

We can fix the order of this category by changing the factor. Here’s some code that will fix our problem.

mydata$months <- factor(mydata$months,
                        levels = c("jan", "feb", "mar", "apr", "may", "jun",
                                   "jul", "aug", "sep", "oct", "nov", "dec"))

Now if we look at the levels again, we will see that they’re rearranged in the order that we want.

mydata$months
#[1] jan feb mar apr may jun jul aug sep oct nov dec jan feb mar apr may
#[18] jun jul aug sep oct nov dec
#Levels: jan feb mar apr may jun jul aug sep oct nov dec
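To see the effect of the levels argument in isolation, here is a minimal sketch. It uses the built-in constant month.abb to build the level names instead of typing them out; the names month_levels, some_months and ordered_months are made up for this example.

```r
# month.abb is a built-in constant: "Jan", "Feb", ..., "Dec"
month_levels <- tolower(month.abb)

# A few months in arbitrary order
some_months <- c("mar", "jan", "dec", "feb")

# Without explicit levels, the levels are sorted alphabetically
levels(factor(some_months))

# With explicit levels, the calendar order is kept,
# and sorting follows the level order rather than the alphabet
ordered_months <- factor(some_months, levels = month_levels)
levels(ordered_months)
sort(ordered_months)
```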

Okay, let’s make our plot again, this time with the months in the correct order.

p <- ggplot(mydata, aes(months, values))
p + geom_bar(stat = "identity")

Okay, now the months are working, but we realize we only have one set of columns being plotted. We should have two sets, ‘chickens’ and ‘eggs’. To fix this we need to specify some feature that separates them. We already created this in the “type” column when we made our data frame.

If we set the fill color of the bars based on the data category, then we should get two sets of columns. In our data frame, we put our categories in the column named “type”. Fill is the property that colors the inside of the bars. If we were making a line plot and we wanted to set the colors by the type of data, we would use color = type rather than fill = type.

p <- ggplot(mydata, aes(months, values))
p + geom_bar(stat = "identity", aes(fill = type))
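As an aside on the line-plot case mentioned above, here is a minimal sketch of what color = type could look like. It rebuilds the data so it runs on its own; the names month_names and line_plot are made up for this example. Because the x-axis is a factor, the sketch also sets group = type so that ggplot knows which points to join.

```r
library(ggplot2)

# Rebuild the example data so this sketch runs on its own
month_names <- c("jan", "feb", "mar", "apr", "may", "jun",
                 "jul", "aug", "sep", "oct", "nov", "dec")
mydata <- data.frame(
  months = factor(rep(month_names, 2), levels = month_names),
  values = c(1, 2, 3, 3, 3, 4, 5, 4, 3, 4, 2, 2,
             0, 8, 10, 13, 16, 20, 25, 20, 18, 16, 10, 8),
  type   = rep(c("chickens", "eggs"), each = 12)
)

# color = type gives each series its own line color;
# group = type joins the points of each series into one line
line_plot <- ggplot(mydata, aes(months, values, color = type, group = type)) +
  geom_line()
line_plot
```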

Cool! Sort of. We have stacked bar plots, but I want them next to one another, not stacked. We can fix that with one more change to our code using dodge.

p <- ggplot(mydata, aes(months, values))
p + geom_bar(stat = "identity", aes(fill = type), position = "dodge")

Finally, let’s spruce the plot up a little bit. We’ll adjust the x-axis label (xlab), y-axis label (ylab), title (ggtitle) and update the look using theme_bw().

p <- ggplot(mydata, aes(months, values))
p + geom_bar(stat = "identity", aes(fill = type), position = "dodge") +
  xlab("Months") + ylab("Count") +
  ggtitle("Chickens & Eggs") +
  theme_bw()

The plot finally looks good and we’re done. Happy plotting!

2 y-axis plotting

A simple plotting task we need to be able to do in R is make a plot with two y-axes. First let’s grab some data using the built-in beaver1 and beaver2 datasets within R. Go ahead and take a look at the data by typing the names into R as I have below.

# Get the beaver datasets
beaver1
beaver2

We’re going to plot the temperatures within both of these datasets, which we can see (after punching into R) is the third column.

First let’s check the length of these datasets and make sure they’re the same.

# Get the length of column 3
length(beaver1[,3])
length(beaver2[,3])

[1] 114
[1] 100

Since beaver1 is longer, we’ll only plot rows 1 through 100 of the temperature data, so that it is the same length as beaver2.
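Rather than hard-coding 100, the shorter length can also be worked out in code. A minimal sketch (the names n1, n2, n, temp1 and temp2 are made up for this example; temp is the name of the third column in both datasets):

```r
# Number of rows in each dataset
n1 <- nrow(beaver1)
n2 <- nrow(beaver2)

# Use the shorter length so both series have the same number of points
n <- min(n1, n2)
temp1 <- beaver1$temp[seq_len(n)]
temp2 <- beaver2$temp[seq_len(n)]

# Both vectors should now have the same length
length(temp1) == length(temp2)
```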

# Plot the data
plot(beaver1[1:100, 3], type = "l", ylab = "beaver1 temperature")

Cool, your plot should show the beaver1 temperature as a single line.

Now, let’s add that second dataset on the right y-axis. To do this, we have to create a plot on top of this plot using the command par(new = TRUE).

# Add the second y-axis
plot(beaver1[1:100, 3], type = "l", ylab = "beaver1 temperature")
par(new = TRUE)
plot(beaver2[,3], type = "l")

Whoa, this plot is ugly! We have 2 y-axis labels plotting, 2 sets of y-axis values plotting, and 2 sets of x-axis values and labels plotting. Let’s turn the second set of axes off using xaxt = "n" and yaxt = "n", and blank out the second set of labels with xlab = "" and ylab = "".

# updated plot
plot(beaver1[1:100, 3], type = "l", ylab = "beaver1 temperature")
par(new = TRUE)
plot(beaver2[,3], type = "l", xaxt = "n", yaxt = "n",
     ylab = "", xlab = "")

Okay, it’s still pretty ugly, so let’s clean it up. Let’s make the margins bigger on the right side of the plot, add a y2 axis label, add a title, change the color of the lines and adjust the x-axis label. Don’t forget the legend! Here’s the code:

# final plot
par(mar = c(5, 5, 3, 5))
plot(beaver1[1:100, 3], type = "l", ylab = "beaver1 temperature",
     main = "Beaver Temperature Plot", xlab = "Time",
     col = "blue")
par(new = TRUE)
plot(beaver2[,3], type = "l", xaxt = "n", yaxt = "n",
     ylab = "", xlab = "", col = "red", lty = 2)
axis(side = 4)
mtext("beaver2 temperature", side = 4, line = 3)
legend("topleft", c("beaver1", "beaver2"),
       col = c("blue", "red"), lty = c(1, 2))
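If you make this kind of plot often, the steps above could be wrapped in a small function. This is only a sketch: plot_two_axes and its argument names are hypothetical, not part of R. It also restores the original margins when it finishes, since par(mar = ...) otherwise stays in effect for later plots.

```r
# Hypothetical helper (not from base R): draw two series that share an
# x-axis but use separate left and right y-axes
plot_two_axes <- function(y_left, y_right,
                          left_label = "left", right_label = "right") {
  # Widen the right margin and remember the previous margin settings
  old_par <- par(mar = c(5, 5, 3, 5))
  # Put the old margins back when the function exits
  on.exit(par(old_par), add = TRUE)

  # First series, with the usual left y-axis
  plot(y_left, type = "l", col = "blue", xlab = "Time", ylab = left_label)

  # Second series drawn on top, with its own axes and labels switched off
  par(new = TRUE)
  plot(y_right, type = "l", col = "red", lty = 2,
       xaxt = "n", yaxt = "n", xlab = "", ylab = "")

  # Right-hand axis and label for the second series
  axis(side = 4)
  mtext(right_label, side = 4, line = 3)
  invisible(NULL)
}

plot_two_axes(beaver1[1:100, 3], beaver2[, 3],
              "beaver1 temperature", "beaver2 temperature")
```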

Woo! Looks good. That’s all for now.

ggplot2 (ggplot) Introduction

In this post I’ll briefly introduce how to use ggplot2 (ggplot), which by default makes nicer looking plots than the standard R plotting functions.

The first thing to know is that ggplot requires data frames to work properly. It is an entirely different framework from the standard plotting functions in R. Let’s grab a default data frame in R called mtcars. Let’s confirm it’s a data frame using some code:

# Get the mtcars data types
class(mtcars)

R confirms that this is in fact a data frame.

# the output
[1] "data.frame"

Feel free to take a look at the data itself by just typing the name into R. For brevity, I won’t show the data in this post.

# Look at mtcars
mtcars

Next let’s define some standard plot function names in ggplot.
geom_point = scatterplot (points)
geom_boxplot = boxplot
geom_bar = column plot
There are many more (really cool) plot types, but I’ll stop here for now.

Let’s make our scatterplot. Here’s the code to make a standard plot. Don’t forget to load the package ggplot2 before running this code using the library function (install ggplot2 first if you haven’t done so before).

# Plot the data
library(ggplot2)
ggplot(mtcars, aes(hp, mpg)) + geom_point()

Success! The code above seems strange at first, but let’s dive into how it works. First we call ggplot and provide the data frame name ‘mtcars’. Then we give the x & y variables using the aes command. Finally we specify we’re making a scatterplot by attaching + geom_point().
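To make the pieces described above easier to spot, here is the same plot with every argument named. This is just a sketch; the variable name explicit_plot is made up for this example.

```r
library(ggplot2)

# Same scatterplot as above, with each argument spelled out:
# data    = the data frame
# mapping = which columns go on the x- and y-axes
explicit_plot <- ggplot(data = mtcars, mapping = aes(x = hp, y = mpg)) +
  geom_point()
explicit_plot
```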

Now let’s make this look better! This is where the power of ggplot shines. It’s really easy to make a nice looking plot.

# Plot the data
p <- ggplot(mtcars, aes(hp, mpg))
p + geom_point() + labs(x = "Horsepower (hp)", y = "Miles per Gallon (mpg)") +
  ggtitle("My mtcars Plot")

We can see that the syntax is a bit different this time. We save the first ggplot call to a variable p (p for plot), but any variable name will work. Then we attach more plotting features using p + ——. For this plot we added custom x- and y-axis labels and a title.

Next let’s make a change to the overall look of the plot, using what ggplot calls a theme. We’ll add theme_bw.

# Plot the data
p <- ggplot(mtcars, aes(hp, mpg))
p + geom_point() + labs(x = "Horsepower (hp)", y = "Miles per Gallon (mpg)") +
  ggtitle("My mtcars Plot") + theme_bw()

Finally, let’s spruce it up by coloring the points blue and making them bigger, while also making our axes and titles bigger. The code below makes this final plot.

# Make the final plot
p <- ggplot(mtcars, aes(hp, mpg))
p + geom_point(size = 3, color = "blue") + 
  labs(x = "Horsepower (hp)", y = "Miles per Gallon (mpg)") +
  ggtitle("My mtcars Plot") + theme_bw()+
  theme(axis.text = element_text(size = 12), 
        axis.title = element_text(size = 14),
        plot.title = element_text(size = 18, face = "bold"))

Hope this helped explain the basics of ggplot. For more detail, see the ggplot2 documentation.

A high quality plot

I’ll keep this post short and sweet. Here’s some code to get a really nice looking plot in R. It has a high pixel count to produce a high resolution output that can be used in a Word document. Because of this, the size of everything in the plot (axes, points, text, axis labels, etc.) has to be increased. I have skipped my normal commentary and instead left comments in the code. If you have any questions, leave a comment. Hope this helps!!

##First let's grab a dataset from the R library
nottem
#if we look at the data type 
typeof(nottem)
class(nottem)
#We need to convert from a "time series" object to a "matrix" to plot it
temp = as.matrix(nottem)
data =  matrix(temp, ncol=12, byrow = TRUE)
colnames(data) = c("jan", "feb", "mar", "apr", "may", "jun",
                   "jul", "aug", "sep", "oct", "nov", "dec")
#This tells you where your data is stored
#This is important because we will store our plot here
getwd()
#Calculate the monthly means
monthlymean<-apply(data, 2, FUN=mean)
#Make boxplots of the observed monthly data 
png("nice-plot.png",width=2400,height=1600)
par(mfrow=c(1,1),mar=c(8,9,10,9))

boxplot(data,col="cornflowerblue",
          xaxt='n', xlab="", ylab="",
          main="", cex.main=2, cex.lab=3, cex.axis=3, outcex=2)
points(monthlymean, pch=24, col="black", bg="red", cex=4)

axis(side=1, at=1:12, labels=FALSE, tick=TRUE, cex.axis=3, tck=-0.01)
mtext(c("Jan","Feb","Mar","Apr","May","Jun","Jul",
        "Aug","Sep","Oct","Nov","Dec"),at=1:12,side=1,line=3,cex=3)
mtext("Temperature (Fahrenheit)",side=2,las=0,cex=3.5,line=4.5)

legend("topleft",pch=c(24), c("Mean Observed Temperature"),
       col=c("black"),pt.bg=c("red"), cex=3)
mtext("Average Monthly Temperatures at Nottingham", side=3, line=4.5, cex=4)
mtext("1920-1939", side=3, line=0.6, cex=3.5)
box(which="outer",col="black",lwd=1)
dev.off()
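The reshaping and monthly means in the code above can be double-checked without drawing anything. This is a minimal sketch, not part of the original script; the names temps, means_by_column and means_by_cycle are made up for this example.

```r
# Reshape the series into one row per year and one column per month
temps <- matrix(as.numeric(nottem), ncol = 12, byrow = TRUE)
colnames(temps) <- tolower(month.abb)
dim(temps)   # rows = years, columns = months

# Monthly means two ways: by matrix column, and by each value's
# position in the yearly cycle (1 = January, ..., 12 = December)
means_by_column <- colMeans(temps)
means_by_cycle  <- tapply(as.numeric(nottem), as.integer(cycle(nottem)), mean)

# The two approaches should agree if the reshape put months in columns
all.equal(unname(means_by_column), as.numeric(means_by_cycle))
```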

Plotting 2

In this post we’ll go into more detail on plotting commands. We’ll use a scatterplot (X-Y plot) as our example plot. Again we’ll use the command plot.

##First let's make some data
#x holds the even numbers and y the odd numbers, to match the axis labels used below
x<-c(2,4,6,8,10,12)
y<-c(1,3,5,7,9,11)

plot(x,y)

Next let’s change the axis labels. To change the axis titles we’ll use the commands xlab and ylab for the x-axis and y-axis, respectively. We add these calls within the parentheses of the plot function. Let’s make the x-axis “Even” and the y-axis “Odd”.

plot(x, y, xlab = "Even", ylab = "Odd")

Looks good! Now let’s change the x- and y-axis limits. We’ll use the commands xlim and ylim. In each case we give a lower and upper limit, so we need to concatenate them together with the c function. In our example we’ll set the x-axis from 0 to 15 using xlim = c(0, 15), and the y-axis from 1 to 20 using ylim = c(1, 20). Again these commands are added within the plot function.

plot(x, y, xlim = c(0, 15), ylim = c(1, 20), xlab = "Even", ylab = "Odd")

Next let’s add a title calling it “My Plot”. We’ll use the command main = “add your title here”.

plot(x, y, main = "My Plot", xlim = c(0, 15), ylim = c(1, 20), 
     xlab = "Even", ylab = "Odd")

Now, let’s spice up the colors of our plot. Let’s make the points red and bigger. We use the calls “col” and “cex” to adjust these items.

plot(x, y, col = "red", cex = 2, main = "My Plot", 
     xlim = c(0, 15), ylim = c(1, 20), xlab = "Even", 
     ylab = "Odd")

Now let’s make our points a little bit fancier. We can use the command pch to change the points from the standard hollow circle to a filled diamond (pch = 23). The help page ?points lists the different pch symbols. Since this is a filled symbol, the call col colors the outline and the call bg colors the fill of the symbol.

plot(x, y, pch = 23, bg = "yellow", col = "red", 
     cex = 2, main = "My Plot", xlim = c(0, 15), 
     ylim = c(1, 20), xlab = "Even", ylab = "Odd")
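To see the pch symbols for yourself, here is a minimal sketch that draws symbols 0 through 25 in a grid with their numbers underneath. The names pch_values, grid_x and grid_y are made up for this example. Only symbols 21 to 25 use bg for their fill, so the yellow only shows up on those.

```r
# Lay out pch symbols 0 through 25 in three rows of up to nine
pch_values <- 0:25
grid_x <- (pch_values %% 9) + 1      # column position, 1 to 9
grid_y <- 3 - (pch_values %/% 9)     # row position, top row first

# Draw every symbol with a red outline and a yellow background
plot(grid_x, grid_y, pch = pch_values, bg = "yellow", col = "red", cex = 2,
     xlim = c(0.5, 9.5), ylim = c(0.5, 3.5),
     xaxt = "n", yaxt = "n", xlab = "", ylab = "",
     main = "pch symbols 0-25")

# Label each symbol with its pch number, just below it
text(grid_x, grid_y, labels = pch_values, pos = 1, offset = 1)
```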

Finally let’s complete the plot by adding a legend. The legend is different than the previous calls. It goes outside of the plot() command. Add the legend() command on a second line. The first bit of code “topleft” adds the legend to the top left of the plot. The second bit of code calls the legend item by the name “my data”. The rest of the code defines the legend item as we added it into the plot. The exception is the call “pt.bg” which has to be used instead of just “bg”.

plot(x, y, pch = 23, bg = "yellow", col = "red", 
     cex = 2, main = "My Plot", xlim = c(0, 15), 
     ylim = c(1, 20), xlab = "Even", ylab = "Odd")
legend("topleft", "my data", pch=23, pt.bg="yellow", col="red")

That’s it for now. We’ll do some more plotting next time!

R Studio and Shiny

R Studio has released a web application framework that is run (nearly) entirely through R (R Studio). It’s called Shiny and it’s great! It easily lets you turn your R scripts into a webpage. This is great for teaching purposes, showing off some code, and publishing to the web.

R Studio has given its users everything they need to make a web app using templates they have provided.  Everything fits into one “.R” file for easy editing and publishing.

You can find the Shiny page here: http://shiny.rstudio.com/

Here’s a link to my Shiny app. This has 4 statistical distributions (normal, lognormal, Weibull, exponential) and lets the user interact with the variables. The box plot and histogram of the data respond to the user controlled inputs.

Check it out here: My Shiny App
(Make sure to give it about 30 seconds to fully load for the first time.)
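To give a feel for how small a single-file Shiny app can be, here is a minimal sketch. It is not the code behind my app: it only has one slider and one histogram of normal samples, and the input and output IDs ("n" and "hist") are made up for this example.

```r
library(shiny)

# UI: a title, one slider, and one plot
ui <- fluidPage(
  titlePanel("Normal distribution (sketch)"),
  sliderInput("n", "Number of samples:", min = 10, max = 1000, value = 100),
  plotOutput("hist")
)

# Server: redraw the histogram whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normal samples", xlab = "Value")
  })
}

# Build the app object; calling runApp(app) starts it
app <- shinyApp(ui = ui, server = server)
```

Saving this as a single “.R” file and running runApp(app) from R Studio should open the app locally.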

Plotting Introduction

In this post we’ll cover some of the basic plot types in R such as scatter plots, box plots, histograms, and line graphs. Let’s start with the basic scatter plot.  We’ll use the command plot.

##First let's make some data
x<-c(1,2,3,4,5,6)
y<-c(2,4,6,8,10,12)

plot(x,y)

Next we’ll make a box plot. For this plot we’ll use some data that is available within R already. Let’s take a look at what’s already within R using the command data(). We see that there are lots of datasets within R that are ready to go! Let’s load ‘rivers’, which is described as the lengths of major rivers in North America. Use data(rivers) to load rivers into R. Type ‘rivers’ to see what the data looks like.

data(rivers)
rivers
##   [1]  735  320  325  392  524  450 1459  135  465  600  330  336  280  315
##  [15]  870  906  202  329  290 1000  600  505 1450  840 1243  890  350  407
##  [29]  286  280  525  720  390  250  327  230  265  850  210  630  260  230
##  [43]  360  730  600  306  390  420  291  710  340  217  281  352  259  250
##  [57]  470  680  570  350  300  560  900  625  332 2348 1171 3710 2315 2533
##  [71]  780  280  410  460  260  255  431  350  760  618  338  981 1306  500
##  [85]  696  605  250  411 1054  735  233  435  490  310  460  383  375 1270
##  [99]  545  445 1885  380  300  380  377  425  276  210  800  420  350  360
## [113]  538 1100 1205  314  237  610  360  540 1038  424  310  300  444  301
## [127]  268  620  215  652  900  525  246  360  529  500  720  270  430  671
## [141] 1770

To make a box plot use the function boxplot.

boxplot(rivers)
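To see the numbers behind the box plot, here is a minimal sketch using boxplot.stats(), which returns the values the plot is drawn from. The name box_summary is made up for this example.

```r
# The five numbers a box plot is built from:
# lower whisker, lower hinge, median, upper hinge, upper whisker
box_summary <- boxplot.stats(rivers)
box_summary$stats

# Points beyond the whiskers are drawn individually as outliers
box_summary$out
length(box_summary$out)
```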

Now let’s make a histogram with the same ‘rivers’ dataset.

hist(rivers)

Let’s make a line plot using some time series data. Let’s load the dataset ‘airmiles’.

data(airmiles)
airmiles
## Time Series:
## Start = 1937 
## End = 1960 
## Frequency = 1 
##  [1]   412   480   683  1052  1385  1418  1634  2178  3362  5948  6109
## [12]  5981  6753  8003 10566 12528 14760 16769 19819 22362 25340 25343
## [23] 29269 30514

plot(airmiles)

We’ll notice that we used the same function plot to make a line graph here as we did in the first plot using x and y to make a scatter plot. R has its own defaults based on the type of data it receives for the function plot.
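The reason is the class of the data. A minimal sketch to check this for yourself (the vector c(1, 2, 3) is just a stand-in for ordinary numeric data):

```r
# airmiles is stored as a time series object
class(airmiles)

# a plain vector of numbers has a different class
class(c(1, 2, 3))

# plot() is a generic function: for a "ts" object it hands off to plot.ts(),
# which draws a line by default
inherits(airmiles, "ts")
```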

If we want to force the plot type (a solid line versus points) we can use the parameter ‘type’, where type = "l" makes a line and type = "p" makes points. Let’s see this using the airmiles plot again.

plot(airmiles, type="p")
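To compare the options side by side, here is a minimal sketch that draws the same data with three values of type in one row. It also includes type = "b", which draws both points and lines. The name old_par is made up for this example.

```r
# Split the plotting area into 1 row and 3 columns, remembering the old layout
old_par <- par(mfrow = c(1, 3))

plot(airmiles, type = "p", main = 'type = "p"')  # points only
plot(airmiles, type = "l", main = 'type = "l"')  # line only
plot(airmiles, type = "b", main = 'type = "b"')  # both points and line

# Go back to the previous single-panel layout
par(old_par)
```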

Finally let’s use the data x and y from the first plot to make a plot with a line rather than points.

plot(x,y, type="l")

That’s it for now, we’ll add more plotting options for the graphs in subsequent posts.