Back to main workshop page

Objectives:

  • Learn why we are using R
  • Become familiar with the R environment
  • Learn how to do math and assign values to variables/objects
  • Use comments to inform the script
  • Import a spreadsheet of data
  • Exploring a data frame
  • Explore plotting the data
  • Pointing you towards future resources

We will be using a dataset of mammals later in this lesson. Please download it using this link:Download link

Why R?

I already know how to use excel, and have done some statistics in other programs (e.g. JMP). Why do I need to learn ANOTHER thing?

You’ve may have heard of the importance of reproducible science. In a well-written paper, all of the technical steps are laid out so somebody else can replicate the experiment exactly with the information given. These reproducible methods are super important for scientific progress.

This same idea applies to data analysis! In a perfect world, I want to see your original entered data sheet (the one you typed in the data onto), as well as every step along the way (until you get to your final graphs and statistics). This can be hard for me to see in excel, as it is hard to retrace your steps and know everything that you did.

If you do all of this data formatting, calculation, plotting, and stats in a programming language, it is automatically keeping a track of everything you are doing. Every line of code is a “step” between your raw data and your result!

More and more journals nowadays expect that your programming code and your raw data are made available online. This means that even Scientific programming isn’t just for quantitative biologists anymore! Go us!

Familiarizing yourself with R Studio

R studio adds extra usability to R by showing you what’s going on in one handy window. 4 panes show you your script (the text document), the R console (the R code running), the environment (what R sees), and the files/output (what goes in/out). we will be using this throughout the lesson.

Tip: To run code from your script. highlight and hit control-enter (command-enter on mac). Or, to run a whole line, put your cursor on the line and hit that key combo.

Starting with variables

You can get output from R simply by typing math in the console:

3 + 5
12 / 7

However, we want more than a fancy calculator right? You were promised so much more! Things become very interesting when you can assign values to objects. To create an object, we need to give it a name, then the assignment operator <-, then the value we want to give it:

weight_kg <- 55

We have now assigned the value of 55 to the variable weight_kg

weight_kg

Object names can’t start with a number (e.g. 2ndTreatment is no good but treatment_2 works). Name your variables the “shortest thing that makes sense” since you’ll be typing it over and over again!

In general you don’t want to give something a name that’s already taken. If we make another thing called weight_kg, we will “overwrite” the first one with no warning:

weight_kg <- 60
# Type name of object to print
weight_kg

Now that r “remembers” weight_kg, we can do some math with it! Let’s find out what 60kg is in pounds:

# pounds is 2.2x kg.
2.2 * weight_kg
# did that change weight_kg?
weight_kg

We can also change an object’s value by assigning it a new one:

weight_kg <- 57.5
2.2 * weight_kg

let’s save the result of this calculation to a new variable, weight_lb:

weight_lb <- 2.2 * weight_kg
weight_lb
# change weight_kg:
weight_kg <- 100
# what is the result here?
weight_lb
# what about now?
weight_kg <- weight_kg / 2
weight_lb
weight_kg

To run a command you ran previously, hit “up” in the console.

Using comments to inform the script

In the above examples, I am using a pound sign / hashtag to “comment” the script. This lets me write regular English throughout my code in order to understand it better. this will be super helpful going ahead in the future!

Sometimes, I even start a script by writing in what I want to do in a few comments. Then I fill in the space between the comments with the code.

Using functions in R

You’re likely going to very quickly want to do something a little more than addition, subtraction, multiplication, and division in R. For this we will use our own functions that allow us to do more complex operations. A function executes a more complex chunk of code. Functions look a bit like variables, but have parentheses after them. the “input” of the function goes into these parentheses.

Diagram of a function

Functions can be anything! functions allow us to plot, calculate, transform, and operate on data in any way. they can be as simple as rounding a number.

round(3.14159)

or getting the square root:

# square root of the weight from above
b <- sqrt(weight_lb)
b

To round the square root of the weight in pounds, we can do it in two steps:

sqrt_weight_lb = sqrt(weight_lb)
round(sqrt_weight_lb)

… or we can skip the middle step and put a function inside of the other:

round(sqrt(weight_lb))

Many functions have more than one input. Usually the first one is required, then the following ones modify the way the function… functions.

# Let's see if round has any more tricks:
?round
round(sqrt(weight_lb), digits = 2)

Data types in R: vectors and data frames.

So far, every variable has represented a single number. However, variables in R can represent much more. they can represent words as well…

animal <- "sea urchin"

…or even groups of numbers and words.

A “list” of things is called a vector in R. We make lists using the command c():

animals <- c('Dogs', 'Sheep', 'Pigs')
animals
# this tells us the "class":
class(weight_g)
# Let's try with numbers:
weights <- c(1,10,300, 40)

We can pull out individual values from the vector using brackets:

# extract first "element" of the vector:
animals[1]
animals[2]
# we can also pull out multiple into a smaller
animals[c(1,2)]
animals[c(1,1,1,1,1,2)]

Logical Indexing

What if we want to grab the smallest numbers out of one of these vectors?

weights <- c(1,10,300, 40)
# Grab weights only if weights are less than 30
weights[weights<40]
Symbol Meaning
< Less than
<= Less than or equal to
> Greater than
>= Greater than or equal to
== equal to
!= Not equal to

Let’s play with these a bit!

  • Get the numbers not equal to 40.
  • Get the numbers greater than or equal to 10

Importing Data

Make a folder on your Desktop called “workshop”, and a folder in that called ‘data’. then put this spreadsheet inside of it:

Download link

This file is a csv, or “comma-separated value”. It is the best way to import data into R. You can easily save your excel spreadsheets into a csv file (“save as” in excel), but you will have to save each “sheet” as a separate csv. There are ways to read excel .xls files directly into R, but we won’t be covering that link to that code here, though

Finding out where we are on the computer

R will let you navigate through the computer. A new project will probably start you out in your “Documents” folder, or in your “Home” folder (where you Document folder is).

# Where are we?
getwd()
# What files are there?
list.files()

Moving around on the computer

Let’s move into the folder where our data is.

If you start in your Home folder, you just need to go onto your desktop, then into your workshop folder, then get mammals.

setwd('Desktop/workshop/data')
getwd()

If you start in your documents folder, you need to go 1 folder “up”, or go into the enclosing folder. That require a ..:

# .. always gets you 1 folder "up":
setwd('..')
getwd()
setwd('data')
setwd('../data') # go out of workshop ,and right back in.

If you ever get “lost”, you can always go back to your “starting point” with a tilde ~ (above the tab key).

setwd(`~`) # ET PHONE Home
# If you don't know where you're starting, you'll ALWAYS get to the workshop folder this way:
setwd('~/Desktop/workshop/data')

Now let’s import the data! To do that we use another function called read.csv. read.csv takes a file name as an input, and returns the contents of that file.

read.csv('mammals.csv')

Whoops, just like in the beginning when we asked R 4+2, the “output” of read.csv isn’t assigned to a variable name. we can do that here:

mammals = read.csv('mammals.csv')

Note: You could’ve put in the whole name of the file from the start (‘Desktop/workshop/mammals.csv’). So read.csv can understand that as well!

Now typing “mammals” should show the whole spreadsheet:

mammals

You can click on the name “mammals” on the sidebar for a excel-style look at the data. But we can also do it straight from the R “command line”:

mammals
# show the first 5 lines:
head(mammals)
# show the last 5 lines:
tail(mammals)
# use summary to get a lot of information:
summary(mammals)

You can use indexing on a dataframe by telling the rows and columns you are interested in.

# First row, first column
mammals[1,1]
# First row, complete
mammals[1,]
# First column
mammals[,1]

…but we can also get columns by name using $

mammals$order

Let’s make a new column called chunkiness, which is its weight divided by its length:

# Does a column already exist?
mammals$chunkiness
mammals$chunkines <-
    mammals$adult_body_mass_g / mammals$adult_head_body_len_mm

Now let’s do some simple plots: Longer animals should weigh more, but let’s give it a look-see.

Easy quick plots in R are made using the plot function. You type the y variable, followed by a ~, and then the x variable - then you tell it which data frame to pull those columns from. This is a little weird at first, but this is very normal for statistics!

plot(adult_head_body_len_mm ~ adult_body_mass_g, data=mammals)

Great! That’s probably a good stopping point - let’s move on to Intro to R Part 2 after a short break.