Back to main workshop page

Objectives:

  • Review what functions do
  • Learn the basics of making your own function
  • Make our own functions!

Why make our own functions?

So far, we’ve used a few functions in R that have proved super useful (remember sqrt(), round(), c()?). These R functions have allowed us to complete complex tasks very efficiently.

However, as you get more and more into R, you will realize that you will require more complex code for more specific tasks. Some of your specific tasks (e.g. working with special data like shp, fasta) might be already figured out for you in an existing R package. Hang tight… we will be exploring our first R package in the second part of this session!

However, you may find yourself having to write code that is specific to your task at hand. Then, in order to use it over and over again, you may be copy-pasting this chunk of code everywhere to make it work. This cann make your code long and messy.

Instead of copy-pasting your code every single time you want to do something, it may be useful to put it into a fuction! Just like the functions you have been using, you can create your own function with a unique name to complete tasks over and over again.

Calculating standard error.

You may have heard of standard deviation (sd) and variance(var), but one other metric that is often used in statistics is standard error.

Standard error is just the standard deviation divided by the square root of the sample size. Let’s calculate standard error first for a hypothetical dataset:

dat <- c(1,2,5,10)
sd(dat) # This is the standard deviation.
se_dat <- sd(dat) / sqrt(4)

Objective: Make a function se()!

The structure of a function

Functions are structured like this:

name_of_function <- function(x){
  # Your code goes here
  # Do thigs with the input variable 'x'
  return(output)
}
  • First, name your function whatever you want.
  • When you use your function later, you will put a number in the parenthesees (e.g. round(3.14159)). Inside of the function, this number will be represented by x in the above example.
  • At the end, the value you want to output will go in a return() function.

Your first function example:

Let’s start by making a simple function. We want this function to return the input number, but squared. we will call this fuction square.

So, if you do

square(10)

you should return 100.

First we make our function setup:

square <- function(){

}

This function will take a number as an input, let’s call that num:

square <- function(num){

}

Do the math, and make a new variable inside of the function called new_num:

square <- function(num){
  new_num <- num ^2
}

Now return the output. Also make sure to add comments!

square <- function(num){
  # Returns the square of the input value.
  # Arguments: num should be a number.
  # returns: new_num, square of num.
  new_num <- num ^2
  return(new_num)
}

Now let’s give it a shot!

square(9)

Challenge:

Write a function that returns the standard error of the input. Remember, the “SE” is the standard deviation divided by the square root of the sample size.

Answer:

se <- function(x){
  # Calculated the standard error of the mean.
  # Argument: x is a vector of umbers.
  # Return: standard error.
  return(sd(x) / legth(x) )
}


Functionns with multiple inputs

Let’s make things a little more complicated. Let’s say we want to change “to what power” increment

x <- 4
square(x,3)  # outputs 64
square(x,5) # outputes 256

For that, all we need to do is to add a second “input” to the function:

square <- function(num, power){
  # Returns the input power to the nth power.
  # Arguments: num should be a number.
  # power should also be a number.
  # returns: new_num, square of num.
  new_num <- num ^power
  return(new_num)
}

Functions with default inputs

Especially since we called the function “square”, let’s see if we can make it so that if we don’t set a “power”, it automatically defaults to 2.

Remember our round function from last week?

round(3.14159)
round(3.14159, digits = 2)

In this case, “digits” defaults to 0 unlless we set it. Similarly, we are going to set up our fuction so that:

square(5) # returns 25
square(5, 3) # returns 125

To do this, we just have to put power = 2 in the “input portion of the function we have written:

square <- function(num, power = 2){
  # Returns the input power to the nth power.
  # Arguments: num should be a number.
  # power should also be a number.
  # returns: new_num, square of num.
  new_num <- num ^power
  return(new_num)
}

Note: If you tried to run square(5) with our previous, two-input function, it would throw an error. This is because we designed the function with two mandatory inputs.

Challenge:

Let’s write a function that zero-centers the mean of a dataset. What do I mean?

dat <- c(1,10,20,50)
mean(dat)
dat_new <- dat - mean(dat)
mean(dat)
  1. Make this into a function called center_mean.
  2. Have the function accept a second input that sets the mean to NOT zero, but to another number. However, if you don’t provide a second input, the function will work as before.
dat = c(1,2,3)
center_mean(dat) # should return (-1,0,1)
## Part 2:
center_mean(dat, 10) # should return(9,10,11)

center_mean <- function(x){ # centers the mean. # x should be a numebr, or vector of numbers. zeroed_dat <- x - mean(x) return(zeroed_dat) } center_mean(dat, 0) center_mean(dat, 100) center_mean(dat)

center_mean <- function(x, center = 0){ # centers the mean. # x should be a numebr, or vector of numbers. zeroed_dat <- x - mean(x) return(zeroed_dat + center) }

Important note: Variable Scope

In R, when we make a variable, we can always access it.

animal <- 'cat'
animal

However, if you make a variable inside of a funnction, you cannot access it from outside. In the above examples, we use variables like “zeroed_dat” inside of a function, but they are inaccessible from the outside.

The variables made inside of a function exist only within the function. That is what is called the scope of the variable.

In most programming lannguages, the opposite is true as well (variables outsidde of a function are not accessible inside of a function). However, that is not true in R.

test_fxn <- function(x){
  print(animal)
  return()
}
test_fxn()

BE CAREFUL OF THIS. Try to design your functions so that they only utilize external variables that are directly inputted to them. Your life will be easier if you treat functions as if their inputs and outputs are the only connections they have to the outside world.

10-minute break, but first…

Try running the following line:

install.packages('ggplot2') # if you get a dialog box click no
library(ggplot2)

Let me know if the second line throws an error!