R Functions and Packages

Feng Li

School of Statistics and Mathematics

Central University of Finance and Economics

feng.li@cufe.edu.cn

https://feng.li/statcomp


Control-flow constructs

The if condition

  • Binary comparison

      x == y    all.equal()    identical()
      x != y
      x > y
      x < y
      x >= y
      x <= y
      x %in% y
  • What would you expect when $x$ and $y$ are vectors or matrices?

  • The if condition statement in R

      if (condition) {
        # do something
      } else {
        # do something else
      }
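
The following is a minimal sketch of how the comparison operators behave on vectors, and of why if() needs a single TRUE or FALSE. The variables x, y and msg are made up for illustration.

```r
x <- c(1, 2, 3)
y <- c(1, 2, 4)

# == compares element by element and returns a logical vector
x == y                               # TRUE TRUE FALSE

# identical() and all.equal() summarise the whole object
identical(x, y)                      # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE: equal up to a numerical tolerance
(0.1 + 0.2) == 0.3                   # FALSE: exact floating-point comparison

# %in% asks, for each element of x, whether it appears anywhere in y
x %in% y                             # TRUE TRUE FALSE

# if() expects a single TRUE/FALSE, so reduce a vector with all() or any()
if (all(x == y)) {
  msg <- "x and y are equal elementwise"
} else {
  msg <- "x and y differ in at least one element"
}
msg
```

Note that `} else {` sits on one line: at the top level of a script or the console, an `else` that starts a new line is a syntax error.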

Lab: Leap year check

  • February 29, known as a leap day in the calendar, is a date that occurs in most years that are evenly divisible by $4$, such as $2004$, $2008$, $2012$ and $2016$. Years that are evenly divisible by $100$ do not contain a leap day, with the exception of years that are evenly divisible by $400$, which do contain a leap day; thus $1900$ did not contain a leap day while $2000$ did.

  • Write a function called is.leapday to check if a given year has February 29 [Hint: you may need the %% operator; see ?"%%".].

  • Test your function for some years.

  • What can you do to improve the error tolerance of the function?

  • Suppose you want to check which years in a given sequence have a leap day. Modify your function to handle this.
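
One possible solution is sketched below. It is not the only way to do the lab; the particular input checks and the error message are assumptions chosen for illustration. Because %%, & and | all work elementwise, the same function handles a single year and a whole sequence of years.

```r
# TRUE for years that contain February 29
is.leapday <- function(year) {
  # Basic input validation for error tolerance
  if (!is.numeric(year) || any(is.na(year)) || any(year %% 1 != 0)) {
    stop("year must be a vector of whole numbers without missing values")
  }
  # Divisible by 4 but not by 100, or divisible by 400
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

is.leapday(c(1900, 2000, 2004, 2015))   # FALSE TRUE TRUE FALSE

# Which years in a sequence have a leap day?
years <- 1996:2004
years[is.leapday(years)]                # 1996 2000 2004
```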

Loops

  • The for loop
In [2]:
B = matrix(1:10,2,5)
C = matrix(100:109,2,5)
A = matrix(NA,2,5)
for(i in 1:length(A))
{
    A[i] = B[i] + C[i]
}
A
A matrix: 2 × 5 of type int
101 105 109 113 117
103 107 111 115 119
  • The while loop
In [4]:
i = 0
while(i < length(A)){  
    i = i + 1
    A[i] = B[i] + C[i]
}
A
A matrix: 2 × 5 of type int
101 105 109 113 117
103 107 111 115 119

apply() type loops

  • Calculate row sums for a matrix with a loop.

  • Apply sum() function to each row of the matrix.

  • apply() to an array with higher dimension.

  • Apply your own function to each row of the matrix.

  • lapply(): apply a function to each element of a list.

  • mapply(): apply a function to multiple list or vector arguments.

  • The ... arguments in a function.

  • Supply more arguments to apply() type functions.

In [2]:
mat = matrix(1:100,20,5)
mat
A matrix: 20 × 5 of type int
 1 21 41 61  81
 2 22 42 62  82
 3 23 43 63  83
 4 24 44 64  84
 5 25 45 65  85
 6 26 46 66  86
 7 27 47 67  87
 8 28 48 68  88
 9 29 49 69  89
10 30 50 70  90
11 31 51 71  91
12 32 52 72  92
13 33 53 73  93
14 34 54 74  94
15 35 55 75  95
16 36 56 76  96
17 37 57 77  97
18 38 58 78  98
19 39 59 79  99
20 40 60 80 100
In [5]:
apply(mat, 2, mean)
apply(mat, 1, mean)
apply(mat, 2, sd)
  1. 10.5
  2. 30.5
  3. 50.5
  4. 70.5
  5. 90.5
  1. 41
  2. 42
  3. 43
  4. 44
  5. 45
  6. 46
  7. 47
  8. 48
  9. 49
  10. 50
  11. 51
  12. 52
  13. 53
  14. 54
  15. 55
  16. 56
  17. 57
  18. 58
  19. 59
  20. 60
  1. 5.91607978309962
  2. 5.91607978309962
  3. 5.91607978309962
  4. 5.91607978309962
  5. 5.91607978309962
In [6]:
arr = array(1:240, c(20,3,4))
In [9]:
apply(arr, 3, mean)
apply(arr, c(1, 2), mean)
  1. 30.5
  2. 90.5
  3. 150.5
  4. 210.5
A matrix: 20 × 3 of type dbl
 91 111 131
 92 112 132
 93 113 133
 94 114 134
 95 115 135
 96 116 136
 97 117 137
 98 118 138
 99 119 139
100 120 140
101 121 141
102 122 142
103 123 143
104 124 144
105 125 145
106 126 146
107 127 147
108 128 148
109 129 149
110 130 150
In [10]:
apply(mat, 2, function (x) max(x)-min(x))
  1. 19
  2. 19
  3. 19
  4. 19
  5. 19
In [11]:
maxmin = function (x) {
    max(x)-min(x)
}

apply(mat, 2, maxmin)
  1. 19
  2. 19
  3. 19
  4. 19
  5. 19
  • The advantages of apply()-type functions:

    • Easy to construct

    • Less coding

    • An apply()-type loop is essentially a more compact, and often more efficient, way to write a loop in R.
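
The first and last items of the task list at the start of this section (row sums with a loop, and passing extra arguments through ...) can be sketched as follows. The helper trimmedRange is a made-up name used only for illustration.

```r
mat <- matrix(1:100, 20, 5)

# Row sums with an explicit loop, preallocating the result
rs_loop <- numeric(nrow(mat))
for (i in seq_len(nrow(mat))) {
  rs_loop[i] <- sum(mat[i, ])
}

# The same computation with apply() and with the vectorized rowSums()
rs_apply <- apply(mat, 1, sum)
rs_vec <- rowSums(mat)
all.equal(rs_loop, rs_apply)   # TRUE
all.equal(rs_loop, rs_vec)     # TRUE

# Arguments given after FUN are passed on to FUN through "..."
apply(mat, 2, quantile, probs = c(0.25, 0.75))

# A user-defined function can forward "..." in the same way
trimmedRange <- function(x, ...) {
  q <- quantile(x, ...)        # "..." carries probs on to quantile()
  max(q) - min(q)
}
apply(mat, 2, trimmedRange, probs = c(0.25, 0.75))   # 9.5 for every column
```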

Write efficient loops in R

  • Avoid loops as much as possible. We should always try to vectorize the calculations.

  • Use apply()-type loops if possible.

  • Think carefully about numerical underflow and overflow.

  • Allocate the memory space before looping. The loop below does not: A starts as NULL and grows by one element in every iteration, which makes it much slower.

      B = matrix(1:10,2,5)
      C = matrix(100:109,2,5)
      A = NULL
      for(i in 1:length(B))
      {
        A[i] = B[i] + C[i]
      }
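
For comparison, here is a sketch of the preallocated and the fully vectorized versions of the same computation, followed by a small illustration of underflow. The names A_vec and p are made up for this example.

```r
B <- matrix(1:10, 2, 5)
C <- matrix(100:109, 2, 5)

# Preallocate A with its final size, then fill it in
A <- matrix(NA_integer_, 2, 5)
for (i in seq_along(A)) {
  A[i] <- B[i] + C[i]
}

# Vectorized version: no loop at all
A_vec <- B + C
identical(A, A_vec)            # TRUE

# Underflow: a product of many small probabilities collapses to 0,
# while the sum of their logarithms stays finite
p <- rep(1e-5, 100)
prod(p)                        # 0
sum(log(p))                    # about -1151.29
```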

List Arithmetic

  • Apply a function to the elements of a list

      lapply(X, FUN, ...)
      rapply(object, f, how = c("unlist","replace", "list"), ...)
  • Operators with many lists

      mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
    
      mapply("+", list1, list2, list3, SIMPLIFY = FALSE)
      mapply(function(x, y) abs(x)*log(abs(y)), list1, list2, SIMPLIFY = FALSE)
In [13]:
lst = list (a= rnorm(10), b= 1:9, c=runif(20))
lst
$a
  1. -1.26589644825265
  2. -0.068573308896311
  3. 0.658409475454285
  4. -0.221881865339709
  5. 0.58070039713217
  6. 0.630319389017115
  7. -0.242213569358942
  8. 0.97023453146676
  9. 0.351854369172614
  10. -1.33262148487691
$b
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
$c
  1. 0.181591403437778
  2. 0.164184056222439
  3. 0.390080413082615
  4. 0.657432082109153
  5. 0.256644821958616
  6. 0.564072602661327
  7. 0.525960478000343
  8. 0.604819754138589
  9. 0.951666688779369
  10. 0.97638466511853
  11. 0.508973858784884
  12. 0.284509904216975
  13. 0.563456729985774
  14. 0.968111847294495
  15. 0.453518448630348
  16. 0.633869176264852
  17. 0.606082778424025
  18. 0.908481978345662
  19. 0.129783247830346
  20. 0.170731674181297
In [16]:
length(lst)
lapply(lst, length)
lapply(lst, mean)
3
$a
10
$b
9
$c
20
$a
0.00603314855184166
$b
5
$c
0.525017830473371
In [17]:
lst2 = list (a = lst, b = lapply(lst, mean))
lst2
$a
$a
  1. -1.26589644825265
  2. -0.068573308896311
  3. 0.658409475454285
  4. -0.221881865339709
  5. 0.58070039713217
  6. 0.630319389017115
  7. -0.242213569358942
  8. 0.97023453146676
  9. 0.351854369172614
  10. -1.33262148487691
$b
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
$c
  1. 0.181591403437778
  2. 0.164184056222439
  3. 0.390080413082615
  4. 0.657432082109153
  5. 0.256644821958616
  6. 0.564072602661327
  7. 0.525960478000343
  8. 0.604819754138589
  9. 0.951666688779369
  10. 0.97638466511853
  11. 0.508973858784884
  12. 0.284509904216975
  13. 0.563456729985774
  14. 0.968111847294495
  15. 0.453518448630348
  16. 0.633869176264852
  17. 0.606082778424025
  18. 0.908481978345662
  19. 0.129783247830346
  20. 0.170731674181297
$b
$a
0.00603314855184166
$b
5
$c
0.525017830473371
In [20]:
rapply(lst2, length)
rapply(lst2, length, how="list")
a.a  a.b  a.c  b.a  b.b  b.c
 10    9   20    1    1    1

$a$a
10
$a$b
9
$a$c
20
$b$a
1
$b$b
1
$b$c
1
In [21]:
lst1 = list (a = 4:6, b = 5:7)
lst2 = list (a = 3:5, b = 8:10)
In [22]:
lst1 + lst2
Error in lst1 + lst2: non-numeric argument to binary operator
In [25]:
mapply("+", lst1, lst2)
A matrix: 3 × 2 of type int
 a  b
 7 13
 9 15
11 17
In [26]:
mapply(sum, lst1, lst2)
a
27
b
45
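
As a sketch of the SIMPLIFY = FALSE and MoreArgs options listed earlier, the example below keeps the result as a list instead of a matrix. The variable res and the argument name power are made up for illustration.

```r
lst1 <- list(a = 4:6, b = 5:7)
lst2 <- list(a = 3:5, b = 8:10)

# SIMPLIFY = FALSE keeps the result as a list instead of a matrix
res <- mapply("+", lst1, lst2, SIMPLIFY = FALSE)
res$a                          # 7 9 11
res$b                          # 13 15 17

# Map() is a wrapper around mapply() that never simplifies
identical(res, Map("+", lst1, lst2))   # TRUE

# MoreArgs passes arguments that are shared by every call
mapply(function(x, y, power) sum((x + y)^power),
       lst1, lst2, MoreArgs = list(power = 2))   # a = 251, b = 683
```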

Functions

  • Create a function object

      myFun = function (par1, par2)
      {
        out = max(par1) - min(par2)
        return(out)
      }
  • Load the function:

    • Copy and paste to the R console.
    • Save the function to a file and load it with the source() function.
  • Execute your function.

In [28]:
maxmin = function (x){
    out = max(x)-min(x)
    return(out)
}

maxmin(1:100)
99
In [31]:
rm(list=ls())

source("code/myfun.R")
maxmin(1:100)
99
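
If no file such as code/myfun.R is at hand, the same workflow can be tried with a temporary file. This is a sketch only; the temporary file stands in for a script you would normally write in an editor.

```r
# Write a small function definition to a temporary file
# (a stand-in for a file such as code/myfun.R)
f <- tempfile(fileext = ".R")
writeLines(c(
  "maxmin = function (x){",
  "    out = max(x)-min(x)",
  "    return(out)",
  "}"
), f)

# source() evaluates the file, which creates maxmin in the workspace
source(f)
maxmin(1:100)                  # 99
unlink(f)                      # remove the temporary file
```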

What is a good function?

  • Validating the input parameter type
  • Simple in logic and implementation
  • Error catching
  • Using return()
  • Speed and performance matter.
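
A minimal sketch of what these points can look like in practice. The function safeRange and its error messages are hypothetical and serve only as an illustration.

```r
# Hypothetical example: the range of a numeric vector, with input checks
safeRange <- function(x, na.rm = FALSE) {
  # Validate the input type before doing any work
  if (!is.numeric(x)) {
    stop("x must be numeric, got an object of class ", class(x)[1])
  }
  if (length(x) == 0) {
    stop("x must contain at least one value")
  }
  out <- max(x, na.rm = na.rm) - min(x, na.rm = na.rm)
  return(out)
}

safeRange(c(3, 8, 1))          # 7

# tryCatch() lets the caller handle the error instead of stopping the script
result <- tryCatch(
  safeRange("not a number"),
  error = function(e) {
    message("Caught: ", conditionMessage(e))
    NA
  }
)
result                         # NA
```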

Lab: a summary function

  • Write a function mySummary whose input argument x can be any vector and whose output contains the basic summary (mean, variance, length, maximum and minimum values, type) of the vector supplied to the function.

  • Test your function with some vectors (that you make up by yourself).

  • What will happen if your input is not a vector (e.g. the data frame weekPlanNew from our previous example)?
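
One possible shape for such a function is sketched below. Returning a list, and stopping with an error for non-vector input, are design choices assumed here rather than requirements of the lab.

```r
mySummary <- function(x) {
  # Reject inputs that are not plain vectors (e.g. data frames, matrices)
  if (!is.atomic(x) || !is.null(dim(x))) {
    stop("x must be a plain vector")
  }
  if (!is.numeric(x)) {
    # Non-numeric vectors only get the summaries that make sense for them
    return(list(length = length(x), type = typeof(x)))
  }
  out <- list(mean = mean(x), variance = var(x), length = length(x),
              max = max(x), min = min(x), type = typeof(x))
  return(out)
}

s <- mySummary(c(2, 4, 6, 8))
s$mean                         # 5
s$variance                     # 6.666667
s$type                         # "double"
```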

Lab: Roots function for the quadratic equation

  • The roots of the quadratic equation $ax^2+bx+c=0$ are of the form
          $$x_1=\frac{-b + \sqrt{b^2-4ac}}{2a} \quad \text{and} \quad
          x_2=\frac{-b - \sqrt{b^2-4ac}}{2a}$$
  • Write a function named quaroot to find the roots of a given quadratic equation, with a, b, and c as input arguments. [Hint: you may need the sqrt() function]

  • Test your function on the following equations $$
      \begin{split}
        x^2+4x-1=0\\
        -2x^2+2x=0\\
        3x^2-9x+1=0\\
        x^2 -4 = 0
      \end{split}$$
  • Test your function with the equation $5x^2+2x+1=0$. What are the results? Why? [Hint: check $b^2-4ac$.]

  • Modify your function and return NA if $b^2-4ac < 0$.

In [5]:
quaroot <- function(a, b, c)
{
    x1 <- (-b+sqrt(b^2-4*a*c))/(2*a)
    x2 <- (-b-sqrt(b^2-4*a*c))/(2*a)

    out <- c(x1, x2)
    return(out)
}
In [6]:
quarootm <- function(a, b, c)
{
    d <- b^2-4*a*c

    if(d<0)
    {
        x1 <- NA
        x2 <- NA
    }
    else
    {
        x1 <- (-b+sqrt(d))/(2*a)
        x2 <- (-b-sqrt(d))/(2*a)
    }

    out <- c(x1, x2)
    return(out)
}
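
A quick check on the test equations. So that the snippet runs on its own, the logic of quarootm is repeated under the hypothetical name quarootDemo; the printed values in the comments are rounded.

```r
quarootDemo <- function(a, b, c) {
  d <- b^2 - 4 * a * c           # discriminant
  if (d < 0) {
    return(c(NA, NA))            # no real roots
  }
  c((-b + sqrt(d)) / (2 * a), (-b - sqrt(d)) / (2 * a))
}

quarootDemo(1, 4, -1)            # about 0.236 and -4.236
quarootDemo(-2, 2, 0)            # 0 and 1
quarootDemo(3, -9, 1)            # about 2.884 and 0.116
quarootDemo(1, 0, -4)            # 2 and -2
quarootDemo(5, 2, 1)             # NA NA, since b^2 - 4ac = -16 < 0

# Without the check, sqrt() of a negative number gives NaN with a warning
suppressWarnings(sqrt(-16))      # NaN
```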

Using R packages

  • Load an R package: library("PackageName") or require("PackageName")

  • Install an R package from the Internet (CRAN): install.packages("PackageName")

  • Install an R package from GitHub with the devtools package:

       install.packages("devtools")
       devtools::install_github("ykang/gratis")
  • If you use Windows and an R package needs to be compiled, you need the "Rtools" software for building R packages under Microsoft Windows, available at https://cran.r-project.org/bin/windows/Rtools/.
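
library() stops with an error when the package is missing, whereas require() returns FALSE with a warning. A script can also test for a package before attaching it, as in the sketch below; the helper loadIfInstalled is a made-up name, not part of base R.

```r
# Hypothetical helper: attach a package only if it is already installed
loadIfInstalled <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)
    return(TRUE)
  }
  message("Package '", pkg, "' is not installed; try install.packages(\"",
          pkg, "\")")
  return(FALSE)
}

loadIfInstalled("stats")       # TRUE: stats ships with every R installation
```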

Coding Style

  • Good coding style is like correct punctuation.
  • Bottom line: your coding style should make your code easy for other people to understand.
  • Suggested reading: the tidyverse style guide, https://style.tidyverse.org/