Common R traps

This page shows some common R traps. The purpose is not to tell you how “crappy” R is. R is a great language, and traps of this kind can occur in any other language.

if(a<-5)

Assume you want to write a condition that checks whether “a is smaller than negative five” and then does something. So you wrote

if (a<-5)
 {
     sin(pi/3)
 }

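but R will instead “assign positive five to a” and test the result, since “<-” in R is the assignment operator. The assignment returns 5, which R treats as TRUE, so the condition always holds. As a result, R will always run the code inside the condition, and the original value of a is overwritten.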
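A minimal sketch of the effect, assuming a starts at 0 (a made-up value chosen so that the intended test a < -5 would be FALSE). The flag name `taken` is hypothetical and only records whether the branch ran:

```r
a <- 0              # illustrative starting value: a is NOT smaller than -5
taken <- FALSE      # hypothetical flag recording whether the branch ran

if (a<-5) {         # parsed as a <- 5: assigns 5 to a, then tests the value 5
  taken <- TRUE
}

print(a)            # a has been overwritten by the assignment
print(taken)        # the branch ran although the intended test would have failed

# The parser sees both spellings as the same assignment call
print(identical(quote(a<-5), quote(a <- 5)))
```

Note that the trap has two effects: the branch always runs, and the value of a is silently changed.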

Solutions: Use a better coding style, i.e. always put a space between an operator (either an assignment operator or a relational operator) and its values, e.g.,

if (a < -5)
 {
     sin(pi/3)
 }

or use parentheses if you want to be sure of what you are doing.

if (a<(-5))
 {
     sin(pi/3)
 }

break a long line

When you want to break a long expression into several lines in R, you don’t have to put a special notation at the end of each line; R checks by itself whether your expression has finished. This is convenient but can also cause trouble. Assume you have a very long expression and you want to break it into two lines, e.g.

myvalue <- sin(pi/3) + cos(pi/3) + 2*sin(pi/3)*cos(pi/3)

The result should be 2.232051.

But you wrote

myvalue <- sin(pi/3) + cos(pi/3)
           + 2*sin(pi/3)*cos(pi/3)

R will think you have finished the expression at the end of the first line and started a new expression on the second line. You will find the result is 1.366025, since the second part is not included at all.
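The two results can be reproduced with a short sketch. The variable names `full` and `broken` are made up for this illustration:

```r
# Whole expression on one line
full <- sin(pi/3) + cos(pi/3) + 2*sin(pi/3)*cos(pi/3)

# Line break before the operator: the assignment ends after cos(pi/3)
broken <- sin(pi/3) + cos(pi/3)
          + 2*sin(pi/3)*cos(pi/3)   # evaluated on its own, never added to broken

print(round(c(full = full, broken = broken), 6))
```

R raises no error or warning here, because a line starting with “+” is a valid expression by itself.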

Solutions: You can either put a pair of parentheses around your expression, like this

myvalue <- (sin(pi/3) + cos(pi/3) 
             + 2*sin(pi/3)*cos(pi/3))

but too many parentheses make the code very hard to read. Alternatively, you can always break the line after an arithmetic operator, so that R knows the expression is not finished

myvalue <- sin(pi/3) + cos(pi/3) +
           2*sin(pi/3)*cos(pi/3)
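One way to check how R splits a piece of code, without running it, is to count the expressions returned by parse(). This is only a sketch: `count_expressions` and the three string names are hypothetical helpers introduced for this illustration.

```r
# Hypothetical helper: count the top-level expressions R finds in a piece of code
count_expressions <- function(code) length(parse(text = code))

before_operator <- "myvalue <- sin(pi/3) + cos(pi/3)\n + 2*sin(pi/3)*cos(pi/3)"
after_operator  <- "myvalue <- sin(pi/3) + cos(pi/3) +\n 2*sin(pi/3)*cos(pi/3)"
parenthesized   <- "myvalue <- (sin(pi/3) + cos(pi/3)\n + 2*sin(pi/3)*cos(pi/3))"

print(count_expressions(before_operator))  # the break ends the first expression
print(count_expressions(after_operator))   # the trailing + keeps the expression open
print(count_expressions(parenthesized))    # the open parenthesis keeps it open
```

Only the first variant is split into two separate expressions; the other two are read as a single one.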

diag() function with a vector

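As described in the R help document, using ‘diag(x)’ can have unexpected effects if ‘x’ is a vector that could be of length one, as in this example. Here R treats the single number as the size of an identity matrix rather than as its diagonal entry.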

> diag(7.4)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    0    0    0    0    0    0
[2,]    0    1    0    0    0    0    0
[3,]    0    0    1    0    0    0    0
[4,]    0    0    0    1    0    0    0
[5,]    0    0    0    0    1    0    0
[6,]    0    0    0    0    0    1    0
[7,]    0    0    0    0    0    0    1

Solutions: To avoid this, use “diag(x, nrow = length(x))” for consistent behavior when “x” is a vector.

> x = c(1,2,3)
> diag(x,length(x))
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3
> x = 2.4
> diag(x,length(x))
     [,1]
[1,]  2.4
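If the vector length is not known in advance, the fix can be wrapped in a small function. The name `safe_diag` is hypothetical and not part of base R; this is a sketch of one possible wrapper:

```r
# Hypothetical wrapper, not part of base R: always treat x as the diagonal
safe_diag <- function(x) diag(x, nrow = length(x))

print(dim(diag(7.4)))         # identity matrix sized by the number
print(dim(safe_diag(7.4)))    # single-cell matrix holding 7.4
print(safe_diag(c(1, 2, 3)))  # same matrix as diag(c(1, 2, 3))
```

For vectors of length two or more, the wrapper returns the same matrix as a plain diag(x) call, so it can be used everywhere.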

sample(x) when length of x is 1 and x is an integer

The sample function behaves inconsistently when its first argument x has length 1 and is an integer: R then samples from 1:x instead of from x itself. See this example

sample(x = 3, size = 10, replace = TRUE) # same as sample(x = 1:3, size = 10, replace = TRUE)

If you want to sample “3” ten times with replacement, i.e. obtain a vector of ten 3s, you have to check for that condition explicitly.
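One way to handle that case, following the resample() pattern given in the examples of the sample help page, is to sample positions with sample.int() and then index into x. The seed and the result names below are arbitrary choices for this sketch:

```r
# Same pattern as the resample() helper in the examples of ?sample:
# draw positions with sample.int(), then index into x
resample <- function(x, ...) x[sample.int(length(x), ...)]

set.seed(1)  # arbitrary seed, only to make the run repeatable
from_one_to_three <- sample(3, size = 10, replace = TRUE)    # values from 1:3
always_three      <- resample(3, size = 10, replace = TRUE)  # ten copies of 3

print(from_one_to_three)
print(always_three)
```

Because the helper samples positions rather than values, it behaves the same way whatever the length of x is.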

 


By Feng Li

Dr. Feng Li is an Associate Professor of Statistics in the School of Statistics and Mathematics at Central University of Finance and Economics in Beijing, China. Feng obtained his Ph.D. degree in Statistics from Stockholm University, Sweden in 2013. His research interests include Bayesian computation, econometrics and forecasting, and distributed learning. His recent research output appeared in statistics and forecasting journals such as the International Journal of Forecasting and Statistical Analysis and Data Mining, AI journals such as Expert Systems with Applications, and medical journals such as BMJ Open.
