Continuous Updates: R Tricks I Found Useful

5 minute read

Published:

Welcome to this continuously updated blog post where I’ll be sharing R tips and tricks that I find helpful while coding. These range from simple one-liners to more complex functions that can save you time and effort. Some of the examples below may be too specific and not useful to you, while others are general R tips. I’m always interested in finding efficient solutions to complex problems and simplifying my code.

In this post, you’ll find useful R tricks that have helped me boost my coding process and increase productivity. I decided to regularly update this post because I sometimes forget how to code something or how a particular code worked over time. By updating this post regularly, I can stay organized and make it easy to search for code examples in case I encounter similar problems later. Hopefully, you’ll find something useful in this post too.

ଘ(੭ˊᵕˋ)੭ Stay tuned and happy coding!

║█║▌║█║▌ Latest update 08/29/23: Convert Country name to country code (or vice versa) using countrycode package ║█║▌║█║▌

1. dplyr::filter()

df <- data.frame(Letter = c("A","A","B","C","D"), Count = c(1,2,1,1,1))
df
  Letter Count
1      A     1
2      A     2
3      B     1
4      C     1
5      D     1

Goal: select rows in df where Letter is A and count is 1. e.g.

  Letter Count
1      A     1
  • Method 1:
suppressWarnings(df[which(df$Letter == "A" && df$Count == 1), ])
  • Method 2:
library("dplyr")
df %>% 
   dplyr::filter(Letter == "A" & Count == 1)

Average time comparison of two methods above under 100 replications:

Method 1: 0.00019 secs vs. Method 2: 0.00094 secs. Although Method 1 runs slightly faster in this mini example, Method 2 is easier to code. Also, it’s important to note that without using suppressWarnings(), Method 1 would return a warning message due to calling “&&”. See this link for more details.

image

2. base::outer()

Goal: generate variance covariance matrix $\Sigma$, where $\Sigma=(\sigma_{jk})_{jk}$ with $\sigma_{jk} = 0.5^{|j-k|}$ for $j,k = 1,\ldots,5$. e.g.

       [,1]  [,2] [,3]  [,4]   [,5]
[1,] 1.0000 0.500 0.25 0.125 0.0625
[2,] 0.5000 1.000 0.50 0.250 0.1250
[3,] 0.2500 0.500 1.00 0.500 0.2500
[4,] 0.1250 0.250 0.50 1.000 0.5000
[5,] 0.0625 0.125 0.25 0.500 1.0000
  • Method 1:
  MA <- matrix(0,5,5)
  for (j in 1:5) {
    for (k in 1:5) {
      MA[j,k]<-0.5^(abs(j-k))
    }
  }
  • Method 2:
matrix((0.5)^(abs(outer(1:5, 1:5, "-"))), nrow = 5, ncol = 5)

Average time comparison of two methods above under 100 replications:

Method 1: $4.67\times 10^{-5}$ secs vs. Method 2: $3.26\times 10^{-5}$ secs. In this mini example, Method 2 has a faster average running time and fewer lines of code than Method 1. Moreover, if we replace 5 with a much larger number, Method 2 will run even faster compared to Method 1.

3. base::split()

library(datasets)
data(iris)
table(iris$Species)
    setosa versicolor  virginica 
        50         50         50 

Goal: divide rows in iris into three groups using Species e.g. we are expecting to see 50 rows in each group.

  • Solution:
split(iris, iris$Species)

4. base::sapply() vs base::lapply().

  • What’s the difference between sapply() and lapply()? The two functions take the same input arguments but return different object classes. lapply() always returns a list, whereas sapply() returns a vector or matrix.

Goal: using data iris, calculate mean of Sepal.Length under each Species.

  • sapply()
    sapply(split(iris, iris$Species), function(x) mean(x$Sepal.Length))
    
    setosa versicolor  virginica 
     5.006      5.936      6.588 
class(sapply(split(iris, iris$Species), function(x) mean(x$Sepal.Length)))
[1] "numeric"
  • lapply()
lapply(split(iris, iris$Species), function(x) mean(x$Sepal.Length))
$setosa
[1] 5.006

$versicolor
[1] 5.936

$virginica
[1] 6.588
class(lapply(split(iris, iris$Species), function(x) mean(x$Sepal.Length)))
[1] "list"

I’m interested to see the running time of using a for loop versus the the function from apply family. Rumor said that the apply function runs faster than for loops, and while it is true that for loops typically require more lines of code than those using apply. However, in terms of running time, we will see. This is something that will be addressed later in this post.

5. devtools::install_github.

This function allows you to install packages directly from github.

library(devtools)
install_github("DeveloperName/PackageName")

6. RColorBrewer and grDevices::colors()

The color palette for R. Pre-set usually has maximum 9 distinctive colors. However, what if I want to color coded use a number of (let’s say 30) distinct colors? How do I choose? I learned form this link.

  • Method 1 (This palette contains 74 colors)
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == 'qual',]
col_vector <- unlist(mapply(brewer.pal, qual_col_pals$maxcolors, rownames(qual_col_pals)))
pie(rep(1,ncolor), col=sample(col_vector, 30))

image

sample(col_vector, 30)
 [1] "#FDC086" "#377EB8" "#F0027F" "#FDBF6F" "#8DA0CB" "#E31A1C" "#B3E2CD" "#33A02C" "#FED9A6" "#BEBADA" "#E6AB02"
[12] "#CCEBC5" "#A6CEE3" "#666666" "#E78AC3" "#BEAED4" "#CCEBC5" "#666666" "#FB9A99" "#D95F02" "#F781BF" "#1B9E77"
[23] "#FFFF99" "#FFD92F" "#F1E2CC" "#CBD5E8" "#FDCDAC" "#FDB462" "#F4CAE4" "#66A61E"
  • Method 2 (This palette contains 657 colors)
sample(grDevices::colors()[grep('gr(a|e)y', grDevices::colors(), invert = T)],30)
 [1] "tomato2"         "palegreen"       "blueviolet"      "ivory2"          "royalblue"       "skyblue2"       
 [7] "pink3"           "lightcyan1"      "palevioletred1"  "floralwhite"     "magenta"         "olivedrab3"     
[13] "lightcyan4"      "navajowhite"     "ivory4"          "darkorchid2"     "orchid"          "linen"          
[19] "blue3"           "lightcyan3"      "antiquewhite1"   "mediumorchid2"   "violet"          "red4"           
[25] "indianred2"      "pink"            "mediumvioletred" "magenta1"        "deeppink4"       "yellow"   

7. Extracting coefficients and p-values from lm()

  • Case 1: only one predictor RTTO model
    fit1 <- lm(y ~ -1 + X1,data)
    summary(fit1)$coefficients[1] # regression coefficient
    summary(fit1)$coefficients[4] # p-value
    

    Alternatively,

summary(fit1)$coefficients[,1] # regression coefficient
summary(fit1)$coefficients[,4] # p-value
  • Case 2: more than one predictor or not RTTO model
summary(fit1)$coefficients[,1] # multiple regression coefficients
summary(fit1)$coefficients[,4] # multiple p-values

Comment 1: the lines

summary(fit1)$coefficients[1] 
summary(fit1)$coefficients[4]

will no longer work since we now need to specify row and column accordingly –> in R, [row index, column index]

Comment 2: the data in lm() needs to be a data frame –> in R, use as.data.frame()

8. Convert Country name to country code (or vice versa) using countrycode package

The countrycode package allows you to convert Country Names and Country Codes

install.packages("countrycode")
library(countrycode)
> countrycode('Zimbabwe', origin = 'country.name', destination = 'iso3c')
[1] "ZWE"
> countrycode("ZWE", origin = 'iso3c', destination = 'country.name')
[1] "Zimbabwe"