checkglobals: an(other) R-package for static code analysis

Introduction

An important aspect of writing an R-script or an R-package is ensuring reproducibility and maintainability of the developed code, not only for others, but also for our future selves. The modern R ecosystem provides various tools and packages to help organize and validate written R code. Some widely used packages include roxygen2 (for function documentation), renv (for dependency management and environment isolation), and testthat, tinytest and RUnit (for unit testing)[1].

When it comes to package development, it is good practice to run R CMD check to perform a series of automated checks identifying possible issues with the R-package. Among the checks performed by R CMD check is a static inspection of the internal syntax trees of the code through the use of the codetools package. This code analysis discovers undefined functions and variables without executing the code itself, leading to the following (perhaps familiar) notifications:

❯ checking R code for possible problems ... NOTE
my_fun: no visible binding for global variable ‘g’

The undefined global variables returned by R CMD check may be false positives caused by functions that use data-masking or non-standard evaluation, such as subset(), transform() or with(). In these cases, a common solution is to suppress the notifications by including the variable names inside a call to utils::globalVariables().
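As a minimal sketch of this situation, the snippet below uses codetools directly. The function my_fun and the column g are hypothetical names chosen to match the NOTE above. The data-masked column is reported as a global variable, and the usual declaration with utils::globalVariables() is shown last:

```r
library(codetools)

## hypothetical function: g is meant to be a column of df,
## not a variable in the global environment
my_fun <- function(df) {
  subset(df, g > 0)
}

## the static analysis cannot know that subset() evaluates g inside df,
## so g is listed among the global variables
globs <- findGlobals(my_fun, merge = FALSE)
globs$variables

## declare g as a known global; in a package this call is typically
## placed at the top level of a file in the R/ folder
utils::globalVariables("g")
```

Note that the declaration affects the notifications reported by R CMD check; it does not make the variable exist at runtime.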

Most importantly, we wish to detect variable names that are truly undefined as soon as possible, as these could point to a mistake in the code or signal a missing function or package import.

In this context, this post introduces checkglobals, a minimal R-package that aims to serve as an efficient alternative to the static code analysis provided by codetools, checking R-packages and R-scripts for missing function imports and undefined variable names on the fly. The code inspection procedures are implemented using R’s internal C API for efficiency, and no external R-package dependencies are strictly required (only cli and knitr are suggested, for interactive use and for checking Rmd documents respectively).

Example usage

The checkglobals-package contains a single wrapper function checkglobals() to inspect R-scripts, Rmd-documents, folders, R-code strings or R-packages. As an example, consider the following R-script containing a demo Shiny application (source: https://raw.githubusercontent.com/rstudio/shiny-examples/main/004-mpg/app.R).

# scripts/app.R
library(shiny)
library(datasets)

# Data pre-processing ----
mpgData <- mtcars
mpgData$am <- factor(mpgData$am, labels = c("Automatic", "Manual"))

# Define UI for miles per gallon app ----
ui <- fluidPage(
  titlePanel("Miles Per Gallon"),
  sidebarLayout(
    sidebarPanel(
      selectInput("variable", "Variable:",
                  c("Cylinders" = "cyl",
                    "Transmission" = "am",
                    "Gears" = "gear")),
      checkboxInput("outliers", "Show outliers", TRUE)
    ),
    mainPanel(
      h3(textOutput("caption")),
      plotOutput("mpgPlot")
    )
  )
)

# Define server logic to plot various variables against mpg ----
server <- function(input, output) {
  formulaText <- reactive({
    paste("mpg ~", input$variable)
  })
  output$caption <- renderText({
    formulaText()
  })
  output$mpgPlot <- renderPlot({
    boxplot(as.formula(formulaText()),
            data = mpgData,
            outline = input$outliers,
            col = "#75AADB", pch = 19)
  })
}

# Create Shiny app ----
shinyApp(ui, server)

Calling checkglobals() with the file argument on the R-script saved as a local file prints a summary of the detected globals and imports.
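A minimal sketch of such a call is given below. It assumes the script has been saved as scripts/app.R relative to the working directory (the path is taken from the comment at the top of the script), and the object name chk is only illustrative:

```r
## path assumed from the comment at the top of the script above
script <- file.path("scripts", "app.R")

## run the check only if the package and the file are available
if (requireNamespace("checkglobals", quietly = TRUE) && file.exists(script)) {
  chk <- checkglobals::checkglobals(file = script)
  print(chk)
}
```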

Looking at the printed output of the object returned by checkglobals(), it lists the following information:

  1. the name and location of all unrecognized global variables;
  2. the name and location of all detected imported functions grouped by R-package.

The location app.R#36 lists the R-file name (app.R) and line number (36) of the detected variable or function. If cli is installed and cli-hyperlinks are supported, clicking the location links opens the source file pointing to the given line number. The bars and counts behind the imported package names highlight the number of function calls detected from each package.

More detailed information can be obtained by calling print() directly. For instance, we can print the referenced source code lines of the unrecognized global variables with:

The detection of imported functions and packages is an important motivation for the checkglobals-package. First, this allows us to validate the NAMESPACE file of a development R-package or check R-scripts for any additional packages that require installation before execution of the code. Second, this information can be used to get a better sense of the importance of an imported package, for instance to determine how much effort it would take to remove or replace it as a dependency. This is different from e.g. the codetools package, where findGlobals() or checkUsage() return an undefined variable name if a function import is not recognized, but do not return variable names that have been recognized as imports. The same is true for the convenience packages lintr (with object_usage_linter()) and globals, which provide codetools wrappers that produce results similar to those returned by R CMD check. More similar is renv::dependencies(), which scans for all loaded and/or imported packages in an R project folder by analyzing the DESCRIPTION and NAMESPACE files of an R-package or by detecting calls to library(), require(), etc. in an R-script. Note that renv::dependencies() returns package names, but not the functions called from these packages.

An additional benefit of a minimal and efficient code analysis package is that it significantly reduces the runtime required to inspect large R-packages or codebases, which makes it possible to check the code interactively during development:

## absolute timings (seconds) for inspecting the shiny package
## (roughly 100-fold relative time difference)
library(lintr)
library(checkglobals)

bench::mark(
  lint_package = lint_package("~/git/shiny", linters = list(object_usage_linter())),
  checkglobals = checkglobals(pkg = "~/git/shiny/"),
  iterations = 10,
  check = FALSE,
  time_unit = "s"
)
#> # A tibble: 2 × 6
#>   expression      min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>    <dbl>  <dbl>     <dbl> <bch:byt>    <dbl>
#> 1 lint_package 18.8   19.5      0.0508    1.33GB     2.42
#> 2 checkglobals  0.157  0.162    5.96     15.69MB     1.19

More examples

R Markdown files

The file argument also accepts R Markdown (.Rmd or .Rmarkdown) file locations. For R Markdown files, the R code chunks are first extracted into a temporary R-script with knitr::purl(), which is then analyzed by checkglobals(). Instead of a local file, the file argument in checkglobals() can also be a remote file location (e.g. a server or the web), in which case the remote file is first downloaded as a temporary file with download.file(). Below, we scan one of tidyr’s package vignettes (source: https://raw.githubusercontent.com/tidyverse/tidyr/main/vignettes/tidy-data.Rmd).
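The extraction step described above can be sketched on a small made-up R Markdown document. This only illustrates the knitr::purl() step and a call with the file argument; it is not the internal implementation of checkglobals(), and the document contents and variable names are assumptions:

```r
## a small made-up R Markdown document written to a temporary file
fence <- strrep("`", 3)
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: Example",
  "---",
  "",
  "Some text.",
  "",
  paste0(fence, "{r}"),
  "library(stats)",
  "y <- median(c(1, 2, 3))",
  fence
), rmd)

## extract the R code chunks into a temporary R-script
script <- tempfile(fileext = ".R")
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::purl(rmd, output = script, quiet = TRUE)
  writeLines(readLines(script))
}

## the Rmd file itself can be passed to the file argument
if (requireNamespace("knitr", quietly = TRUE) &&
    requireNamespace("checkglobals", quietly = TRUE)) {
  print(checkglobals::checkglobals(file = rmd))
}

## the remote vignette mentioned above can be passed in the same way
## (requires network access, so it is left as a comment):
## checkglobals::checkglobals(
##   file = "https://raw.githubusercontent.com/tidyverse/tidyr/main/vignettes/tidy-data.Rmd"
## )
```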

R-packages that are imported or loaded, but have no detected function imports are displayed with an n/a reference. This can happen when checkglobals() falsely ignores one or more imported functions from the given package or when the package is not actually needed as a dependency. In both cases this is useful information to have. In the above example, tibble is loaded in order to use tribble(), but the tribble() function is also exported by dplyr, so it shows up under the dplyr imports instead.

Folders

Folders containing R-scripts can be scanned with the dir argument, which inspects all R-scripts present in dir (and any of its subdirectories). The following example scans an R-Shiny app folder containing a ui.R and server.R file (source: https://github.com/rstudio/shiny-examples/tree/main/018-datatable-options).
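As a rough sketch of the dir argument, the snippet below builds a temporary folder with two made-up scripts instead of downloading the app linked above. The folder name, file contents and the variable n_rows are hypothetical:

```r
## temporary app folder with two made-up scripts (hypothetical contents)
app_dir <- file.path(tempdir(), "example-app")
dir.create(app_dir, showWarnings = FALSE)

writeLines(
  'ui <- shiny::fluidPage(shiny::h3("Demo"))',
  file.path(app_dir, "ui.R")
)
## n_rows is never defined anywhere in the folder
writeLines(
  "server <- function(input, output) { output$n <- shiny::renderText(n_rows) }",
  file.path(app_dir, "server.R")
)

## scan all R-scripts in the folder if the package is available
if (requireNamespace("checkglobals", quietly = TRUE)) {
  print(checkglobals::checkglobals(dir = app_dir))
}
```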

If imports are detected from an R-package that is not installed in the current R library, an alert is printed (as with the DT package above). Function calls that access the missing R-package explicitly, using e.g. :: or :::, can still be fully identified as imported function names. Function calls with no reference to the missing R-package will be listed as unrecognized global variables.

R-packages

R-package folders can be scanned with the pkg argument. Conceptually, checkglobals() scans all files in the /R folder of the package and contrasts the detected (unrecognized) globals and imports against the imports listed in the NAMESPACE file of the package. R-scripts present elsewhere in the package (e.g. in the /inst folder) are not analyzed, as these are not covered by the package NAMESPACE file. To illustrate, we can run checkglobals() on its own package folder:
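To make "the imports listed in the NAMESPACE file" more concrete, the declared imports of an installed package can be listed with base R's parseNamespaceFile(). This is only an illustration using the stats package as a stand-in, not a description of how checkglobals() is implemented, and the package path at the end is a hypothetical placeholder:

```r
## declared imports of an installed package (stats is only a stand-in)
ns <- parseNamespaceFile("stats", package.lib = .Library)

## each entry is either a package name or a list(package, function names);
## the first element is the package name in both cases
declared <- unique(vapply(ns$imports, function(imp) imp[[1]], character(1)))
declared

## hypothetical path to a local package folder; adjust before use
pkg_dir <- file.path("path", "to", "mypkg")
if (dir.exists(pkg_dir) && requireNamespace("checkglobals", quietly = TRUE)) {
  print(checkglobals::checkglobals(pkg = pkg_dir))
}
```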

Bundled packages

Besides local R-package folders, the pkg argument also accepts file paths to bundled source R-packages (tar.gz). This can either be a tar.gz package on the local filesystem, or a remote file location, such as the web (similar to the file argument).

Local filesystem:

Remote file location:

Known limitations

To conclude, we discuss some of the limitations of static code analysis with codetools and checkglobals. When using codetools (or R CMD check) there are several scenarios where the code inspection is known to skip undefined names that could potentially be detected. First, a variable that is evaluated before it is defined may be missed, as codetools does not track the order in which assignment and evaluation happen inside a local scope. Here is a minimal example using codetools::findGlobals():

## findGlobals requires a function as input
test1 <- function() {
  print(x)
  x <- 1
}

## calling this function generates an error
## (assuming no variable x exists in the calling environment)
test1()
#> Error in print(x): object 'x' not found

library(codetools)

## x is not recognized as an undefined 
## variable at the moment of evaluation
findGlobals(test1)
#> [1] "{"     "<-"    "print"

Another quite common situation is passing a function name as a character string to a higher-order function, e.g. do.call(), Reduce(), Filter(), Map() or the apply-family of functions. These function names are viewed by codetools as ordinary character strings:

test2 <- function() { 
  do.call("foo", 1) 
}

## foo is not recognized as an undefined
## variable since it is defined as a string
findGlobals(test2)
#> [1] "{"       "do.call"

Finally, more complex assignment statements may not always be handled as expected:

test3 <- function() { 
  assign(x = "x1", value = 1)
  assign(value = 2, x = "x2")
  c(x1, x2)
}

## assignment to x1 is recognized correctly, 
## but assignment to x2 is not
findGlobals(test3)
#> [1] "{"      "assign" "c"      "x2"

x <- NA
test4 <- function() { 
  x <<- 1
  x
}

## x is assigned in a different scope 
## but is available when evaluated
findGlobals(test4)
#> [1] "{"   "<<-" "x"

The checkglobals-package tries to address some of these use-cases, but due to R’s flexibility as a language, there are a number of use-cases we can think of that are either too ambiguous or complex to be analyzed without evaluation of the code itself. Below we list some of these cases, where checkglobals() fails to recognize a variable name (false negative) or falsely detects a global variable when it should not (false positive).

Character variable/function names

## this works (character arguments are recognized as functions)
checkglobals(text = 'do.call(args = list(1), what = "median")')
checkglobals(text = 'Map("g", 1, n = 1)')
checkglobals(text = 'stats::aggregate(x ~ ., data = y, FUN = "g")')

## this doesn't work (evaluation is required)
checkglobals(text = 'g <- "f"; Map(g, 1, n = 1)')
checkglobals(text = "eval(substitute(g))") ## same for ~, expression, quote, bquote, Quote, etc.

## this works (calling a function in an exotic way)
checkglobals(text = '"head"(1:10)')
checkglobals(text = '`::`("utils", "head")(1:10)')
checkglobals(text = 'list("function" = utils::head)$`function`(1:10)')

## this doesn't work (evaluation is required)
checkglobals(text = 'get("head")(1:10)')
checkglobals(text = 'methods::getMethod("f", signature = "ANY")')

Package loading

## this works (simple evaluation of package names)
checkglobals(text = 'attachNamespace("utils"); head(1:10)')
checkglobals(text = 'pkg <- "utils"; library(pkg, character.only = TRUE); head(1:10)')

## this doesn't work (more complex evaluation is required)
checkglobals(text = 'pkg <- function() "utils"; library(pkg(), character.only = TRUE); head(1:10)')
checkglobals(text = 'loadPkg <- library; loadPkg(utils)')
checkglobals(text = 'box::use(utils[...])')

Unknown symbols

## this works (special symbols self, private, super are recognized)
checkglobals(text = 'R6::R6Class("cl",
                   public = list(
                     initialize = function(...) self$f(...),
                     f = function(...) private$p
                   ),
                   private = list(
                     p = list()
                   ))')

## this doesn't work (data masking)
checkglobals(text = 'transform(mtcars, mpg2 = mpg^2)')
checkglobals(text = 'attach(iris); print(Sepal.Width)')

Lazy evaluation

## this works (basic lazy evaluation)
checkglobals(text = '{
    addy <- function(y) x + y 
    x <- 0
    addy(1)
}')
checkglobals(
  text = 'function() { 
    on.exit(rm(x))
    x <- 0 
}')

## this doesn't work (lazy evaluation in external functions)
checkglobals(
  text = 'server <- function(input, output) {
    add1x <- shiny::reactive({
      add1(input$x)
    })
    add1 <- function(x) x + 1  
  }')


Useful references

  • checkglobals, CRAN webpage of the checkglobals package including links to additional documentation.
  • codetools::findGlobals(), detects global variables from R-scripts via static code analysis. This and other codetools functions are used in the source code checks run by R CMD check.
  • globals, R-package by H. Bengtsson providing a re-implementation of the functions in codetools to identify global variables using various strategies for export in parallel computations.
  • renv::dependencies(), detects R-package dependencies by scanning all R-files in a project for imported functions or packages via static code analysis.
  • lintr, R-package by J. Hester and others to perform general static code analysis in R projects. lintr::object_usage_linter() provides a wrapper of codetools::checkUsage() to detect global variables similar to R CMD check.

[1] Unit testing with R CMD check does not require the use of external packages, but many package developers rely on packages such as testthat or tinytest for convenience and due to common practice.