# Three fun R functions

All the cool kids were jumping off the bridge!
Published

October 24, 2023

Inspired by Maëlle, who was inspired by Yihui, who was inspired by Maëlle (who has a whole series about this), I wanted to share three useful base R functions that I think don’t get enough love. And inspired by Maëlle again, my list here is actually four functions.

## `sweep()`

If you ever need to do math with matrices, then `sweep()` is going to be your best friend. Say for instance we want to center and scale each column in a matrix. This is a pretty straightforward operation – we need to calculate the mean and standard deviation for each column, subtract the column mean from each observation, and then divide the result by the corresponding standard deviation.

We can use `apply` to get our means and standard deviations:

``````# Generate some fake data in a 10x10 matrix:
x <- matrix(data = rnorm(100), nrow = 10)
# Calculate one mean and sd for each column of our matrix:
col_means <- apply(x, 2, mean)
col_sds <- apply(x, 2, sd)``````

The subtraction and division are a bit less straightforward. R’s base math operators work element-wise, recycling our vector down the columns of the matrix as needed, so the first mean gets matched with the first row rather than the first column. That’s not what we want:

``````all.equal(
  (x - col_means) / col_sds,
  scale(x)
)``````
``````[1] "Attributes: < Length mismatch: comparison on first 1 components >"
[2] "Mean relative difference: 0.360556"``````

We could replicate our vector ourselves, in order to take advantage of these element-wise operations:

``````all.equal(
  ((x - matrix(rep(col_means, 10), 10, byrow = TRUE)) /
    matrix(rep(col_sds, 10), 10, byrow = TRUE)) |> as.vector(),
  scale(x) |> as.vector()
)``````
``[1] TRUE``

But that’s silly, especially if we were working with more observations.

Better instead is to use `sweep()` to perform some operation between each element of our vector and its corresponding column of the matrix:

``````# Take every value in our matrix, and subtract its corresponding column mean:
centered <- sweep(
  x = x,
  MARGIN = 2, # just like in apply()
  STATS = col_means,
  FUN = "-" # "-" is the default argument -- we don't NEED to provide it here
)``````

And we can similarly use `sweep()` to divide each column by its corresponding standard deviation, finishing up our centering and scaling:

``````# Divide each value by its corresponding column sd:
centered_and_scaled <- sweep(centered, 2, col_sds, "/")

# Works out identically to the built-in scale function:
all.equal(
  as.vector(centered_and_scaled),
  as.vector(scale(x))
)``````
``[1] TRUE``
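
To make the difference between plain recycling and `sweep()` concrete, here is a minimal sketch on a toy 2x3 matrix. The names `m`, `per_column`, `recycled`, and `swept` are made up for this illustration and do not appear elsewhere in this post.

```r
# A 2x3 matrix of zeros, and one (made-up) value per column:
m <- matrix(0, nrow = 2, ncol = 3)
per_column <- c(1, 2, 3)

# Plain arithmetic recycles the vector down the columns,
# so the values do not stay with "their" column:
recycled <- m + per_column
recycled
#>      [,1] [,2] [,3]
#> [1,]    1    3    2
#> [2,]    2    1    3

# sweep() with MARGIN = 2 pairs the i-th value with the i-th column:
swept <- sweep(m, 2, per_column, "+")
swept
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    1    2    3
```

In the 10x10 example above the vector length happens to equal the number of rows, which is why the plain version runs without any warning and simply returns the wrong answer.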

This is the main way I use `sweep()`, but there’s no requirement you use it for math – it works just as well with non-mathematical functions or non-numeric matrices:

``````letter_mat <- matrix(rep(letters[1:5], 5), 5)
letter_mat``````
``````     [,1] [,2] [,3] [,4] [,5]
[1,] "a"  "a"  "a"  "a"  "a"
[2,] "b"  "b"  "b"  "b"  "b"
[3,] "c"  "c"  "c"  "c"  "c"
[4,] "d"  "d"  "d"  "d"  "d"
[5,] "e"  "e"  "e"  "e"  "e" ``````
``sweep(letter_mat, 2, LETTERS[1:5], paste0)``
`````` [1] "aA" "bA" "cA" "dA" "eA" "aB" "bB" "cB" "dB" "eB" "aC" "bC" "cC" "dC" "eC"
[16] "aD" "bD" "cD" "dD" "eD" "aE" "bE" "cE" "dE" "eE"``````

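Note that the result above comes back as a plain character vector rather than a matrix. That is because `paste0()` drops the `dim` attribute of its inputs, not because of anything `sweep()` does. If you want the matrix shape back, one possible approach (a sketch; the name `pasted` is just for illustration) is to reassign the dimensions afterwards:

```r
letter_mat <- matrix(rep(letters[1:5], 5), 5)

# paste0() returns a plain vector, so sweep() does too:
pasted <- sweep(letter_mat, 2, LETTERS[1:5], paste0)
is.null(dim(pasted))

# Restore the original 5x5 shape; elements are already in column-major order:
dim(pasted) <- dim(letter_mat)
pasted[1, 2] # row 1, column 2: "a" pasted with "B"
```

This works because `sweep()` hands `FUN` the values in the same column-major order the matrix is stored in.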
## `reformulate()` and `DF2formula()`

The `reformulate()` function is a lifesaver if you’re trying to write long or complicated formulas, or multiple formulas generated by some other logic in your code.

The function is pretty straightforward. If you’re trying to make a formula `y ~ x + z`, provide your predictors as the first argument and your outcome as the second:

``reformulate(c("x", "z"), "y")``
``y ~ x + z``

The nice thing is that `reformulate` accepts vectors as inputs, making it easy to construct a vector of predictors and automatically turn them into a formula:

``reformulate(letters, "outcome")``
``````outcome ~ a + b + c + d + e + f + g + h + i + j + k + l + m +
n + o + p + q + r + s + t + u + v + w + x + y + z``````
``reformulate(names(Orange), "age")``
``age ~ Tree + age + circumference``

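Notice that `age` ended up on both sides of that last formula. This is where `reformulate()` makes an excellent alternative to dropping a few columns in order to use `outcome ~ .` – instead, you can use `setdiff()` to exclude those columns (including the outcome itself) from your formula: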

``````outcome_variable <- "age"
reformulate(setdiff(names(Orange), outcome_variable), outcome_variable)``````
``age ~ Tree + circumference``
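
A formula built this way can be passed anywhere a hand-written one would go. As a minimal sketch, assuming we want to model `circumference` from every column except `Tree` (a modelling choice made up for this example), we could hand the result straight to `lm()`:

```r
outcome_variable <- "circumference"
# Drop the outcome and the (arbitrarily excluded) Tree column:
predictors <- setdiff(names(Orange), c(outcome_variable, "Tree"))

model_formula <- reformulate(predictors, response = outcome_variable)
model_formula

# The generated formula behaves like any other formula object:
fit <- lm(model_formula, data = Orange)
names(coef(fit))
```

Because the predictors are just a character vector, the same few lines keep working if columns are added to or removed from the data frame.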

Relatedly, the function `DF2formula()` will automatically turn the column names from a data frame into a formula. The first column will become the outcome variable, and the rest will be used as predictors:

``DF2formula(Orange)``
``Tree ~ age + circumference``

To change what column is used as the outcome variable, reorder the columns in your data frame:

``DF2formula(Orange[3:1])``
``circumference ~ age + Tree``

## `str2lang()`

Shockingly enough, the `str2lang()` function turns a string into a language object:

``````growth_rate <- "circumference / age"
str2lang(growth_rate)``````
``circumference/age``
``class(str2lang(growth_rate))``
``[1] "call"``

Wooooo!

I think that, to most people, this does not sound immediately useful.¹ But the idea that your code can turn plain text into code at runtime is pretty powerful, and is some of the most R-esque nonsense that R has to offer.

For instance, we can use `eval()` to actually execute the call created by `str2lang()` in our global environment:

``eval(str2lang("2 + 2"))``
``[1] 4``

And that string can do anything that regular R code can do – assign variables, manage connections, and so on:

``````eval(str2lang("x <- 3"))
x``````
``[1] 3``
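
If you would rather not have evaluated strings assign into your global environment, one option is to pass `eval()` an environment of your own. This is only a sketch; `scratch` and `scratch_value` are names invented for the example.

```r
# A fresh environment to act as scratch space:
scratch <- new.env()

# The assignment now happens inside `scratch`:
eval(str2lang("scratch_value <- 3"), envir = scratch)

get("scratch_value", envir = scratch)

# ...and nothing with that name was created in the global environment:
exists("scratch_value", envir = globalenv(), inherits = FALSE)
```

This keeps your workspace tidy, but it is not a security boundary: the evaluated code can still call any function it can find, and `<<-` can still reach parent environments.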

We can also use this with `with()` or `local()` to execute our code inside of other environments. For instance, if we want to calculate our `growth_rate` from earlier, we can run that code with the `Orange` data frame:

``with(Orange, eval(str2lang(growth_rate)))``
`````` [1] 0.25423729 0.11983471 0.13102410 0.11454183 0.09748172 0.10349854
 [7] 0.09165613 0.27966102 0.14256198 0.16716867 0.15537849 0.13972380
[13] 0.14795918 0.12831858 0.25423729 0.10537190 0.11295181 0.10756972
[19] 0.09341998 0.10131195 0.08849558 0.27118644 0.12809917 0.16867470
[25] 0.16633466 0.14541024 0.15233236 0.13527181 0.25423729 0.10123967
[31] 0.12198795 0.12450199 0.11535337 0.12682216 0.11188369``````

This can be a powerful way to “import” code from other sources, for instance if you have a CSV of equations you want to run against a data frame. You want to be careful when using this with untrusted inputs, of course – if your input includes a call to `system()`, it might wind up wrecking your computer!
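
As a rough sketch of that "CSV of equations" idea: the `equations` table below is hypothetical and is built inline to stand in for something you might load with `read.csv()`. Each string is parsed with `str2lang()` and evaluated with the `Orange` data frame supplying the variables.

```r
# Hypothetical table of named equations (stand-in for a CSV file):
equations <- data.frame(
  name = c("growth_rate", "log_circumference"),
  expression = c("circumference / age", "log(circumference)"),
  stringsAsFactors = FALSE
)

# Parse each string and evaluate it with Orange's columns in scope:
results <- lapply(
  equations$expression,
  function(eq) eval(str2lang(eq), envir = Orange)
)
names(results) <- equations$name

str(results)
```

Passing the data frame as `envir` does the same job as wrapping the call in `with()`, and it is a little easier to use inside `lapply()`.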

## Footnotes

1. I think, to most people, this barely sounds like English.