
Benchmarking future.apply

Urtzi Enriquez-Urzelai
Physiological and evolutionary ecologist

The Three Contenders

  1. Standard for loop: Manual iteration (pre-allocated).
  2. lapply: The functional, sequential R standard.
  3. future_lapply: The parallelized version.
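
Before timing anything, it is worth confirming that the three approaches return the same values. The following is a minimal sketch rather than part of the benchmark: the toy input `x_list`, the seed, and the choice of `plan(multisession, workers = 2)` are assumptions made for illustration.

```r
library(future.apply)  # also attaches the future package, which provides plan()

# Assumed backend for illustration: two background R sessions
plan(multisession, workers = 2)

set.seed(1)
x_list <- replicate(5, rnorm(1000), simplify = FALSE)  # hypothetical toy input

# 1. Pre-allocated for loop
res_for <- vector("list", length(x_list))
for (i in seq_along(x_list)) res_for[[i]] <- mean(x_list[[i]])

# 2. Sequential lapply
res_lapply <- lapply(x_list, mean)

# 3. Parallel future_lapply
res_future <- future_lapply(x_list, mean)

# All three should agree up to floating-point tolerance
same_results <- isTRUE(all.equal(res_for, res_lapply)) &&
  isTRUE(all.equal(res_lapply, res_future))
print(same_results)

plan(sequential)  # shut down the background workers
```

If the results agree, any difference between the contenders is purely a matter of speed, which is what the experiments measure.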

Experiment 1: The “Cheap” Task

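In this scenario, each task is very fast: computing the mean of 1,000 random numbers. We repeat it over a list of 200 such vectors, so the time spent coordinating workers can easily outweigh the computation itself.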

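# Setup assumed here; the post does not show it. plan(multisession) is an
# assumption: without a parallel plan, future_lapply() runs sequentially.
library(microbenchmark)
library(future.apply)  # also attaches future, which provides plan()
library(knitr)         # kable()
library(ggplot2)       # autoplot(), labs()
plan(multisession)

n <- 200
data_list <- replicate(n, rnorm(1000), simplify = FALSE)

bench_cheap <- microbenchmark(
  for_loop = {
    res_for <- vector("list", n)  # pre-allocate the result list
    for (i in seq_len(n)) res_for[[i]] <- mean(data_list[[i]])
  },
  standard_apply = lapply(data_list, mean),
  future_apply   = future_lapply(data_list, mean),
  times = 10
)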

# Generate Table
kable(summary(bench_cheap), caption = "Cheap Task Results (microseconds)")

| expr | min | lq | mean | median | uq | max | neval |
|:--|--:|--:|--:|--:|--:|--:|--:|
| for_loop | 1608.359 | 1670.975 | 2190.1166 | 1903.875 | 2669.886 | 3709.897 | 10 |
| standard_apply | 587.176 | 595.874 | 947.8083 | 867.347 | 1267.916 | 1548.279 | 10 |
| future_apply | 43543.992 | 46387.785 | 84049.1765 | 99176.992 | 102294.771 | 133827.555 | 10 |

Cheap Task Results (microseconds)
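
One way to read this table is as ratios of medians, which do not depend on the time unit. The snippet below is a small sketch that copies the medians from the table by hand; the variable names are illustrative and not part of the benchmark code.

```r
# Medians copied from the cheap-task table (same time unit for all three rows)
median_time <- c(
  for_loop       = 1903.875,
  standard_apply = 867.347,
  future_apply   = 99176.992
)

# How many times slower each approach is than plain lapply
slowdown <- median_time / median_time[["standard_apply"]]
print(round(slowdown, 1))
```

On these numbers, future_lapply is roughly two orders of magnitude slower than lapply for the cheap task, which is consistent with the cost of dispatching work to parallel workers and collecting the results outweighing the computation.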


# Generate Figure
autoplot(bench_cheap) + labs(title = "Cheap Task: Parallel Overhead is Visible")

Experiment 2: The “Expensive” Task

In this scenario, we simulate “heavy” work by adding a 0.1-second delay (Sys.sleep(0.1)) to each call. This stands in for slow tasks such as fitting complex statistical models or web scraping.

n_heavy <- 20
data_heavy <- replicate(n_heavy, rnorm(10), simplify = FALSE)

# A function that takes 0.1 seconds per call
heavy_func <- function(x) {
  Sys.sleep(0.1)
  mean(x)
}

bench_expensive <- microbenchmark(
  for_loop = {
    res_for <- vector("list", n_heavy)
    for (i in seq_len(n_heavy)) res_for[[i]] <- heavy_func(data_heavy[[i]])
  },
  standard_apply = lapply(data_heavy, heavy_func),
  future_apply   = future_lapply(data_heavy, heavy_func),
  times = 2 # Low iterations because it's slow!
)

# Generate Table
kable(summary(bench_expensive), caption = "Expensive Task Results (milliseconds)")

| expr | min | lq | mean | median | uq | max | neval |
|:--|--:|--:|--:|--:|--:|--:|--:|
| for_loop | 2010.7866 | 2010.7866 | 2014.4950 | 2014.4950 | 2018.2033 | 2018.2033 | 2 |
| standard_apply | 2008.7650 | 2008.7650 | 2011.7931 | 2011.7931 | 2014.8213 | 2014.8213 | 2 |
| future_apply | 279.5349 | 279.5349 | 292.0067 | 292.0067 | 304.4786 | 304.4786 | 2 |

Expensive Task Results (milliseconds)


# Generate Figure
autoplot(bench_expensive) + labs(title = "Expensive Task: Future Wins Big")
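
A back-of-the-envelope check helps interpret these numbers. With 20 calls at 0.1 s each, sequential code needs about 2 s, which suggests the timings in the table are in milliseconds. The sketch below computes the observed speedup from the medians and an idealized lower bound for a few hypothetical worker counts. It assumes the work is split into near-equal chunks, one per worker, and ignores all parallel overhead; the worker counts are illustrative, since the post does not state how many were used.

```r
n_calls <- 20    # number of heavy_func() calls in the benchmark
delay_s <- 0.1   # Sys.sleep() duration per call, in seconds

# Expected sequential run time: every call waits in turn
sequential_s <- n_calls * delay_s

# Medians copied from the expensive-task table
median_lapply <- 2011.7931
median_future <- 292.0067
speedup <- median_lapply / median_future

# Idealized run time with `workers` workers: the largest chunk sets the pace
ideal_s <- function(workers) ceiling(n_calls / workers) * delay_s

workers <- c(2, 4, 8, 10, 20)  # hypothetical worker counts
ideal <- vapply(workers, ideal_s, numeric(1))

print(sequential_s)
print(round(speedup, 2))
print(data.frame(workers = workers, ideal_seconds = ideal))
```

Comparing the measured future_lapply time with the idealized values gives a rough sense of how much of the run is waiting on the task itself and how much is parallel overhead.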