Why Should I Care?

Quick Case Study

Making Pilot Reports Roughly 5x Faster Using Ultra-Parameterized Reports

At one of my jobs we ran ~monthly pilot studies. Strong performance in each pilot was key to securing business long-term. Naturally, knowing how well we were doing in the pilot as quickly as possible was top priority. The pilot report could help identify potential pain points, let everyone breathe a sigh of relief, or¹ a bit of both.

One problem: before I arrived, pilot reports were redone by hand for every pilot. While some of the code from previous pilots could be copied and pasted, that introduced a whole new set of issues. This ad-hoc approach meant each pilot report took almost a full work week to build.²

After I introduced ultra-parameterized reports, we were able to provide pilot reports the moment the data became available.³ Other decision-makers could react faster, and the company’s capacity to capitalize on the report grew a ton.

The Inevitability of Reports

I believe three things⁴ are true:

1. Data practitioners need to make reports.⁵
2. Data practitioners would rather do⁶ anything other than copy and paste info.
3. You shouldn’t need to do #2 to accomplish #1.

“Vanilla” Parameterized Reports

Creating a parameterized report with Quarto can be a lifesaver when stakeholders change their minds.

Gone are the days of having to go in manually to change a key variable throughout the script.⁷ Instead, you can just change the parameter in the YAML header at the start of the document.

Let’s look at an example from an excellent blog post on parameterized reports in Quarto by Mike Mahoney.

Example Script Without Parameters

Here’s a script that gives you an incredible amount of information on elevators in New York but isn’t parameterized yet.

---
title: "Cool graphs about elevators"
author: Mike Mahoney
subtitle: "Last generated on:"
date: today
format:
  html:
    echo: false
---

```{r}
#| message: false
library(elevators)
library(ggplot2)
theme_set(theme_minimal())
```

## Speed over time

```{r}
#| message: false
#| warning: false
elevators |>
  ggplot(aes(approval_date, speed_fpm)) +
  geom_point(alpha = 0.05) +
  geom_smooth() +
  scale_y_log10()
```

## Speed versus capacity

```{r}
#| message: false
#| warning: false
elevators |>
  ggplot(aes(capacity_lbs, speed_fpm)) +
  geom_point(alpha = 0.05) +
  geom_smooth() +
  scale_y_log10()
```

## Where in the world did all my elevators go

```{r}
elevators |>
  ggplot(aes(longitude, latitude)) + 
  geom_point(alpha = 0.05) +
  coord_sf()
```

Example Script With Parameters

And here’s a parameterized script that gives you an incredible amount of information on elevators specifically in Manhattan.

The key difference here isn’t so much that we can get info on Manhattan; it’s that we can swap “Manhattan” for “Brooklyn” in the params block in about 0.2 seconds and re-run the report. That’s a lot less manual work than replacing Manhattan with Brooklyn everywhere in the script and hoping we didn’t miss anything!

---
title: "Cool graphs about elevators"
author: Mike Mahoney
subtitle: "Last generated on: 2022-10-25"
date: today
format: 
  html: 
    echo: false
params: 
  borough: "Manhattan"
---

```{r}
#| message: false
library(elevators)

if (!is.na(params$borough) && params$borough != "NA") {
  elevators <- elevators[elevators$borough == params$borough, ]
}
if (nrow(elevators) == 0) {
  stop("No elevators were selected. Did you misspell `borough`?")
}

library(ggplot2)
theme_set(theme_minimal())
```

## Speed over time

```{r}
#| message: false
#| warning: false
elevators |> 
  ggplot(aes(approval_date, speed_fpm)) + 
  geom_point(alpha = 0.05) + 
  geom_smooth() + 
  scale_y_log10()
```

## Speed versus capacity

```{r}
#| message: false
#| warning: false
elevators |> 
  ggplot(aes(capacity_lbs, speed_fpm)) + 
  geom_point(alpha = 0.05) + 
  geom_smooth() + 
  scale_y_log10()
```

## Where in the world did all my elevators go

```{r}
elevators |> 
  ggplot(aes(longitude, latitude)) + 
  geom_point(alpha = 0.05) + 
  coord_sf()
```
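If you want to poke at the guard from the setup chunk on its own, here’s a minimal sketch on a made-up data frame. The names toy_elevators and filter_borough() are hypothetical, and the rows are invented rather than taken from the elevators package. The one deliberate change from the script is which(), so that rows with a missing borough get dropped instead of coming back as all-NA rows.

```r
# Toy stand-in for the elevators data (made-up rows, not real records)
toy_elevators <- data.frame(
  borough   = c("Manhattan", "Brooklyn", "Manhattan", NA),
  speed_fpm = c(350, 100, 500, 200)
)

# Hypothetical helper mirroring the guard in the setup chunk
filter_borough <- function(df, borough) {
  # Skip filtering when the parameter is missing or the string "NA"
  if (!is.na(borough) && borough != "NA") {
    # which() drops rows where borough is NA instead of returning NA rows
    df <- df[which(df$borough == borough), , drop = FALSE]
  }
  if (nrow(df) == 0) {
    stop("No elevators were selected. Did you misspell `borough`?")
  }
  df
}

manhattan <- filter_borough(toy_elevators, "Manhattan")
nrow(manhattan)

# A misspelled borough fails loudly instead of producing an empty report
result <- tryCatch(
  filter_borough(toy_elevators, "Manhatan"),
  error = function(e) conditionMessage(e)
)
result
```

Failing on zero rows matters more once reports are generated in bulk: an empty report for a misspelled parameter is easy to miss, while an error is not.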

Parameterize for Peace of Mind

If there’s any chance stakeholders will change their minds about key aspects of the analysis, I think it’s worth parameterizing the report.⁸

I think folks should be allowed to change their minds, and “vanilla” parameterizing makes your reporting process more robust to changing demands. It also saves us time and energy during periods when timelines are tight.
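For a one-off change you don’t even have to edit the file: quarto_render() from the quarto R package takes an execute_params argument that overrides the params in the YAML header at render time. Here’s a minimal sketch, assuming the parameterized script is saved as elevators.qmd (a made-up file name). render_for_borough() and its render_fn argument are hypothetical helpers I’m adding so the call can be dry-run without Quarto installed.

```r
# Hypothetical wrapper: render one borough without touching the YAML header.
# `render_fn` defaults to quarto::quarto_render but can be swapped for a stub.
render_for_borough <- function(borough,
                               input = "elevators.qmd", # assumed file name
                               render_fn = quarto::quarto_render) {
  render_fn(
    input = input,
    execute_params = list(borough = borough)
  )
}

# Dry run with a stub so nothing is actually rendered
dry_run <- render_for_borough(
  "Brooklyn",
  render_fn = function(input, execute_params) {
    list(input = input, execute_params = execute_params)
  }
)
dry_run$execute_params$borough
```

Calling render_for_borough("Brooklyn") without the stub would hand the same arguments to quarto_render(), which needs the Quarto CLI and the real .qmd file to be present.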

Ultra-Parameterized Reports

And here’s the thing: Replacing “Manhattan” with “Brooklyn” is still manual work. I don’t think we should over-optimize processes, and if you only need to make occasional manual edits to your parameters that’s great.

But what if you have to make an elevator report for 1,000 cities around the world?

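Even if each manual update really took only 0.2 seconds, that’s more than three minutes of nonstop typing, and in practice opening, editing, saving, and re-rendering each file takes far longer: at just six seconds per file you’d be at it for an hour and forty minutes. Plus, if you do a good enough job they might want that report every week, and all of a sudden parameterized reports feel like a drop of water on a raging fire.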
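For a rough sense of scale, here’s the arithmetic for 1,000 reports at a few per-file editing times. Only the 0.2 second figure comes from this post; the six-second and thirty-second figures are assumptions for illustration, not measurements.

```r
n_reports <- 1000

# Seconds of manual work per report under three assumed scenarios
seconds_per_edit <- c(
  swap_only      = 0.2, # just retyping the parameter
  open_edit_save = 6,   # assumed: open the file, edit, save
  edit_and_check = 30   # assumed: also re-render and glance at the output
)

# Total manual time in minutes for one full round of reports
minutes_total <- n_reports * seconds_per_edit / 60
round(minutes_total, 1)
```

Multiply by 52 if the report becomes a weekly request, and the case for automating the parameter swap gets pretty hard to argue with.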

We Were Promised Disguises

This is where mustache starts providing the “ultra” in “ultra-parameterized.” Using the whisker package implementation in R, we can create a version of the script that will allow us to programmatically specify the borough during report creation.

The only change that matters between this version and the parameterized report is replacing “Manhattan” with “{{ borough }}” in the YAML header.

---
title: "Cool graphs about elevators"
author: Mike Mahoney
subtitle: "Last generated on: 2022-10-25"
date: today
format: 
  html: 
    echo: false
params: 
  borough: {{ borough }}
---

```{r}
#| message: false
#| results: false
library(elevators)

if (!is.na(params$borough) && params$borough != "NA") {
  elevators <- elevators[elevators$borough == params$borough, ]
}
if (nrow(elevators) == 0) {
  stop("No elevators were selected. Did you misspell `borough`?")
}

library(ggplot2)
theme_set(theme_minimal())
```

## Speed over time

```{r}
#| message: false
#| warning: false
elevators |> 
  ggplot(aes(approval_date, speed_fpm)) + 
  geom_point(alpha = 0.05) + 
  geom_smooth() + 
  scale_y_log10()
```

## Speed versus capacity

```{r}
#| message: false
#| warning: false
elevators |> 
  ggplot(aes(capacity_lbs, speed_fpm)) + 
  geom_point(alpha = 0.05) + 
  geom_smooth() + 
  scale_y_log10()
```

## Where in the world did all my elevators go

```{r}
elevators |> 
  ggplot(aes(longitude, latitude)) + 
  geom_point(alpha = 0.05) + 
  coord_sf()
```
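To see the substitution in isolation, here’s a minimal sketch using just the two YAML lines that matter. The template vector is a stand-in I made up for the example; whisker.render() is the same function used to fill the real template.

```r
library(whisker)

# Stand-in for the params block of the template
template <- c(
  "params:",
  "  borough: {{ borough }}"
)

# Collapse the lines into one string, then fill in the {{ borough }} tag
filled <- whisker.render(
  paste(template, collapse = "\n"),
  data = list(borough = "Brooklyn")
)
cat(filled)
```

One thing to watch: in mustache, double braces HTML-escape the value, so a parameter containing a character like & would be altered on the way in. Triple braces, {{{ borough }}}, are the mustache way to insert the value untouched, which may be the safer choice if your parameter values aren’t plain words.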

Creating a Function to Run + Render Many Reports

We now need another file that will replace “{{ borough }}” with the appropriate boroughs and render each report.

---
title: "Create All Borough Reports"
author: "Michael Mullarkey"
---

# Load Packages

```{r}

library(whisker)    # For replacing {{ }} text
library(tidyverse)  # For function creation + data wrangling
library(glue)       # For programmatic file naming
library(lubridate)  # For today()
library(here)       # For better file paths
library(quarto)     # For rendering
library(elevators)  # For elevators data

```

# Use Whisker to Modify Template .qmd File with Desired Values

```{r}

use_borough_template <- function(borough, file_name) {
  
  raw_qmd <- readLines(file_name) # Reading in full .qmd file
  
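  # Replace {{ borough }} with the borough value; passing `data` explicitly
  # avoids relying on whisker's default lookup in the calling environment
  filled_qmd <- whisker.render(raw_qmd, data = list(borough = borough))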
  
  writeLines(
    text = filled_qmd,
    con = glue("{borough}_{today()}.qmd") # Programmatic naming so we don't
    # just overwrite the same file again and again when we iterate
  )
    
}

```

# Render .qmd Files Using Programmatic Names

```{r}

render_borough_template <- function(borough, file_name) {
  
  quarto_render(
    input = glue("{borough}_{today()}.qmd")
  )
  
}

```

# Put Both Functions Together So We Only Have to Make One Function Call

```{r}

create_borough_report <- function(borough, file_name) {
  
  use_borough_template(borough = borough, file_name = file_name)
  
  render_borough_template(borough = borough, file_name = file_name)
  
}

```

# Testing the report creation once before iterating programmatically

```{r}

# Uncomment to test the report creation once before iterating programmatically

# create_borough_report(borough = "Manhattan", file_name = "borough_template.qmd")

```

# Creating a dataframe of information to map over

```{r}

# Create data frame to map over for borough reports

all_boroughs <- elevators::elevators %>% 
  distinct(borough) %>% 
  deframe()

file_name_vec <- rep("borough_template.qmd", length(all_boroughs))

boroughs_report_df <- tibble(all_boroughs, file_name_vec)

```

# Map over all borough reports

```{r}

# This could also be walk() because we're really just looking for side effects!

# pmap isn't necessary for this example, but it extends more easily if you
# need more than 2 parameters (otherwise you can use map or map2)

pmap(
  boroughs_report_df,
  ~create_borough_report(
    borough = ..1,
    file_name = ..2
  )
)

```

Once you have a file like this that can programmatically replace your parameters and render reports, you’re good to go!

You can check out the programmatically generated reports for Manhattan, the Bronx, Brooklyn, Queens, and Staten Island.⁹

If stakeholders want a small tweak to one report but not the others, you can make that edit in just that one .qmd file and then re-render it.¹⁰
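The mapping chunk notes that walk() would work just as well, since we only care about side effects. Here’s a minimal sketch of that variant that stops before rendering, so you can inspect the filled-in .qmd files first. The inline template, the three boroughs, and the borough_reports folder under tempdir() are all assumptions for the example; in the real workflow the template would come from readLines("borough_template.qmd").

```r
library(whisker)
library(purrr)

# Stand-in template (assumption); normally read from borough_template.qmd
template <- c("---", "params:", "  borough: {{ borough }}", "---")

# Hypothetical output folder inside the session's temp directory
out_dir <- file.path(tempdir(), "borough_reports")
dir.create(out_dir, showWarnings = FALSE)

boroughs <- c("Manhattan", "Brooklyn", "Queens")

# walk() is map() for side effects: it writes the files and returns
# its input invisibly instead of building a list of results
walk(boroughs, function(borough) {
  filled <- whisker.render(
    paste(template, collapse = "\n"),
    data = list(borough = borough)
  )
  writeLines(filled, file.path(out_dir, paste0(borough, ".qmd")))
})

list.files(out_dir)
```

Splitting the fill step from the render step like this also makes it cheap to spot-check a handful of generated files before kicking off a long batch of renders.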

Conclusion

This is one proof of concept for ultra-parameterized reports, and I’ve extended parameterized reports on the job in other ways. I’ve found working with databases + conditional evaluation of chunks particularly helpful for creating bespoke-seeming reports in a programmatic way.

If you’re curious about those approaches and haven’t already opened Mike Mahoney’s blog post, please do that now! He discusses those extensions among others, and everything you can do with a parameterized report can be done with an ultra-parameterized report.

Also, huge shout-outs to Jacqueline Nolis and Tom Mock, who provided resources when I was trying to improve report building at my job. If you have questions about this approach, feel free to reach out to me on Twitter!

Footnotes

1. most likely!
2. Including stakeholder back and forth on custom parts of the report
3. within 1 day of the pilot starting
4. Ok, maybe some other stuff too but I’m trying to keep the non-footnotes part of the post focused!
5. If you have somehow had an entire career trajectory where you haven’t had to make reports for stakeholders, congrats! Seems outside the norm to me in 2022, though who knows how well this will age
6. almost
7. Yes, this includes the risky “Find and Replace All” strategy
8. Yes, I definitely lean toward parameterization, maybe too much!
9. One note for anyone trying to replicate this process: In order to not have all the programmatically generated .qmd files show up as separate blog posts on my website, I had to delete them. I also had to delete the .qmd files for the report template and mapping through creating all the reports. You can recreate those files entirely from the code in this blog post, and I could understand someone looking in the GitHub repo being confused. This isn’t an issue when you’re not publishing a blog post, so in most situations you shouldn’t even have to think about this wrinkle!
10. I do more file path work using glue in other circumstances, and doing that wasn’t playing nice with the Quarto blog structure. Another adventure for another time!