Automated {drat} uploads with GitHub Actions

Continuously deploy your code to personal CRAN-like repos, automatically and for free!

R
Tutorials
CI
CD
GitHub Actions
Author
Published

September 23, 2021

This is a quick walkthrough of how to use GitHub actions to automatically upload packages to personal {drat} repos.

If you just want the good stuff, here’s the link to the GitHub action I’m using as well as the GitHub workflow that uploads terrainr to my own drat repo.

What are we doing?

{drat} is an R package that helps you set up CRAN-like repositories, hosted (primarily) on GitHub Pages. GitHub Actions is a continuous integration/continuous deployment service1, also hosted by GitHub, which lets you trigger compute workloads based on a CRON schedule, activity on a hosted repository, or manually.

This post walks through setting up a GitHub Actions CD workflow2 to automatically build and upload an R package to a drat repo based on a schedule or repository activity.

Why are we doing it?

My lab has a number of internal packages we use for our day-to-day work which handle things like downloading data from our central server and producing standardized model assessment outputs. All of these packages live on GitHub in our central organization, and for a long time our workflow for installing these packages has been to clone the repository and then devtools::install the package.

Recently, though, we’ve had to share one of our packages with a number of outside collaborators who we aren’t going to be adding to the organization. We’ve also started working with a number of people who are less familiar with git and GitHub, both externally and within the lab, which means we need to provide a bit more instruction than simply saying “clone the repo and install that”3. This also means that solutions like “use remotes::install_github with a PAT” aren’t feasible; we want people to be able to get the packages they need without needing to know anything about GitHub itself. Therefore, we needed a simpler method to actually distribute our code, both to people we’d happily give repo access and people we wouldn’t want to.

For public repositories and organizations, I think the easiest solution to this problem is to set up your own R Universe, which rOpenSci has written fantastic instructions for getting started with; this offloads your responsibility for managing CI to the rOpenSci team. For this project, however, we wanted a separate and mostly hidden repository page, which made {drat} a perfect candidate.

The only problem was how to make sure we were always publishing the newest versions of our package. We’re a small team doing a lot of different things, so being able to automate away any repetitive task (like publishing patch releases to a {drat} repo) can really help reduce the cognitive load and number of mistakes associated with updating our shared codebase.

How did we do it?

Because the original version of this workflow is on a private repo, I’ll walk through the workflow that uploads terrainr to my own personal drat repo. The YAML file that runs the workflow looks like this:

on:
  push:
    branches:
      - 'main'
    paths:
      - 'DESCRIPTION'
  workflow_dispatch:

jobs:
  drat-upload:
    runs-on: ubuntu-20.04
    name: Drat Upload
    steps:
      - uses: mikemahoney218/upload-to-drat-repo@v0.1
        with:
          drat_repo: 'mikemahoney218/drat'
          token: "${{ secrets.DRAT_TOKEN }}"
          commit_message: "Automated update (add terrainr)"
          commit_email: "mike.mahoney.218@gmail.com"

This script has two main components to it: running the job and configuring the action itself.

Running the job

At the top of terrainr’s drat workflow script is the following chunk:

on:
  push:
    branches:
      - 'main'
    paths:
      - 'DESCRIPTION'
  workflow_dispatch:

This sets up two different ways to trigger the workflow. First off, any push commit to the main branch that touches the DESCRIPTION file will automatically build the package and push it to my drat repo. Because terrainr is published on CRAN and has a handful of users, I do my best to update version numbers whenever I make changes – even during development, I try and bump development versions whenever there’s a fix or update worth mentioning.

Since the version number is stored in DESCRIPTION, this means I’ll only push updated package versions when I’ve actually updated something in the package; my small edits to the CI files, for instance, won’t create a new release.

If you want to be freer about updating your repo, you can drop any of these restrictions. For instance, our internal packages have workflow triggers that look like this:

on:
  push:
    branches: 'main'

In this case, we push an update any time we push any commit to the main branch. This can cause problems – for instance, if we don’t update version numbers then update.packages won’t recognize that local installations are out of date – but makes more sense for our small team and small group of users.

The second method this workflow uses to upload a new version is the workflow_dispatch: option, which lets me manually trigger the workflow from the GitHub UI. This is super helpful in case I accidentally mess up my drat repo, or make a small edit without changing the version number in the DESCRIPTION; without this line I’d have to rebuild the package on my laptop and update the drat repo from my local copy.

There’s a lot of other ways you can set to trigger workflows – see the full list here – but the other one I want to highlight is that you can also set the workflow to trigger on a schedule using a crontab. This snippet for instance will release daily builds every day at 0:00 UTC:

on:
  schedule: 
    - cron: '0 0 * * *'

You can use this syntax to create incredibly complex schedules; I personally always use https://crontab.guru/ to make sure my crontabs are right when I’m trying to schedule jobs.

Configuring the action

The second half of the workflow script actually calls the action to build the package and upload it to your drat repo:

jobs:
  drat-upload:
    runs-on: ubuntu-20.04
    name: Drat Upload
    steps:
      - uses: mikemahoney218/upload-to-drat-repo@v0.1
        with:
          drat_repo: 'mikemahoney218/drat'
          token: "${{ secrets.DRAT_TOKEN }}"
          commit_message: "Automated update (add terrainr)"
          commit_email: "mike.mahoney.218@gmail.com"

This will, by default, spin up an Ubuntu 20.04 runner and run v0.1 of the upload action, which is the current version. This action will automatically check out the repo the workflow is called from, build and install the R package in that repo, then insert it into a drat repo.

There’s a few input values for this workflow which you must specify, which are under the “with” step:

  • drat_repo: The GitHub repository for the drat repo you’re trying to push your package to. If you don’t have a drat repo yet, you can create one locally using drat::initRepo, and then push the created directory up to GitHub.
  • commit_email: The author to write the drat repo commit as; used to set git config user.email. You must provide a value here.
  • token: A personal access token (PAT). The action will use this PAT to clone your package and drat repo, as well as to push your drat repo, so make sure to authorize the PAT to do so. It’s a good idea to use a service account with the fewest permissions possible for this job.

You can also customize the job further with a few additional inputs:

  • commit_message: The message to use when committing to drat_repo.
  • commit_author: The author to write the commit as; used to set git config user.name.
  • package: The GitHub repository for the package you want to upload. Defaults to the repository the action is running in, but this can be used to run your workflow elsewhere (for instance, if you want to have all your upload jobs scheduled in the drat repo itself).

And for most use cases, this should be enough to automatically deploy your package whenever you desire! So far, I’ve tested this workflow on a handful of my own packages (terrainr and then our internal packages), and things are working well so far – if you find any problems or have any suggestions, feel free to open an issue on the action repo here.

Footnotes

  1. Kinda. It’s probably more accurate to say GitHub Actions provides hosted compute which is typically used for CI/CD workflows, but can also be applied to other things; I use it to make art and post to Twitter, for instance.↩︎

  2. Workflows are explicit, repository-specific jobs, while actions are generic scripts which can be included inside of workflows.↩︎

  3. I had completely forgotten how complex getting started with GitHub can be as a novice until recently, teaching a Carpentries workshop, I realized that teaching the initial account set-up requires a full 45 minute lesson.↩︎