UKHSA statistics production hub

Reproducible Analytical Pipelines (RAP): bronze-silver-gold framework

This page is based upon a range of sources from across government and beyond. For a reading list, please see the sources section towards the bottom of this page.

For an introduction to what RAP is and the benefits it can bring, please see the introduction to RAP page.

How to use this framework

The framework outlined on this page includes a number of standards to work towards when developing and improving upon a Reproducible Analytical Pipeline (RAP). We present bronze, silver and gold standards, offering major benchmarks to work towards, with each subsequent standard reflecting a more transparent, reproducible and robust pipeline.

You do not need to aim for gold for every project; instead, the standard you should aim for depends on the needs of the project.

These standards are largely based upon the “minimum” and “further” standards of RAP developed by the Government Analysis Function. In the appendix section, we explain how our standards differ from theirs. Our standards also align with those presented within the Health RAP playbook.

While we encourage all analysts to eventually meet one of these standards in their projects, RAP does not need to be an all-or-nothing exercise. Implementing even some of the principles outlined here will improve processes and outputs. Trying to meet every standard in one go may be too daunting, so we suggest making incremental improvements.

It is also important to recognise that some of the principles that form these standards may not always be achievable. For example, where remote connections to databases are not permitted, data extraction cannot be automated (minimising manual steps in data extraction is one of the principles in our bronze standard). In such cases, the aim should be to apply RAP to the other areas where the principles can be applied. (Note, though, that a direct connection to databases is preferred wherever possible, as any “black-box” or manual processes in extracting and preparing data for analysis might undermine some of the benefits of RAP.)
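
Where a direct connection is permitted, automating the extraction step might look roughly like the Python sketch below. It is only an illustration: an in-memory SQLite database stands in for a real data store, and the table name `weekly_cases`, its columns, the rows and both function names are hypothetical.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database holding a few made-up rows (illustration only)."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE weekly_cases (region TEXT, year INTEGER, cases INTEGER)"
    )
    connection.executemany(
        "INSERT INTO weekly_cases VALUES (?, ?, ?)",
        [("North", 2022, 10), ("North", 2023, 12), ("South", 2023, 7)],
    )
    connection.commit()
    return connection


def extract_cases(connection, min_year):
    """Return (region, year, cases) rows from min_year onwards, in a stable order."""
    # The query lives in code, so the extraction can be rerun and reviewed
    # rather than relying on a manual export.
    query = (
        "SELECT region, year, cases FROM weekly_cases "
        "WHERE year >= ? ORDER BY region, year"
    )
    return connection.execute(query, (min_year,)).fetchall()
```

Calling `extract_cases(build_demo_database(), 2023)` returns the two 2023 rows. In a real pipeline the connection would point at the actual database, and the rest of the code would stay the same.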

UKHSA colleagues can contact UKHSA.RAP@ukhsa.gov.uk for more support on how to meet these standards, including for information on how to join our helpful RAP community.

Overview

The following table gives a quick overview of the principles that form each standard. More detailed explanations are given in the sub-sections to follow.

| The Bronze standard 🥉 | The Silver standard 🥈 | The Gold standard 🥇 |
| --- | --- | --- |
| Use open source analytical software | Meet the Bronze standard | Meet the Bronze and Silver standards |
| Have minimal manual steps for data extraction and analysis | Have minimal manual steps for the production of outputs | Have unit testing for functions |
| Follow good practice for quality assurance, integrating quality assurance checks throughout the pipeline | Use functions as reusable blocks of code | Have error handling for functions |
| Have well-commented code and project documentation | Adhere to a common code style | Include documentation for functions |
| Make code available to other analysts | Have automated input data validation | Use packaging |
| Use version control | - | Log data and analysis checks |
| Use peer review to ensure reproducibility (checking whether pipelines meet these standards) | - | Implement continuous integration |
| - | - | Implement dependency management |

The Bronze standard 🥉

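To meet the bronze standard for RAP, your project should:

  1. use open source analytical software
  2. have minimal manual steps for data extraction and analysis
  3. follow good practice for quality assurance, integrating quality assurance checks throughout the pipeline
  4. have well-commented code and project documentation
  5. make code available to other analysts
  6. use version control
  7. use peer review to ensure reproducibility (checking whether pipelines meet these standards)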

The Silver standard 🥈

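To meet the silver standard for RAP, your project should:

  1. meet the bronze standard
  2. have minimal manual steps for the production of outputs
  3. use functions as reusable blocks of code
  4. adhere to a common code style
  5. have automated input data validation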
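
Two of the silver principles, functions as reusable blocks of code and automated input data validation, could look roughly like the Python sketch below. It is a minimal illustration under stated assumptions, not a prescribed implementation: the column names, the validation rules and both function names are hypothetical.

```python
# Hypothetical schema: every input row must carry these columns.
REQUIRED_COLUMNS = {"region", "year", "cases"}


def validate_rows(rows):
    """Return a list of human-readable problems found in the input rows."""
    problems = []
    for index, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            problems.append(f"row {index}: missing columns {sorted(missing)}")
            continue  # the value checks below need every column to be present
        if not isinstance(row["cases"], int) or row["cases"] < 0:
            problems.append(f"row {index}: cases must be a non-negative integer")
    return problems


def total_cases_by_region(rows):
    """Sum the 'cases' column for each region (a reusable analysis step)."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["cases"]
    return totals
```

A pipeline could call `validate_rows` on every new extract and stop if the returned list is not empty, before `total_cases_by_region` or any other analysis function runs.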

The Gold standard 🥇

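To meet the gold standard for RAP, your project should:

  1. meet the bronze and silver standards
  2. have unit testing for functions
  3. have error handling for functions
  4. include documentation for functions
  5. use packaging
  6. log data and analysis checks
  7. implement continuous integration
  8. implement dependency management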
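
Several of the gold principles can be seen together in one small function. The Python sketch below is a hedged illustration of function documentation, error handling, logging and unit testing. The function `rate_per_100k`, the logger name `rap_pipeline` and the test values are hypothetical.

```python
import logging
import unittest

logger = logging.getLogger("rap_pipeline")  # hypothetical logger name


def rate_per_100k(cases, population):
    """Return the number of cases per 100,000 population.

    Args:
        cases: Non-negative count of cases.
        population: Positive population size.

    Raises:
        ValueError: If cases is negative or population is not positive.
    """
    # Error handling: fail early with a clear message instead of returning
    # a misleading number.
    if population <= 0:
        raise ValueError(f"population must be positive, got {population}")
    if cases < 0:
        raise ValueError(f"cases must be non-negative, got {cases}")
    rate = cases * 100_000 / population
    # Logging: record the check so a pipeline run can be audited later.
    logger.info("Computed rate %.1f per 100k from %d cases", rate, cases)
    return rate


class TestRatePer100k(unittest.TestCase):
    """Unit tests covering one known value and both error paths."""

    def test_known_value(self):
        self.assertAlmostEqual(rate_per_100k(50, 200_000), 25.0)

    def test_rejects_zero_population(self):
        with self.assertRaises(ValueError):
            rate_per_100k(1, 0)

    def test_rejects_negative_cases(self):
        with self.assertRaises(ValueError):
            rate_per_100k(-1, 1000)
```

If this were saved as a module, `python -m unittest` would discover and run the tests. A continuous integration service could run the same command on every change.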

Sources

  1. Government Analysis Function: Reproducible Analytical Pipelines (RAP) strategy
  2. Government Analysis Function: Reproducible Analytical Pipelines (RAP)
  3. Government Analysis Function: Why take a more sophisticated approach to building your pipeline
  4. NHS Digital: RAP community of practice
  5. NHS England: Health RAP playbook
  6. NHS National Services Scotland: Reproducible Analytical Pipelines
  7. Office for National Statistics: The Duck Book - Quality assurance of code for analysis and research
  8. Office for Statistics Regulation: Reproducible Analytical Pipelines - Overcoming barriers to adoption
  9. The Aqua Book: guidance on producing quality analysis for government
  10. The Goldacre Review: Better, broader, safer: using health data for research and analysis
  11. The Turing Way: Handbook to reproducible, ethical and collaborative data science - Guide for Reproducible Research
  12. UK Government Data Science: RAP Companion
  13. UK Statistics Authority: Code of Practice for Statistics

Appendix

Our standards versus those produced by the Government Analysis Function

Our guidance draws upon the principles developed by the Government Analysis Function in their RAP strategy (supported by other sources listed above). However, we have separated them out into 3 standards (“bronze”, “silver” and “gold”) instead of the Analysis Function’s 2 (“minimum” and “further”). This is to make it easier for teams to progress through the standards in smaller leaps.

In our silver standard, some principles have come from the Analysis Function’s “minimum” standard, and others from their “further” standard. Specifically:

Aside from these differences, all other Analysis Function “minimum” standard principles are in our bronze standard, and all other Analysis Function “further” standard principles are in our gold standard.