Reproducible Analytical Pipelines (RAP): bronze-silver-gold framework
This page is based upon a range of sources from across government and beyond. For a reading list, please see the sources section towards the bottom of this page.
For an introduction to what RAP is and the benefits it can bring, please see the introduction to RAP page.
How to use this framework
The framework outlined on this page includes a number of standards to work towards when developing and improving upon a Reproducible Analytical Pipeline (RAP). We present bronze, silver and gold standards, offering major benchmarks to work towards, with each subsequent standard reflecting a more transparent, reproducible and robust pipeline.
You do not need to aim for gold for every project; instead, the standard you should aim for depends on the needs of the project.
These standards are largely based upon the “minimum” and “further” standards of RAP developed by the Government Analysis Function. In the appendix section, we explain how our standards differ from theirs. Our standards also align with those presented within the Health RAP playbook.
While we would encourage all analysts to eventually meet one of these standards in their projects, it is important to stress that RAP does not need to be an all-or-nothing exercise. Even implementing just some of the principles outlined here will improve processes and outputs. Trying to achieve all of these standards in one go may be too daunting, so we suggest making incremental improvements.
It is also important to recognise that some of the principles that form these standards may not always be achievable. For example, where remote connections to databases are not permitted, data extraction cannot be automated (minimising manual steps for data extraction is one of the principles in our bronze standard). In such cases, the aim should be to apply RAP to the other areas where the principles can be applied. (Note, though, that a direct connection to databases is preferred wherever possible, as any “black-box” or manual processes in extracting and preparing data for analysis might undermine some of the benefits of RAP.)
UKHSA colleagues can contact UKHSA.RAP@ukhsa.gov.uk for more support on how to meet these standards, including for information on how to join our helpful RAP community.
Overview
The following table gives a quick overview of the principles that form each standard. More detailed explanations are given in the sub-sections to follow.
The Bronze standard 🥉 | The Silver standard 🥈 | The Gold standard 🥇 |
---|---|---|
Use open source analytical software | Meet the Bronze standard | Meet the Bronze and Silver standards |
Have minimal manual steps for data extraction and analysis | Have minimal manual steps for the production of outputs | Have unit testing for functions |
Follow good practice for quality assurance, integrating quality assurance checks throughout the pipeline | Use functions as reusable blocks of code | Have error handling for functions |
Have well-commented code and project documentation | Adhere to a common code style | Include documentation for functions |
Make code available to other analysts | Have automated input data validation | Use packaging |
Use version control | - | Log data and analysis checks |
Use peer review to ensure reproducibility (checking whether pipelines meet these standards) | - | Implement continuous integration |
- | - | Implement dependency management |
The Bronze standard 🥉
To meet the bronze standard for RAP, your project should:
- use open source analytical software (preferably R or Python)
- have minimal manual steps (for example, minimal copy-paste, point-click, drag-drop operations) for data extraction where permissions allow (for example, using SQL code), and for the analysis steps used to produce numbers, tables and charts
- follow good practice in quality assurance, integrating quality assurance checks throughout the analysis; these checks should be automated where appropriate and supplemented with semi-automated and manual checks (see our other guidance on quality assurance)
- have well-commented code and documentation embedded as part of the project, rather than being saved elsewhere
- be open and available to other analysts (including external users where appropriate) on a shared drive or on GitHub or GitLab, rather than a personal storage area
- use version control software such as Git, with a platform such as GitHub or GitLab, to create and maintain a recorded history of the project, and version control your input data
- use peer review to ensure that the pipeline meets the rest of this standard
See inside the expandable sections below for more detailed guidance.
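As an illustration of minimising manual steps for data extraction and analysis, the sketch below runs the extraction query from code, rather than copying and pasting results, and then applies a simple automated quality assurance check. It is a minimal Python example under stated assumptions: the `notifications` table, its columns and its values are hypothetical, and an in-memory SQLite database stands in for whichever database your team is permitted to connect to.

```python
import sqlite3


def extract_weekly_counts(connection):
    """Run the extraction query and return a list of (week, cases) rows."""
    query = """
        SELECT week, SUM(cases) AS cases
        FROM notifications
        GROUP BY week
        ORDER BY week
    """
    return connection.execute(query).fetchall()


# Hypothetical stand-in for a real database connection.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE notifications (week INTEGER, region TEXT, cases INTEGER)"
)
connection.executemany(
    "INSERT INTO notifications VALUES (?, ?, ?)",
    [(1, "North", 10), (1, "South", 5), (2, "North", 7), (2, "South", 8)],
)

weekly_counts = extract_weekly_counts(connection)

# Automated quality assurance check: the aggregated counts should
# add up to the total number of cases held in the source table.
total_in_source = connection.execute(
    "SELECT SUM(cases) FROM notifications"
).fetchone()[0]
if sum(cases for _, cases in weekly_counts) != total_in_source:
    raise ValueError("Weekly counts do not match the source total")

print(weekly_counts)
```

Because the query and the check both live in the project's code, they can be version controlled and peer reviewed alongside the rest of the pipeline.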
The Silver standard 🥈
To meet the silver standard for RAP, your project should:
- achieve all of the principles included in the bronze standard
- have minimal manual steps (for example, minimal copy-paste, point-click, drag-drop operations) for the production of outputs such as reports, spreadsheets, interactive dashboards and others
- use functions as reusable blocks of code
- adhere to a common best practice code style
- have automated input data validation
See inside the expandable sections below for more detailed guidance.
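The sketch below shows one way that functions and automated input data validation might fit together. It is a hedged Python illustration, not a prescribed implementation: the function names (`validate_rows`, `rate_per_100k`), the column names and the example figures are all hypothetical.

```python
def validate_rows(rows):
    """Return a list of problems found in the input rows (empty if none)."""
    required = {"region", "cases", "population"}
    problems = []
    for index, row in enumerate(rows):
        missing = required - set(row)
        if missing:
            problems.append(f"row {index}: missing columns {sorted(missing)}")
            continue
        if not isinstance(row["cases"], int) or row["cases"] < 0:
            problems.append(f"row {index}: cases must be a non-negative integer")
        if not isinstance(row["population"], int) or row["population"] <= 0:
            problems.append(f"row {index}: population must be a positive integer")
    return problems


def rate_per_100k(cases, population):
    """Calculate a crude rate per 100,000 population."""
    return cases * 100_000 / population


# Hypothetical input data.
rows = [
    {"region": "North", "cases": 50, "population": 100_000},
    {"region": "South", "cases": 30, "population": 60_000},
]

# Stop the pipeline before any analysis if the input data is not as expected.
problems = validate_rows(rows)
if problems:
    raise ValueError("Input data failed validation: " + "; ".join(problems))

rates = {
    row["region"]: rate_per_100k(row["cases"], row["population"]) for row in rows
}
print(rates)
```

Writing the calculation once as a function means it can be reused across outputs, and validating inputs up front means problems surface before they reach a report or dashboard.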
The Gold standard 🥇
To meet the gold standard for RAP, your project should:
- achieve all of the principles included in the bronze and silver standards
- have unit testing for functions
- have error handling for functions
- include documentation of functions (usually included as part of a package)
- use packaging
- log data and analysis checks
- implement continuous integration
- implement dependency management
See inside the expandable sections below for more detailed guidance.
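Several of the gold standard principles can be seen together in a small example. The Python sketch below, offered as an illustration under stated assumptions rather than a definitive approach, gives a hypothetical function `rate_per_100k` a docstring (function documentation), explicit error handling, a logged record of each calculation, and unit tests written with the standard library's `unittest` module. Packaging, continuous integration and dependency management are configured at project level and are not shown here.

```python
import logging
import unittest

logger = logging.getLogger("rap_example")  # hypothetical logger name


def rate_per_100k(cases, population):
    """Calculate a crude rate per 100,000 population.

    Args:
        cases: number of cases (must not be negative).
        population: population at risk (must be greater than zero).

    Returns:
        The rate per 100,000 population as a float.

    Raises:
        ValueError: if cases is negative or population is not positive.
    """
    if cases < 0:
        raise ValueError(f"cases must not be negative, got {cases}")
    if population <= 0:
        raise ValueError(f"population must be positive, got {population}")
    rate = cases * 100_000 / population
    logger.info("rate_per_100k: cases=%s population=%s rate=%.1f",
                cases, population, rate)
    return rate


class TestRatePer100k(unittest.TestCase):
    def test_known_value(self):
        self.assertAlmostEqual(rate_per_100k(50, 100_000), 50.0)

    def test_zero_population_raises(self):
        with self.assertRaises(ValueError):
            rate_per_100k(1, 0)

    def test_negative_cases_raises(self):
        with self.assertRaises(ValueError):
            rate_per_100k(-1, 1000)


# Send log messages to the console and run the tests programmatically.
logging.basicConfig(level=logging.INFO)
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRatePer100k)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a packaged project, tests like these would usually sit in their own test files and be run automatically by a continuous integration service each time the code changes.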
Sources
- Government Analysis Function: Reproducible Analytical Pipelines (RAP) strategy
- Government Analysis Function: Reproducible Analytical Pipelines (RAP)
- Government Analysis Function: Why take a more sophisticated approach to building your pipeline
- NHS Digital: RAP community of practice
- NHS England: Health RAP playbook
- NHS National Services Scotland: Reproducible Analytical Pipelines
- Office for National Statistics: The Duck Book - Quality assurance of code for analysis and research
- Office for Statistics Regulation: Reproducible Analytical Pipelines - Overcoming barriers to adoption
- The Aqua Book: guidance on producing quality analysis for government
- The Goldacre Review: Better, broader, safer: using health data for research and analysis
- The Turing Way: Handbook to reproducible, ethical and collaborative data science - Guide for Reproducible Research
- UK Government Data Science: RAP Companion
- UK Statistics Authority: Code of Practice for Statistics
Appendix
Our standards versus those produced by the Government Analysis Function
Our guidance draws upon the principles developed by the Government Analysis Function in their RAP strategy (supported by other sources listed above). However, we have separated them out into 3 standards (“bronze”, “silver” and “gold”) instead of the Analysis Function’s 2 (“minimum” and “further”). This is to make it easier for teams to progress through the standards in smaller leaps.
In our silver standard, some principles have come from the Analysis Function’s minimum standard, and others from their further standard. Specifically:
- “minimise manual steps” has been separated out into 2 principles, one covering manual steps during analysis and the other covering manual steps during the production of outputs; the former remains in our bronze standard (the equivalent of the minimum standard) and the latter has been moved to our silver standard. This recognises that many analysts will develop the more basic programming skills needed for analysis first, before developing the skills needed to automate outputs (for example, using R Markdown)
- “validating input data”, “using functions”, and “adhering to a common best practice code style” have all been moved from the further standard to our silver standard, because we have found that these are usually easier to implement than the other further standard principles
Aside from these differences, all other Analysis Function “minimum” standard principles are in our bronze standard, and all other Analysis Function “further” standard principles are in our gold standard.