UKHSA statistics production hub


Reproducible Analytical Pipelines (RAP): an introduction

Main messages

UKHSA colleagues are encouraged to join our internal RAP network to seek support and share knowledge. To do so, or for any other queries related to RAP, please contact UKHSA.RAP@ukhsa.gov.uk.

This page is based upon a range of sources from across government and beyond. For a reading list, please see the sources section towards the bottom of this page.

For a more detailed description of the skills and techniques involved in RAP, please see the bronze-silver-gold framework page.

The principles of RAP

Good analysis should be reproducible, transparent, trustworthy, auditable, efficient, and high quality.

The principles of RAP draw upon best practices from the field of software engineering to achieve these aims. They avoid the pitfalls associated with more manual processes reliant on opaque “point-and-click” operations which can be inefficient, difficult to reproduce, and difficult to properly quality assure or audit.

Having a transparent and reproducible analytical pipeline allows us to show that we have done what we say we have done. It allows our users and other analysts to follow and reproduce the process and understand the results. Good peer review and audit also rely on reproducibility, which further promotes the quality of, and trust in, our analytical outputs (trustworthiness and quality are two of the main pillars of the Code of Practice for Statistics).

Despite these clear benefits, some analysts still rely on legacy processes that are prone to error and difficult to reproduce. There are more than 800 accredited official statistics across Government, and many more official statistics and other analytical publications and pipelines, so the potential benefits that RAP can bring are huge.

The aim for UKHSA, and across Government, is to work towards “RAP by default” for all analysis. The approach UKHSA intends to take to help support this is described in our RAP implementation plan.

Issues with legacy production processes

Legacy analytical pipelines tend to look like the one presented in Figure 1. Here, data may be manually extracted from a central database, possibly by another team, and saved on a local or network drive as a CSV file. Proprietary software such as SPSS or Excel might then be used to process that data, relying on manual “point-and-click” operations. Charts and tables might be produced separately and then copied and pasted into a Word document. Numbers reported within the text of the report are likely to be updated by hand. That document is then usually emailed around for quality assurance checks and returned for revisions. Finally, the Word document might be converted to markdown (or saved as a PDF), ready for publication.

Figure 1: Legacy processes tend to have limited quality assurance and several manual steps

An image showing a largely manual process of data extraction and analysis, with manual production of outputs and quality assurance focussing on outputs rather than earlier steps.

There are a number of potential issues and sources of error with this kind of pipeline, such as:

The benefits of RAP

By implementing a reproducible analytical pipeline, these issues can be overcome.

Figure 2 shows that when using RAP, open source software such as R or Python is used to automate the end-to-end process from data extraction to analysis, through to the automated production of a variety of possible outputs (particularly markdown for HTML publications, and spreadsheets, but other formats such as slide decks and interactive dashboards are also possible). In a fully implemented RAP process, there are no manual point-and-click operations. Because the analysis is fully documented through programming code, quality assurance and version control can be more easily integrated into all stages of the pipeline.

Figure 2: RAP projects are automated, and quality assurance and version control are integrated

An image showing a more automated pipeline where quality assurance and version control are integrated throughout.
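To make the idea concrete, here is the shape of a small automated pipeline in Python, using only the standard library. It is a minimal sketch under stated assumptions, not a UKHSA template: the in-memory extract, the column names (`region`, `week`, `cases`) and the function names are all hypothetical. Each stage is a function, the quality assurance checks run every time the pipeline runs rather than only on the final output, and the numbers quoted in the text are computed from the data instead of being typed in by hand.

```python
import csv
import io

# Hypothetical extract: in a real pipeline this would come from a database
# query or a file; an in-memory CSV keeps the sketch self-contained.
RAW_EXTRACT = """region,week,cases
North,1,120
North,2,135
South,1,98
South,2,110
"""


def extract(source: str) -> list[dict]:
    """Read the raw extract into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))


def validate(rows: list[dict]) -> list[dict]:
    """Automated quality assurance: fail loudly rather than publish bad numbers."""
    expected = {"region", "week", "cases"}
    if not rows:
        raise ValueError("Extract is empty")
    for i, row in enumerate(rows):
        if set(row) != expected:
            raise ValueError(f"Row {i} has unexpected columns: {list(row)}")
        try:
            cases = int(row["cases"])
        except (TypeError, ValueError):
            raise ValueError(f"Row {i} has a non-numeric case count: {row['cases']!r}") from None
        if cases < 0:
            raise ValueError(f"Row {i} has a negative case count")
    return rows


def analyse(rows: list[dict]) -> dict[str, int]:
    """Total cases per region."""
    totals: dict[str, int] = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["cases"])
    return totals


def render_markdown(totals: dict[str, int]) -> str:
    """Build a markdown table plus a sentence whose numbers come straight from the data."""
    lines = ["| Region | Total cases |", "| --- | ---: |"]
    for region, total in sorted(totals.items()):
        lines.append(f"| {region} | {total:,} |")
    overall = sum(totals.values())
    top = max(totals, key=totals.get)
    lines.append("")
    lines.append(f"In total, {overall:,} cases were reported; {top} had the most ({totals[top]:,}).")
    return "\n".join(lines)


def run_pipeline(source: str) -> str:
    """Extract, check, analyse and render in one reproducible step."""
    return render_markdown(analyse(validate(extract(source))))


if __name__ == "__main__":
    print(run_pipeline(RAW_EXTRACT))
```

Re-running `run_pipeline` on a new or re-submitted extract regenerates the table and the sentence in one step, and a failed check stops the run before anything is produced. In a real project the markdown would typically feed a publication template, and the code would be kept under version control.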

The benefits of RAP are well documented and a number of case studies lauding the successes of RAP across government have also been published (for example, by the Government Analysis Function).

Compared with legacy pipelines, RAP:

RAP does require analysts to learn new programming skills. However, that investment is worth making for the benefits just described, and for the more general benefits that developing these skills brings to the analysts themselves.

RAP also requires more time at the start of an analytical project to write the necessary code, compared with legacy approaches. However, this creates time savings in the longer term, particularly when analyses have to be repeated. This is demonstrated in Figure 3: while in a legacy pipeline the initial analysis might be quicker than for RAP, it takes the same amount of time each time the analysis is repeated (for example, for each subsequent iteration of the report, or following a data re-submission), so the overall project takes a long time. While the initial analysis step is longer for RAP, each subsequent run of the code can take a fraction of the time, so time is saved overall. For regular reports the potential time savings are significant, and re-usable snippets of code, written as functions and packages, can also help speed up other projects.

Figure 3: Reproducible practices make analysis faster

A diagram showing that although RAP might take more time than more manual approaches initially, it can save time in the long run, particularly when the pipeline needs to be repeated several times.

Note: Figure taken from the Government Analysis Function Reproducible Analytical Pipelines (RAP) strategy
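The trade-off in Figure 3 can be written down as simple arithmetic. Assuming, purely for illustration, that a legacy run takes a fixed time L, while RAP needs a one-off set-up time S and then a shorter time R per run, RAP has used no more time overall once S + nR ≤ nL, that is, once n ≥ S / (L - R) runs. The Python sketch below computes this; the model is a deliberate simplification and the function names and figures are hypothetical, not measurements.

```python
import math


def cumulative_hours(runs: int, hours_per_run: float, setup_hours: float = 0.0) -> float:
    """Total time spent after `runs` repetitions, including any one-off set-up time."""
    return setup_hours + runs * hours_per_run


def break_even_runs(legacy_per_run: float, rap_setup: float, rap_per_run: float) -> int:
    """Smallest whole number of runs n with rap_setup + n * rap_per_run <= n * legacy_per_run."""
    saving_per_run = legacy_per_run - rap_per_run
    if saving_per_run <= 0:
        raise ValueError("RAP only pays back time if each run is quicker than a legacy run")
    return math.ceil(rap_setup / saving_per_run)


if __name__ == "__main__":
    # Hypothetical figures for a monthly report, for illustration only.
    legacy, setup, rap = 10, 40, 2
    n = break_even_runs(legacy, setup, rap)  # → 5
    for runs in (1, n, 12):
        print(f"{runs:>2} runs: legacy {cumulative_hours(runs, legacy):g} h, "
              f"RAP {cumulative_hours(runs, rap, setup):g} h")
```

With these made-up figures the two approaches break even at the fifth run (50 hours each), and after twelve runs the RAP approach has taken 64 hours against 120 for the legacy one.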

Standards of RAP

We have developed bronze, silver and gold standards of RAP to provide major benchmarks for analytical teams to work towards, based upon the “minimum” and “further” standards of RAP developed by the Government Analysis Function.

We encourage all analysts to work towards these standards (or equivalent). However, it is important to stress that RAP does not need to be an all-or-nothing exercise. Even implementing just some of these principles will improve processes and outputs. Trying to achieve all of these standards in one go may be too daunting, so we suggest making incremental improvements instead.

UKHSA colleagues are also encouraged to join our RAP network to seek more support from our helpful community, and to share best practice. Please contact UKHSA.RAP@ukhsa.gov.uk for more information.

Barriers to implementing RAP and overcoming them

We are fully aware that achieving these standards is easier said than done, and analysts face a number of common barriers when implementing RAP. The Government Analysis Function discusses three areas in particular, which are outlined in the sub-sections that follow, along with ideas for overcoming them.

UKHSA’s ongoing plans in these areas can also be found in our RAP implementation plan.

If you are a UKHSA colleague and would like further support in overcoming these barriers, please contact UKHSA.RAP@ukhsa.gov.uk.

Barrier 1: Getting the right tools

In essence, the only tools required to implement RAP are a programming language and a version control system, such as R or Python, and Git. These are all open source and do not require paid licences.

These tools should be made available to all analysts as standard, but we know that barriers sometimes exist. You should be persistent in securing access to the tools you need as they are essential for meeting these standards, and for high quality analysis.
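As one illustration of the two tools working together, a pipeline can record which version of the code produced each output, which supports the auditability described earlier. The Python sketch below is one possible approach, assuming Git is installed and the project is a Git repository; `git rev-parse HEAD` is a standard Git command, but the function names and the footer wording are hypothetical.

```python
import subprocess
from datetime import datetime, timezone


def current_commit() -> str:
    """Return the Git commit hash of the working copy, or 'unknown' if it cannot be found."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git not installed, not inside a repository, or no commits yet.
        return "unknown"
    return result.stdout.strip()


def provenance_footer(commit: str, run_time: datetime) -> str:
    """Text to append to an output so readers can trace it back to the exact code."""
    return f"Produced by commit {commit[:7]} on {run_time:%Y-%m-%d %H:%M} UTC."


if __name__ == "__main__":
    print(provenance_footer(current_commit(), datetime.now(timezone.utc)))
```

Because the footer is generated rather than typed, it cannot drift out of step with the code that actually ran.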

Barrier 2: Getting the right capability

For some, implementing RAP will require the development of new skills. Barriers may exist here in terms of knowing which skills need to be learnt and how, as well as having the time to learn and practise those skills. It is hoped that this guidance document, along with the associated bronze-silver-gold framework, will help highlight areas for development. A cultural change (see the next section) may be needed to allow time for this.

Programming and version control skills are considered essential for modern statistical analysis, so time should be given to develop these first. Starting with those skills will also provide a strong foundation for supporting you in the rest of your RAP journey. It may help as a team to practise on a single project (or even part of a project) first by developing a prototype to gain experience prior to rolling out RAP to other projects.

Note that these skills can and should be built incrementally; they do not all need to be built at once, and you will not become an expert overnight. You also do not need to do this alone: learn from other people's code and seek support! UKHSA colleagues can get support from our RAP community at UKHSA.RAP@ukhsa.gov.uk. There are also a large number of resources and training materials on R, Python and Git online, as well as question-and-answer sites such as stackoverflow.com.

Barrier 3: The right culture

New tools and skills may be needed to implement RAP. Senior leaders should acknowledge and support this, and analysts should also make efforts to acquire them. It is important that you take ownership of RAP within your own team, drawing upon support from others as needed.

There may be some resistance to dedicating time to RAP. While time does need to be set aside for upskilling, that time will eventually be more than made back (as shown in Figure 3 above). Additionally, upskilling in this area is a great opportunity to develop yourself professionally.

When thinking about RAP, there is often a sole focus on automation and efficiency. These are certainly benefits, but focusing on them alone can downplay others that are also important, such as greater quality assurance and transparency, and opportunities for collaboration and innovation. Ad-hoc analyses are often not seen as candidates for RAP, but they can enjoy the same benefits, particularly when they are later repeated (and it is often not known at the start that they will be). Having a more rounded understanding of the benefits of RAP can help better promote its use.

Sources

  1. Coding in Analysis and Research Survey
  2. Government Analysis Function: Benefits to government from Reproducible Analytical Pipelines
  3. Government Analysis Function: Government Functional Standard GovS 010 Analysis
  4. Government Analysis Function: Infrastructure for Reproducible Analytical Pipelines (RAP)
  5. Government Analysis Function: Reproducible Analytical Pipelines (RAP)
  6. Government Analysis Function: Reproducible Analytical Pipelines (RAP) case studies
  7. Government Analysis Function: Reproducible Analytical Pipelines (RAP) strategy
  8. Government Analysis Function: Why take a more sophisticated approach to building your pipeline
  9. NHS Digital: RAP community of practice
  10. NHS England: Health RAP playbook
  11. NHS National Services Scotland: Reproducible Analytical Pipelines
  12. Office for National Statistics: The Duck Book - Quality assurance of code for analysis and research
  13. Office for National Statistics: Using Reproducible Analytical Pipelines (RAP) to improve statistics
  14. Reproducible Analytical Pipelines: Overcoming barriers to adoption
  15. The Aqua Book: guidance on producing quality analysis for government
  16. The Goldacre Review: Better, broader, safer: using health data for research and analysis
  17. The Turing Way: Handbook to reproducible, ethical and collaborative data science: Guide for Reproducible Research
  18. UK Statistics Authority: Code of Practice for Statistics