LLM-assisted epidemiological narrative • epiviz

Introduction

The llm_interpret() function uses a Large Language Model (LLM) to interpret epidemiological visualisations and data. It can generate insights, identify patterns, and provide contextual analysis of your charts and datasets, which is useful for surveillance reporting and data exploration.

Prerequisites

library(epiviz)
library(dplyr)
library(lubridate)

Environment Setup

Before using llm_interpret(), set the three provider-agnostic environment variables that the function reads: LLM_PROVIDER, LLM_API_KEY and LLM_MODEL.

Required Environment Variables

Set these in your .Renviron file to persist them across sessions, or with Sys.setenv() for the current R session only:

# Provider: one of "openai", "gemini", or "anthropic"
Sys.setenv(LLM_PROVIDER = "openai")

# API key for the chosen provider
Sys.setenv(LLM_API_KEY = "your-api-key")

# Model for the chosen provider (examples)
Sys.setenv(LLM_MODEL = "gpt-4.1-nano")                # OpenAI
# Sys.setenv(LLM_MODEL = "gemini-2.5-flash-lite")     # Google
# Sys.setenv(LLM_MODEL = "claude-sonnet-4-20250514")  # Anthropic
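
For the .Renviron route, the file holds plain KEY=value lines rather than Sys.setenv() calls. The sketch below writes such lines to a temporary file and loads them with base R's readRenviron(), so it can run without touching your real ~/.Renviron. The temporary path and the placeholder values are assumptions for illustration only.

```r
# Sketch: the same three variables in .Renviron syntax (KEY=value, one per line).
# A temporary file stands in for your real ~/.Renviron here.
renviron_path <- tempfile(fileext = ".Renviron")
writeLines(
  c(
    "LLM_PROVIDER=openai",
    "LLM_API_KEY=your-api-key",
    "LLM_MODEL=gpt-4.1-nano"
  ),
  renviron_path
)

# readRenviron() loads the file into the current session. R reads the real
# ~/.Renviron automatically at startup, so this step is only needed mid-session.
loaded <- readRenviron(renviron_path)

# Inspect the non-secret values to confirm they were picked up.
Sys.getenv(c("LLM_PROVIDER", "LLM_MODEL"))
```

Keeping the key in .Renviron rather than in a script also makes it easier to keep it out of version control.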

Verify Setup

if (Sys.getenv("LLM_PROVIDER") != "" && Sys.getenv("LLM_API_KEY") != "" && Sys.getenv("LLM_MODEL") != "") {
  cat("LLM environment is configured\n")
} else {
  cat("Please set LLM_PROVIDER, LLM_API_KEY and LLM_MODEL\n")
}
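
When the check fails, it can help to know which variable is missing. The following is a minimal sketch of a more specific check. The helper name missing_llm_vars() is hypothetical and not part of epiviz.

```r
# Sketch: list the LLM variables that are still unset or empty.
# missing_llm_vars() is a hypothetical helper, not an epiviz function.
missing_llm_vars <- function(vars = c("LLM_PROVIDER", "LLM_API_KEY", "LLM_MODEL")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

missing <- missing_llm_vars()
if (length(missing) == 0) {
  cat("LLM environment is configured\n")
} else {
  cat("Please set:", paste(missing, collapse = ", "), "\n")
}
```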

Example 1: Interpreting an epidemic curve

This example demonstrates how to use LLM interpretation to analyse an epidemic curve and generate insights about temporal patterns.

Prepare the data and create a visualisation

# Create an epidemic curve for interpretation
epi_data <- epiviz::lab_data %>%
  filter(
    organism_species_name == "STAPHYLOCOCCUS AUREUS",
    specimen_date >= as.Date("2023-01-01"),
    specimen_date <= as.Date("2023-12-31")
  )

# Create the epidemic curve
epi_curve_plot <- epi_curve(
  dynamic = FALSE,
  params = list(
    df = epi_data,
    date_var = "specimen_date",
    date_start = "2023-01-01",
    date_end = "2023-12-31",
    time_period = "year_month",
    fill_colours = "#007C91",
    chart_title = "Monthly Staph aureus detections (2023)",
    x_axis_title = "Month",
    y_axis_title = "Number of detections"
  )
)

# Display the plot
print(epi_curve_plot)

Interpret the visualisation

# Use LLM to interpret the epidemic curve
interpretation <- llm_interpret(
  input = epi_curve_plot,
  word_limit = 120,
  prompt_extension = "Analyse this epidemic curve and identify notable patterns, trends, or anomalies. Focus on seasonal patterns and potential outbreak periods."
)

# Display the interpretation
cat("LLM Interpretation:\n")
cat(interpretation)

Interpretation: The LLM will analyse the epidemic curve and provide insights about temporal patterns, seasonal trends, and any notable peaks or anomalies in the data.
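
Because the text is model-generated, it may be worth checking its length against the word_limit you requested before placing it in a report. The sketch below assumes llm_interpret() returns a single character string, as the cat() call above suggests. count_words() is a hypothetical helper, and the example text is a stand-in rather than real model output.

```r
# Sketch: check a returned interpretation against the requested word limit.
# count_words() is a hypothetical helper; example_text stands in for the
# character string returned by llm_interpret().
count_words <- function(text) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  sum(nzchar(words))
}

example_text <- "Detections rose through spring and peaked in late summer."
word_limit <- 120

n_words <- count_words(example_text)
if (n_words > word_limit) {
  warning("Interpretation exceeds the requested word limit")
}
```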

Example 2: Custom interpretation with specific focus

This example shows how to provide a custom prompt to guide the LLM’s analysis toward specific epidemiological concerns.

Prepare the data and create a visualisation

# Create an age-sex pyramid for interpretation
pyramid_data <- epiviz::lab_data %>%
  filter(
    organism_species_name == "KLEBSIELLA PNEUMONIAE",
    specimen_date >= as.Date("2023-01-01"),
    specimen_date <= as.Date("2023-06-30")
  )

# Create the age-sex pyramid
pyramid_plot <- age_sex_pyramid(
  dynamic = FALSE,
  params = list(
    df = pyramid_data,
    var_map = list(dob_var = "date_of_birth", sex_var = "sex"),
    grouped = FALSE,
    mf_colours = c("#440154", "#2196F3"),
    x_axis_title = "Number of cases",
    y_axis_title = "Age group (years)",
    legend_title = "Klebsiella pneumoniae cases by age and sex (H1 2023)"
  )
)

# Display the plot
print(pyramid_plot)

Interpret with custom epidemiological focus

# Use LLM with a custom prompt focused on public health implications
custom_interpretation <- llm_interpret(
  input = pyramid_plot,
  word_limit = 150,
  prompt_extension = "As a public health epidemiologist, analyze this age-sex pyramid for Klebsiella pneumoniae cases. Identify which demographic groups are most at risk, discuss potential risk factors, and suggest targeted prevention strategies. Consider healthcare-associated infections and community transmission patterns."
)

# Display the custom interpretation
cat("Custom Epidemiological Analysis:\n")
cat(custom_interpretation)

Interpretation: The LLM will provide a detailed epidemiological analysis focusing on risk groups, potential transmission patterns, and public health recommendations based on the demographic distribution shown in the pyramid.
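
If you reuse similar prompts across several charts, assembling prompt_extension from a few named pieces keeps them consistent. This is a minimal sketch. build_prompt_extension() and its role, task and focus arguments are hypothetical and not part of epiviz.

```r
# Sketch: assemble a prompt_extension string from reusable pieces.
# build_prompt_extension() is a hypothetical helper, not part of epiviz.
build_prompt_extension <- function(role, task, focus = character(0)) {
  parts <- c(
    paste0("As a ", role, ", ", task, "."),
    # Only add the focus sentence when focus points are supplied.
    if (length(focus) > 0) paste0("Focus on: ", paste(focus, collapse = "; "), ".")
  )
  paste(parts, collapse = " ")
}

prompt <- build_prompt_extension(
  role = "public health epidemiologist",
  task = "analyse this age-sex pyramid for Klebsiella pneumoniae cases",
  focus = c("demographic groups most at risk", "potential risk factors")
)
cat(prompt, "\n")
```

The resulting string can then be passed as the prompt_extension argument in the call shown above.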

Tips for LLM interpretation

API Key Management:
- Store API keys securely in your .Renviron file
- Never commit API keys to version control
- Use different keys for different environments (development, production)

Prompt Engineering:
- Be specific about what you want the LLM to focus on
- Include context about the epidemiological scenario
- Ask for actionable insights and recommendations
- Specify the level of detail you need

Data Privacy:
- Be cautious with sensitive health data
- Consider using aggregated or anonymised data for LLM interpretation
- Review your organisation’s data sharing policies

Cost Management:
- LLM API calls have costs based on usage
- Start with smaller datasets for testing

Quality Control:
- Always review LLM interpretations for accuracy
- Cross-reference with domain expertise
- Use LLM insights as a starting point for further analysis

Integration with Workflows:
- Use LLM interpretation for automated report generation
- Incorporate into surveillance dashboards
- Generate insights for outbreak investigation
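
The data privacy tip about aggregation can be sketched in base R as follows. The toy line list, the column names and the small-number threshold of 5 are assumptions for illustration. Your organisation's disclosure rules should determine the actual threshold.

```r
# Sketch: reduce a line list to monthly counts before LLM interpretation.
# The toy line list and the suppression threshold of 5 are assumptions.
line_list <- data.frame(
  patient_id = 1:8,
  specimen_date = as.Date(c(
    "2023-01-03", "2023-01-15", "2023-01-20", "2023-01-22",
    "2023-01-28", "2023-01-30", "2023-02-02", "2023-02-10"
  ))
)

# Count records per calendar month; identifiers are dropped in the process.
month <- format(line_list$specimen_date, "%Y-%m")
monthly <- as.data.frame(table(month), responseName = "detections",
                         stringsAsFactors = FALSE)

# Mask small counts, which can be disclosive in sparse data.
monthly$detections_masked <- ifelse(monthly$detections < 5, "<5",
                                    as.character(monthly$detections))
monthly
```

Only the aggregated table would then be shared with the provider, assuming llm_interpret() accepts a data frame as input, as the introduction's mention of datasets suggests.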

Troubleshooting

Common Issues

API Key Not Found: Ensure your environment variables are properly set and accessible to R.
Rate Limiting: If you hit rate limits, implement delays between requests or use different API keys.
Model Not Available: Check that your specified model is available in your API plan.
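
For rate limiting, the delays between requests can be implemented as a retry loop with exponential backoff. This is a minimal sketch in base R. with_retry() is a hypothetical helper, not part of epiviz, and it assumes the failing call signals an ordinary R error.

```r
# Sketch: retry a call with exponential backoff between attempts.
# with_retry() is a hypothetical helper, not part of epiviz.
with_retry <- function(fn, max_tries = 3, base_delay = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      # Wait base_delay seconds, then 2x, then 4x, ... before the next attempt.
      Sys.sleep(base_delay * 2^(attempt - 1))
    }
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Demonstration with a stand-in function that fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("rate limit exceeded")
  "ok"
}
outcome <- with_retry(flaky, max_tries = 4, base_delay = 0)
```

In practice the wrapped function might be function() llm_interpret(input = epi_curve_plot, word_limit = 120), provided llm_interpret() raises an error when the provider rejects a request. That behaviour is an assumption and is worth confirming against the package documentation.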