Calculate Cohort Reference Percentiles — cohort

Calculates reference percentiles for a metric across a cohort of athletes, stratified by specified grouping variables (e.g., sport, sex, age band).

Usage

cohort_reference(
  data,
  metric = "acwr_smooth",
  by = c("sport"),
  probs = c(0.05, 0.25, 0.5, 0.75, 0.95),
  min_athletes = 5,
  date_col = "date"
)

Arguments

data: A data frame containing metric values for multiple athletes. Must include an athlete_id column, the date column named in date_col, and the metric column named in metric.
metric: Name of the metric column to calculate percentiles for (e.g., "acwr", "acwr_smooth", "ef", "decoupling"). Default "acwr_smooth".
by: Character vector of grouping variables. Options: "sport", "sex", "age_band", "athlete_id". Default c("sport").
probs: Numeric vector of probabilities, each between 0 and 1, at which to compute percentiles. Default c(0.05, 0.25, 0.50, 0.75, 0.95), i.e. the 5th, 25th, 50th, 75th, and 95th percentiles.
min_athletes: Minimum number of athletes required per group to calculate valid percentiles. Default 5.
date_col: Name of the date column. Default "date".

Value

A long-format data frame with columns:

date: Date the percentile applies to
...: Grouping variables (as specified in by)
percentile: Percentile label (e.g., "p05", "p25", "p50", "p75", "p95")
value: Metric value at that percentile
n_athletes: Number of athletes contributing to this percentile
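
To make the output shape concrete, the following base-R sketch builds a data frame with the same columns from made-up data. It is an illustration only, not the package's implementation: the toy values, the sketch_reference name, the choice to drop groups below min_athletes, and the use of quantile() with its default interpolation are all assumptions.

```r
# Toy cohort: 6 athletes, one sport, 3 dates (all values are made up).
set.seed(1)
toy <- expand.grid(
  athlete_id = paste0("athlete", 1:6),
  date = as.Date("2024-01-01") + 0:2,
  stringsAsFactors = FALSE
)
toy$sport <- "Run"
toy$acwr_smooth <- round(runif(nrow(toy), 0.6, 1.6), 2)

probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
min_athletes <- 5

# One group per date x sport combination.
groups <- split(toy, list(toy$date, toy$sport), drop = TRUE)

rows <- lapply(groups, function(g) {
  n <- length(unique(g$athlete_id))
  # Assumption: groups with too few athletes are simply dropped.
  if (n < min_athletes) return(NULL)
  data.frame(
    date = g$date[1],
    sport = g$sport[1],
    percentile = sprintf("p%02d", round(probs * 100)),
    value = unname(quantile(g$acwr_smooth, probs, na.rm = TRUE)),
    n_athletes = n
  )
})

# Stack the per-group results into one long-format data frame.
sketch_reference <- do.call(rbind, Filter(Negate(is.null), rows))
rownames(sketch_reference) <- NULL
head(sketch_reference)
```

Each date and group combination contributes one row per requested probability, so three dates with five probabilities give fifteen rows here.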

Details

This function creates cohort-level reference bands for comparing individual athlete metrics to their peers. Common use cases:

Compare an athlete's ACWR trend to the cohort's median and percentile bands
Identify outliers (athletes outside the p05 to p95 range)
Track team-wide trends over time
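
The outlier use case above can be sketched in base R by matching an athlete's values to the p05 and p95 rows of a long-format reference table. The two data frames below are made-up stand-ins with the documented columns, and the bands, flagged, and outside_band names are hypothetical; this is one possible approach, not a function provided by the package.

```r
# Made-up reference table in the documented long format.
reference <- data.frame(
  date = rep(as.Date("2024-01-01") + 0:1, each = 2),
  sport = "Run",
  percentile = rep(c("p05", "p95"), times = 2),
  value = c(0.7, 1.4, 0.8, 1.5),
  n_athletes = 6
)

# Made-up values for a single athlete.
individual <- data.frame(
  date = as.Date("2024-01-01") + 0:1,
  sport = "Run",
  acwr_smooth = c(1.6, 1.1)
)

# Pull the lower and upper bands into their own columns.
lower <- reference[reference$percentile == "p05", c("date", "sport", "value")]
upper <- reference[reference$percentile == "p95", c("date", "sport", "value")]
names(lower)[3] <- "p05"
names(upper)[3] <- "p95"
bands <- merge(lower, upper, by = c("date", "sport"))

# Join the athlete's values to the bands and flag values outside them.
flagged <- merge(individual, bands, by = c("date", "sport"))
flagged <- flagged[order(flagged$date), ]
flagged$outside_band <- flagged$acwr_smooth < flagged$p05 |
  flagged$acwr_smooth > flagged$p95
flagged
```

In this toy data the first day (1.6 against a p95 of 1.4) is flagged and the second day is not.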

Important: Percentile bands represent population variability, not statistical confidence intervals for individual values.
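
One way to see the distinction is to compare a percentile band with a confidence interval for the cohort mean as the cohort grows. The simulation below uses made-up normally distributed values, and band_width and ci_width are hypothetical helpers written for this illustration. The band describes spread between athletes and stays roughly the same width, while the confidence interval for the mean narrows with more data.

```r
set.seed(42)
small <- rnorm(20, mean = 1.0, sd = 0.2)    # made-up cohort of 20 values
large <- rnorm(2000, mean = 1.0, sd = 0.2)  # made-up cohort of 2000 values

# Width of the p05 to p95 band: spread of the values themselves.
band_width <- function(x) unname(diff(quantile(x, c(0.05, 0.95))))

# Width of the 95% confidence interval for the mean: uncertainty of an estimate.
ci_width <- function(x) diff(as.numeric(t.test(x)$conf.int))

c(band = band_width(small), ci = ci_width(small))
c(band = band_width(large), ci = ci_width(large))
```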

Examples

if (FALSE) { # \dontrun{
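# dplyr supplies %>%, mutate(), bind_rows(), group_by(), group_modify(), and filter()
library(dplyr)

# Load activities for multiple athletes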
athlete1 <- load_local_activities("athlete1_export.zip") %>%
  mutate(athlete_id = "athlete1")
athlete2 <- load_local_activities("athlete2_export.zip") %>%
  mutate(athlete_id = "athlete2")
athlete3 <- load_local_activities("athlete3_export.zip") %>%
  mutate(athlete_id = "athlete3")

# Combine data
cohort_data <- bind_rows(athlete1, athlete2, athlete3)

# Calculate ACWR for each athlete, then drop the grouping
cohort_acwr <- cohort_data %>%
  group_by(athlete_id) %>%
  group_modify(~ calculate_acwr_ewma(.x)) %>%
  ungroup()

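# Calculate reference percentiles
# (only three athletes are loaded above, so lower min_athletes from its default of 5)
reference <- cohort_reference(
  cohort_acwr,
  metric = "acwr_smooth",
  by = c("sport"),
  probs = c(0.05, 0.25, 0.5, 0.75, 0.95),
  min_athletes = 3
)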

# Plot individual against cohort
plot_with_reference(
  individual = cohort_acwr %>% filter(athlete_id == "athlete1"),
  reference = reference
)
} # }