Annotate top GWAS hits with Open Targets Locus-to-Gene (L2G) predictions

Queries the Open Targets Platform for Locus-to-Gene (L2G) predictions for each variant and appends the top-scoring gene as three new columns: `l2g_gene_id`, `l2g_gene_name`, and `l2g_score`.

Usage

annotate_with_l2g(x, id_col = "ID", ...)

Arguments

x: A data.frame or tibble of GWAS hits.
id_col: Name of the column containing variant IDs. Default `"ID"`.
...: Additional arguments (unused).
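method: Retrieval method. `"api"` (default) queries the Open Targets GraphQL API once per variant; `"bulk"` queries locally cached Open Targets Platform L2G parquet files with DuckDB. See Details.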
release: Open Targets Platform release string, e.g. `"25.09"`. `NULL` (default) auto-detects the latest release. Only used when `method = "bulk"`.
cache_dir: Directory for cached parquet files. Defaults to the platform user-data directory for gwasplot. Only used when `method = "bulk"`.
ask: If `TRUE`, prompt before downloading data. Defaults to `TRUE` in interactive sessions. Set `FALSE` to download without prompting (e.g. in scripts). Only used when `method = "bulk"` and the data are not already cached.
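The `release`, `cache_dir`, and `ask` arguments together describe a download-once cache. The following is a minimal sketch of that logic, not gwasplot's actual code: the helper `l2g_cache_path()`, the `download_fn` callback, and the one-file-per-release layout (`l2g_<release>.parquet`) are all hypothetical, and release auto-detection (`release = NULL`) is left out.

```r
# Hypothetical download-once cache; not gwasplot's implementation.
# The real default cache_dir is described only as the platform user-data
# directory for gwasplot (possibly via tools::R_user_dir(); an assumption).
l2g_cache_path <- function(release, cache_dir, download_fn,
                           ask = interactive()) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(cache_dir, paste0("l2g_", release, ".parquet"))

  # Cache hit: reuse the file without touching the network.
  if (file.exists(path)) {
    return(path)
  }

  # Cache miss: optionally confirm before a large (~700 MB) download.
  if (ask) {
    answer <- readline(sprintf("Download L2G release %s (~700 MB)? [y/N] ", release))
    if (!tolower(answer) %in% c("y", "yes")) {
      stop("Download cancelled by user.", call. = FALSE)
    }
  }

  download_fn(path)  # hypothetical callback that writes the file to `path`
  path
}

# Usage with a stub downloader that just counts how often it is called.
calls <- 0
fake_download <- function(path) {
  calls <<- calls + 1
  writeLines("stub", path)  # stand-in for the real parquet download
}
cache_dir <- tempfile("l2g_cache_demo")
p1 <- l2g_cache_path("25.09", cache_dir, fake_download, ask = FALSE)
p2 <- l2g_cache_path("25.09", cache_dir, fake_download, ask = FALSE)
calls  # the second call is served from the cache
```

Keying the file name on the release keeps different Open Targets releases from overwriting each other, which is one reasonable way to make a "permanent" cache safe across releases.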

Value

The input data.frame with three new columns appended:

l2g_gene_id: Ensembl gene ID of the top L2G gene
l2g_gene_name: Approved gene symbol
l2g_score: L2G score 0–1 (higher = more likely causal)

Variants with no L2G predictions receive `NA` in these columns.
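The column contract above (one top gene per input row, `NA` when nothing matches) can be illustrated with a small base-R sketch. It assumes the predictions are already available as a table with hypothetical columns `variant_id`, `gene_id`, `gene_name`, and `score`, and that "top gene" means the highest score per variant; the data are made up, and how gwasplot actually selects the top gene is not specified here.

```r
# Hypothetical sketch of appending the top L2G gene per variant.
append_top_l2g <- function(x, l2g, id_col = "ID") {
  # Sort by variant, then by descending score, so the first row per
  # variant is its highest-scoring gene.
  l2g <- l2g[order(l2g$variant_id, -l2g$score), ]
  top <- l2g[!duplicated(l2g$variant_id), ]

  # match() keeps the row order of `x`; unmatched variants give NA.
  idx <- match(x[[id_col]], top$variant_id)
  x$l2g_gene_id   <- top$gene_id[idx]
  x$l2g_gene_name <- top$gene_name[idx]
  x$l2g_score     <- top$score[idx]
  x
}

# Made-up predictions for illustration only.
l2g <- data.frame(
  variant_id = c("1_100_A_G", "1_100_A_G", "2_200_C_T"),
  gene_id    = c("ENSG_FAKE_1", "ENSG_FAKE_2", "ENSG_FAKE_3"),
  gene_name  = c("GENEA", "GENEB", "GENEC"),
  score      = c(0.35, 0.82, 0.41),
  stringsAsFactors = FALSE
)
hits <- data.frame(ID = c("1_100_A_G", "3_300_G_A", "2_200_C_T"),
                   stringsAsFactors = FALSE)
annotated <- append_top_l2g(hits, l2g)
annotated
```

Using `match()` rather than `merge()` preserves the original row order and never duplicates input rows, which matches the promise that the input is returned with columns appended.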

Details

Two methods are available:

* `"api"` (default): queries the GraphQL API once per variant. Fast for small sets (fewer than ~100 variants) but slow for larger ones (~30 minutes for 1,000 variants).
* `"bulk"`: uses DuckDB to query the Open Targets Platform parquet files distributed via FTP. Downloads ~700 MB of L2G predictions on first use and caches them permanently; subsequent queries for any number of variants finish in seconds.
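The cost difference between the two methods follows from the per-variant request pattern of `"api"`. The sketch below shows that pattern under stated assumptions: `annotate_per_variant()` is a hypothetical helper, and `fetch_top_l2g` stands in for a single GraphQL call (no real Open Targets query is shown), returning a one-row data.frame or `NULL` when a variant has no prediction.

```r
# Hypothetical sketch of the "api" strategy: one request per variant.
# `fetch_top_l2g` is an assumed stand-in for a GraphQL call.
annotate_per_variant <- function(ids, fetch_top_l2g) {
  rows <- lapply(ids, function(id) {
    # Treat request failures like missing predictions (a design choice).
    res <- tryCatch(fetch_top_l2g(id), error = function(e) NULL)
    if (is.null(res) || nrow(res) == 0) {
      data.frame(gene_id = NA_character_, gene_name = NA_character_,
                 score = NA_real_, stringsAsFactors = FALSE)
    } else {
      res[1, c("gene_id", "gene_name", "score")]
    }
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Offline stub: only one variant has a (made-up) prediction.
fake_fetch <- function(id) {
  if (id == "1_100_A_G") {
    data.frame(gene_id = "ENSG_FAKE_2", gene_name = "GENEB", score = 0.82,
               stringsAsFactors = FALSE)
  } else {
    NULL
  }
}
per_variant <- annotate_per_variant(c("1_100_A_G", "9_900_T_C"), fake_fetch)
per_variant
```

Because each variant costs one round trip, run time grows linearly. Extrapolating from the ~30 minutes per 1,000 variants quoted above gives roughly 1.8 s per request, so 5,000 variants would take about 2.5 hours, whereas `"bulk"` pays a one-time download and then answers locally.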

The typical workflow for large sets:

```r
hits <- select_top_hits(gwas_obj)
hits <- find_nearest_gene(hits)
hits <- annotate_with_l2g(hits, method = "bulk")
```