Annotate top GWAS hits with Open Targets Locus-to-Gene (L2G) predictions
Source: `R/annotate.R`
Queries the Open Targets Platform for Locus-to-Gene (L2G) predictions and appends `l2g_gene_id`, `l2g_gene_name`, and `l2g_score` columns describing the top-scoring gene for each variant.
Arguments
- x
A data.frame or tibble of GWAS hits containing a column of variant IDs (see `id_col`).
- id_col
Name of the column containing variant IDs. Default `"ID"`.
- ...
Additional arguments (unused).
- method
Either `"api"` (default; one GraphQL query per variant) or `"bulk"` (local DuckDB queries against downloaded parquet files). See Details.
- release
Open Targets Platform release string, e.g. `"25.09"`. `NULL` (default) auto-detects the latest release. Only used when `method = "bulk"`.
- cache_dir
Directory for cached parquet files. Defaults to gwasplot's user data directory for the current operating system. Only used when `method = "bulk"`.
- ask
If `TRUE`, prompt the user before downloading data; defaults to `TRUE` in interactive sessions. Set to `FALSE` to download without prompting (e.g. in scripts). Only used when `method = "bulk"` and the data is not already cached.
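The `cache_dir` and `ask` arguments together decide whether a bulk download happens. The sketch below illustrates that decision as documented above: cached data is reused, `ask = FALSE` downloads silently, and `ask = TRUE` prompts first. The helpers `l2g_cache_dir()` and `should_download()` and the `l2g/<release>` subdirectory layout are hypothetical, not part of gwasplot; it is only an assumption that the default location resembles `tools::R_user_dir("gwasplot", "data")` (R >= 4.0).

```r
# Hypothetical: per-release cache location under gwasplot's user data directory.
# tools::R_user_dir() (R >= 4.0) returns an OS-specific path; the subdirectory
# layout "l2g/<release>" is an assumption, not the package's documented layout.
l2g_cache_dir <- function(release,
                          cache_dir = tools::R_user_dir("gwasplot", which = "data")) {
  file.path(cache_dir, "l2g", release)
}

# Hypothetical: decide whether to download, following the documented rules:
# cached data is reused, ask = FALSE downloads silently, ask = TRUE prompts.
should_download <- function(path, ask = interactive()) {
  cached <- dir.exists(path) &&
    length(list.files(path, pattern = "\\.parquet$")) > 0
  if (cached) {
    return(FALSE)
  }
  if (!ask) {
    return(TRUE)
  }
  # askYesNo() returns TRUE, FALSE, or NA (cancel); treat anything but "yes" as no.
  isTRUE(utils::askYesNo("Download Open Targets L2G predictions (~700 MB)?"))
}
```

Defaulting `ask` to `interactive()` means scripts run under `Rscript` never block on a prompt, while interactive users get a chance to decline a large download.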
Value
The input data.frame with three new columns appended:
- l2g_gene_id
Ensembl gene ID of the highest-scoring L2G gene for the variant
- l2g_gene_name
Approved gene symbol of that gene
- l2g_score
L2G score between 0 and 1; higher scores indicate a more likely causal gene
Variants with no L2G predictions receive `NA` in these columns.
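To make the shape of the result concrete, here is a minimal base-R sketch of "keep the top L2G gene per variant, then append it to the hits". It assumes the predictions are available as a long table with one row per variant/gene pair; `append_top_l2g()`, the column names of `preds`, and all identifiers are hypothetical toy values, not real Open Targets data or gwasplot's internal implementation.

```r
# Toy inputs with made-up identifiers (not real Open Targets data).
preds <- data.frame(
  variant_id = c("1_100_A_G", "1_100_A_G", "2_200_C_T"),
  gene_id    = c("ENSG_TOY1", "ENSG_TOY2", "ENSG_TOY3"),
  gene_name  = c("GENEA", "GENEB", "GENEC"),
  score      = c(0.35, 0.82, 0.10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append the top-scoring L2G gene for each hit.
append_top_l2g <- function(x, preds, id_col = "ID") {
  # Sort by variant, then by descending score, and keep the first row per variant.
  preds <- preds[order(preds$variant_id, -preds$score), ]
  top <- preds[!duplicated(preds$variant_id), ]
  # match() keeps the input row order and yields NA for variants with no prediction.
  idx <- match(x[[id_col]], top$variant_id)
  x$l2g_gene_id   <- top$gene_id[idx]
  x$l2g_gene_name <- top$gene_name[idx]
  x$l2g_score     <- top$score[idx]
  x
}

hits <- data.frame(ID = c("1_100_A_G", "3_300_G_A"), P = c(1e-10, 5e-9),
                   stringsAsFactors = FALSE)
out <- append_top_l2g(hits, preds)
# out$l2g_gene_name: "GENEB" NA
```

Using `match()` rather than `merge()` preserves the original row order and row count of `x`, which mirrors the documented behaviour of appending columns and filling `NA` for unmatched variants.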
Details
Two methods are available:
- `"api"` (default): queries the Open Targets GraphQL API once per variant. This is fast for small sets (fewer than about 100 variants) but slow for larger ones (roughly 30 minutes for 1000 variants).
- `"bulk"`: queries the Open Targets Platform L2G parquet files (from the FTP site) locally with DuckDB. The first call downloads about 700 MB of L2G predictions and caches them in `cache_dir`; later queries for any number of variants finish in seconds.
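The trade-off above amounts to simple arithmetic: about 30 minutes per 1000 variants is roughly 1.8 seconds per API query. A small sketch, assuming the timing and the 100-variant cut-off stated above hold for your connection; `estimate_api_minutes()` and `choose_l2g_method()` are hypothetical helpers, not gwasplot functions.

```r
# Hypothetical: rough API runtime, scaled from ~30 min per 1000 variants.
estimate_api_minutes <- function(n_variants, minutes_per_1000 = 30) {
  n_variants * minutes_per_1000 / 1000
}

# Hypothetical: pick "api" for small sets and "bulk" otherwise.
choose_l2g_method <- function(n_variants, threshold = 100) {
  if (n_variants < threshold) "api" else "bulk"
}

# Possible use with the documented argument:
# hits <- annotate_with_l2g(hits, method = choose_l2g_method(nrow(hits)))
```

The one-time ~700 MB download makes `"bulk"` pay off once you annotate more than a few hundred variants, or the same release repeatedly.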
A typical workflow for large sets of hits:

```r
hits <- select_top_hits(gwas_obj)                  # top hits from a GWAS result
hits <- find_nearest_gene(hits)                    # add nearest-gene annotation
hits <- annotate_with_l2g(hits, method = "bulk")   # append L2G columns
```