Queries the Open Targets Platform to retrieve L2G scores and adds `l2g_gene_id`, `l2g_gene_name`, and `l2g_score` columns.

Usage

annotate_with_l2g(x, id_col = "ID", ...)

Arguments

x

A data.frame or tibble of GWAS hits.

id_col

Name of the column containing variant IDs. Default `"ID"`.

...

Additional arguments (unused).

method

`"api"` (per-variant GraphQL, default) or `"bulk"` (parquet).

release

Open Targets Platform release string, e.g. `"25.09"`. `NULL` (default) auto-detects the latest release. Only used when `method = "bulk"`.

cache_dir

Directory for cached parquet files. Defaults to the platform user-data directory for gwasplot. Only used when `method = "bulk"`.

ask

If `TRUE` (default when running interactively), prompt the user before downloading data. Set `FALSE` to download without prompting (e.g. in scripts). Only used when `method = "bulk"` and data is not cached.

Value

The input data.frame with three new columns appended:

l2g_gene_id

Ensembl gene ID of the top L2G gene

l2g_gene_name

Approved gene symbol

l2g_score

L2G score 0–1 (higher = more likely causal)

Variants with no L2G predictions receive `NA` in these columns.
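Downstream code usually needs to handle the `NA` rows and pick a score cutoff. The following is a minimal base-R sketch of that step. It uses a hand-built data.frame in place of real output: the variant IDs, gene IDs, gene names, and scores are placeholders, and the `0.5` cutoff is an arbitrary assumption, not a package default.

```r
# Mock output: the three l2g_* columns are filled in by hand here, not fetched
# from Open Targets. All IDs, names, and scores are illustrative placeholders.
annotated <- data.frame(
  ID = c("variant_1", "variant_2", "variant_3"),
  l2g_gene_id = c("ENSG_PLACEHOLDER_1", NA, "ENSG_PLACEHOLDER_2"),
  l2g_gene_name = c("GENE1", NA, "GENE2"),
  l2g_score = c(0.82, NA, 0.31),
  stringsAsFactors = FALSE
)

# Variants without any L2G prediction carry NA in all three columns.
has_l2g <- !is.na(annotated$l2g_score)

# Keep confident assignments only; 0.5 is an assumed cutoff for illustration.
min_score <- 0.5
confident <- annotated[has_l2g & annotated$l2g_score >= min_score, ]
```

Combining the `NA` check with the score comparison keeps rows with missing predictions from leaking into the subset as all-`NA` rows.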

Details

Two methods are available:

* `"api"` (default): queries the GraphQL API once per variant. Fast for small sets (<100 variants), but slow for larger ones (~30 min for 1000 variants).
* `"bulk"`: uses DuckDB to query the Open Targets Platform parquet files (FTP). Downloads ~700 MB of L2G predictions on first use, then caches them permanently. Subsequent queries for any number of variants finish in seconds.
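The choice between the two methods can be made from the number of variants. The sketch below shows one way to do that; `choose_l2g_method()` and its `api_limit` argument are hypothetical names, not part of the package, and the threshold of 100 simply reuses the guideline given above.

```r
# Hypothetical helper (not part of the package): pick a method from the
# number of variants, using the ~100-variant guideline stated above.
choose_l2g_method <- function(n_variants, api_limit = 100L) {
  stopifnot(is.numeric(n_variants), length(n_variants) == 1L, n_variants >= 0)
  if (n_variants < api_limit) "api" else "bulk"
}

small_set <- choose_l2g_method(25)    # below the limit, so per-variant API
large_set <- choose_l2g_method(1000)  # at or above the limit, so bulk parquet

# Intended use, assuming `hits` is a data.frame of GWAS hits:
# hits <- annotate_with_l2g(hits, method = choose_l2g_method(nrow(hits)))
```

If the parquet files are already cached, `"bulk"` may be the better choice even for small sets, since no download is needed.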

The typical workflow for large sets:

```r
hits <- select_top_hits(gwas_obj)
hits <- find_nearest_gene(hits)
hits <- annotate_with_l2g(hits, method = "bulk")
```