This page introduces the basic structure of a SPARQL query using a real example from PlantMetWiki.
Rather than focusing on abstract syntax, we explain how a concrete biological question is translated into a SPARQL query, and how to interpret each part of the query and its results.
By the end of this page, you should be comfortable:
- reading a SPARQL query used in PlantMetWiki,
- understanding what biological question it answers,
- recognizing how pathway content is represented in RDF,
- following links from PlantMetWiki to external resources such as PlantCyc.
SPARQL endpoint:
https://plantmetwiki.bioinformatics.nl/sparql
Graph used in all queries:
FROM <http://plantmetwiki.bioinformatics.nl/>
We will work with the α-solanine / α-chaconine biosynthesis pathway, a well-known plant specialised metabolic pathway involved in glycoalkaloid production in Solanum species (e.g. potato and tomato).
Anatomy of a SPARQL query
A SPARQL query consists of several elements, which can be considered building blocks.
Our PlantMetWiki question
Which PlantCyc reactions are part of the α-solanine / α-chaconine biosynthesis pathway, and how can we validate them in PlantCyc?
We will use this pathway URI throughout the tutorial:
<http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
SELECT — what do we want to see in the results?
The SELECT clause defines what will be returned as results.
For our question, we want:
• the reaction identifier (?reactionId)
• a clickable PlantCyc link (?plantCycReactionURL)
SELECT ?reactionId ?plantCycReactionURL
SELECT lists the variables from the SPARQL query that you want to see in the results (in other words, the variables we consider relevant output for answering our biological question).
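As an analogy for what SELECT does, the following Python sketch keeps only the requested variables from a list of solution rows. It is not part of PlantMetWiki: the helper name `project` and the placeholder values for `pathway` and `interaction` are assumptions made for illustration.

```python
# Analogy for SELECT: keep only the requested variables from each solution row.
# The row below uses made-up placeholder values, not real query results.

def project(rows, variables):
    """Return each row restricted to the given variable names (like SELECT ?a ?b)."""
    return [{var: row[var] for var in variables if var in row} for row in rows]


rows = [
    {
        "pathway": "pathway-placeholder",
        "interaction": "interaction-placeholder",
        "reactionId": "RXN-10730",
        "plantCycReactionURL": (
            "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=RXN-10730"
        ),
    },
]

# Equivalent in spirit to: SELECT ?reactionId ?plantCycReactionURL
selected = project(rows, ["reactionId", "plantCycReactionURL"])
print(selected)
```

Variables such as ?pathway and ?interaction are still used inside the query, but they do not appear in the output unless they are listed after SELECT.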
WHERE — how do we find that information?
The second element we encounter in a SPARQL query is the query pattern, which starts with the keyword WHERE and is enclosed in curly brackets: { }.
The WHERE clause defines the graph pattern to match (triples in the form subject–predicate–object).
For PlantMetWiki pathways, we already discovered the following:
• gpml:hasInteraction links a pathway to its interactions,
• some interactions represent real PlantCyc reactions (e.g. RXN-10730),
• some interactions are GPML anchor helper nodes (their identifier contains _anchor_); these should not be linked to PlantCyc or interpreted as reactions.
WHERE {
VALUES ?pathway { <...> }
?pathway gpml:hasInteraction ?interaction .
...
}
These are triple patterns (subject–predicate–object) that are matched against the RDF triples in the graph, just like in the Wikidata tutorial, but with PlantMetWiki predicates.
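To illustrate how a triple pattern is matched, here is a minimal Python sketch over a toy in-memory list of triples. All identifiers are shortened, hypothetical stand-ins for full URIs, and the `match` helper is an illustrative assumption; a real SPARQL engine does considerably more than this.

```python
# Toy triple store: each triple is (subject, predicate, object).
# All identifiers are shortened, hypothetical stand-ins for full URIs.
TRIPLES = [
    ("pathway:PC346", "gpml:hasInteraction", "interaction:RXN-10730"),
    ("pathway:PC346", "gpml:hasInteraction", "interaction:example_anchor_1"),
    ("pathway:PC346", "gpml:hasDataNode", "datanode:example-1"),
    ("pathway:OTHER", "gpml:hasInteraction", "interaction:RXN-OTHER"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Yield triples that fit the pattern; None behaves like a SPARQL variable."""
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, p, o)


# Equivalent in spirit to:  ?pathway gpml:hasInteraction ?interaction .
# with ?pathway pinned to pathway:PC346.
interactions = [
    o for _, _, o in match(TRIPLES, subject="pathway:PC346",
                           predicate="gpml:hasInteraction")
]
print(interactions)
```

Every position left as None plays the role of a variable: it matches anything, and the matched value is what ends up bound in the results.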
Step-by-step interpretation of the query
Line 1 — VALUES (what are we querying about?)
VALUES lets us “pin” the query to one (or multiple) specific items.
VALUES ?pathway {
<http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
}
You can add more pathways inside the braces later (separated by spaces) if you want to compare multiple pathways.
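If you generate queries from a script, a VALUES block with several pathways can be assembled as in the Python sketch below. The helper `values_clause` is a hypothetical name, and the second pathway IRI is an invented placeholder, not a real PlantMetWiki pathway. The sketch does no validation or escaping of the IRIs.

```python
def values_clause(variable, iris):
    """Build a SPARQL VALUES block binding one variable to one or more IRIs."""
    lines = [f"VALUES ?{variable} {{"]
    lines += [f"  <{iri}>" for iri in iris]
    lines.append("}")
    return "\n".join(lines)


pathways = [
    "http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344",
    # Hypothetical second pathway, for illustration only:
    "http://example.org/Pathway/HYPOTHETICAL_SECOND_PATHWAY",
]

clause = values_clause("pathway", pathways)
print(clause)
```

When more than one pathway is listed, it is usually helpful to add ?pathway to the SELECT clause so that each result row shows which pathway it came from.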
Line 2 — Retrieve interactions from the pathway
This line uses the pathway as the subject and gets all linked interactions:
?pathway gpml:hasInteraction ?interaction .
Line 3 — Turn an interaction URI into a PlantCyc reaction link
PlantMetWiki does not use Wikidata’s label service. Instead, we often extract meaningful identifiers from URIs.
- Extract the part after /Interaction/:
BIND(
  STRAFTER(STR(?interaction), "/Interaction/")
  AS ?reactionId
)
- Keep only “real” reactions and exclude anchor helper nodes:
FILTER(CONTAINS(?reactionId, "RXN-"))
FILTER(!CONTAINS(?reactionId, "_anchor_"))
- Construct a clickable PlantCyc URL:
BIND(
  IRI(CONCAT(
    "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=",
    ?reactionId
  ))
  AS ?plantCycReactionURL
)
This turns the extracted identifier into a clickable external link.
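The three steps above (extract, filter, construct) can be mirrored in plain Python to check the string logic without a SPARQL endpoint. This is a sketch under stated assumptions: the example interaction URIs are hypothetical, the exact URI layout in PlantMetWiki may differ, and `strafter` and `plantcyc_link` are illustrative helper names.

```python
PLANTCYC_BASE = "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object="


def strafter(text, marker):
    """Mimic SPARQL STRAFTER: the text after the first marker, or "" if absent."""
    if marker == "":
        return text
    _, found, rest = text.partition(marker)
    return rest if found else ""


def plantcyc_link(interaction_uri):
    """Return (reaction_id, url) for real reactions, or None if filtered out."""
    reaction_id = strafter(interaction_uri, "/Interaction/")
    if "RXN-" not in reaction_id:   # FILTER(CONTAINS(?reactionId, "RXN-"))
        return None
    if "_anchor_" in reaction_id:   # FILTER(!CONTAINS(?reactionId, "_anchor_"))
        return None
    return reaction_id, PLANTCYC_BASE + reaction_id  # CONCAT(...)


# Hypothetical interaction URIs; the real URI layout is an assumption.
examples = [
    "http://example.org/Pathway/PC346/Interaction/RXN-10730",
    "http://example.org/Pathway/PC346/Interaction/RXN-10730_anchor_1",
    "http://example.org/Pathway/PC346/Interaction/abc123",
]

results = [plantcyc_link(uri) for uri in examples]
for uri, result in zip(examples, results):
    print(uri, "->", result)
```

Only the first example survives both filters. The second is dropped because it contains _anchor_, and the third because it contains no RXN- identifier.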
Full query
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
SELECT ?reactionId ?plantCycReactionURL
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  # Pin the query to one pathway
  VALUES ?pathway {
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
  }
  # Retrieve all interactions linked to that pathway
  ?pathway gpml:hasInteraction ?interaction .
  # Extract the identifier that follows /Interaction/ in the URI
  BIND(STRAFTER(STR(?interaction), "/Interaction/") AS ?reactionId)
  # Keep real reactions and drop anchor helper nodes
  FILTER(CONTAINS(?reactionId, "RXN-"))
  FILTER(!CONTAINS(?reactionId, "_anchor_"))
  # Build a clickable PlantCyc link from the identifier
  BIND(
    IRI(CONCAT(
      "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=",
      ?reactionId
    ))
    AS ?plantCycReactionURL
  )
}
ORDER BY ?reactionId
LIMIT 200
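To run this query from a script rather than a web form, the request can be prepared as in the Python sketch below. It assumes the endpoint accepts standard SPARQL 1.1 Protocol GET requests with a `query` parameter and can return JSON results; neither has been verified against PlantMetWiki here. The sketch only builds the request and does not send it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "https://plantmetwiki.bioinformatics.nl/sparql"

QUERY = """
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
SELECT ?reactionId ?plantCycReactionURL
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  VALUES ?pathway {
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
  }
  ?pathway gpml:hasInteraction ?interaction .
  BIND(STRAFTER(STR(?interaction), "/Interaction/") AS ?reactionId)
  FILTER(CONTAINS(?reactionId, "RXN-"))
  FILTER(!CONTAINS(?reactionId, "_anchor_"))
  BIND(
    IRI(CONCAT(
      "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=",
      ?reactionId
    ))
    AS ?plantCycReactionURL
  )
}
ORDER BY ?reactionId
LIMIT 200
"""


def build_request(query):
    """Prepare (but do not send) a SPARQL Protocol GET request asking for JSON."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    # Assumption: the endpoint honours this standard results media type.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(QUERY)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(decoded == QUERY)

# To actually execute the query (requires network access):
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       body = response.read().decode("utf-8")
```

Note that `urlencode` escapes the & inside the PlantCyc URL string, so it is not mistaken for a separator between request parameters.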
Listing pathway components (genes, metabolites)
To see which data nodes (genes, metabolites) are present in the same pathway:
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>
SELECT ?dataNodeId
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  VALUES ?pathway {
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
  }
  ?pathway gpml:hasDataNode ?dataNode .
  BIND(STRAFTER(STR(?dataNode), "/DataNode/") AS ?dataNodeId)
}
ORDER BY ?dataNodeId
LIMIT 200
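If the results of such a query are requested in the standard SPARQL JSON results format, one column can be read out as in the following Python sketch. The sample response is hand-written and hypothetical (the node identifiers are placeholders, not real PlantMetWiki data), and `column` is an illustrative helper name.

```python
import json

# Hand-written, hypothetical response in the SPARQL 1.1 JSON results layout.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["dataNodeId"]},
  "results": {"bindings": [
    {"dataNodeId": {"type": "literal", "value": "example-node-a"}},
    {"dataNodeId": {"type": "literal", "value": "example-node-b"}}
  ]}
}
"""


def column(response_text, variable):
    """Return the values bound to one variable, skipping rows where it is unbound."""
    data = json.loads(response_text)
    return [
        binding[variable]["value"]
        for binding in data["results"]["bindings"]
        if variable in binding
    ]


ids = column(SAMPLE_RESPONSE, "dataNodeId")
print(ids)
```

Each entry in `bindings` corresponds to one result row, and each variable in a row carries its value under the `value` key.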
A note on labels and identifiers
Unlike Wikidata, PlantMetWiki does not provide a dedicated label service (SERVICE wikibase:label).
Instead:
• some readable information is stored directly (e.g. gpml:name, gpml:textLabel),
• otherwise, meaningful identifiers are extracted directly from URIs using string functions such as STRAFTER().
This approach is used consistently throughout the tutorial.
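The fallback described here (use a stored label when there is one, otherwise derive an identifier from the URI) can be sketched in Python as follows. The function name `display_label`, the example URI, and the choice to return the full URI when the marker is missing are all assumptions made for illustration.

```python
def display_label(uri, marker, stored_label=None):
    """Prefer a stored label; otherwise fall back to the URI part after marker."""
    if stored_label:
        return stored_label
    _, found, rest = uri.partition(marker)
    # Design choice in this sketch: show the whole URI if the marker is absent.
    return rest if found else uri


# Hypothetical data node URI, for illustration only.
node_uri = "http://example.org/Pathway/PC346/DataNode/example-node-a"

print(display_label(node_uri, "/DataNode/"))
print(display_label(node_uri, "/DataNode/", stored_label="α-solanine"))
```

In a query, the stored label would come from a predicate such as gpml:textLabel when it is present for a node.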
Questions
Question 1: Which part of the query selects the pathway we want to investigate?
Answer:
VALUES ?pathway { <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344> }
Question 2: Which line retrieves all interactions that belong to the pathway?
Answer:
?pathway gpml:hasInteraction ?interaction .
Question 3: Why do we filter out _anchor_ interactions?
Answer:
Interactions that contain _anchor_ are GPML helper nodes used for drawing/connecting edges. They are not real PlantCyc reaction identifiers, so PlantCyc will not recognize them.