Understanding SPARQL Queries

This page introduces the basic structure of a SPARQL query using a real example from PlantMetWiki.

Rather than focusing on abstract syntax, we explain how a concrete biological question is translated into a SPARQL query, and how to interpret each part of the query and its results.

By the end of this page, you should be comfortable:

reading a SPARQL query used in PlantMetWiki,
understanding what biological question it answers,
recognizing how pathway content is represented in RDF,
following links from PlantMetWiki to external resources such as PlantCyc.

SPARQL endpoint:
https://plantmetwiki.bioinformatics.nl/sparql

Graph used in all queries:
FROM <http://plantmetwiki.bioinformatics.nl/>
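
For readers who prefer to call the endpoint from a script, the following is a minimal Python sketch of how a request could be prepared. It assumes the endpoint follows the standard SPARQL 1.1 Protocol (query text passed in a `query` URL parameter) and can return the SPARQL JSON results format; neither is confirmed on this page. The helper name `build_request` and the probe query are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://plantmetwiki.bioinformatics.nl/sparql"
GRAPH = "http://plantmetwiki.bioinformatics.nl/"


def build_request(query: str) -> Request:
    """Prepare (but do not send) a SPARQL GET request.

    Assumption: the endpoint accepts the query text in a `query`
    URL parameter, as described by the SPARQL 1.1 Protocol.
    """
    url = ENDPOINT + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# A tiny probe query restricted to the PlantMetWiki graph.
probe = f"SELECT * FROM <{GRAPH}> WHERE {{ ?s ?p ?o }} LIMIT 5"
request = build_request(probe)
print(request.full_url)
```

Nothing is sent over the network here; passing `request` to `urllib.request.urlopen` would perform the actual call.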

We will work with the α-solanine / α-chaconine biosynthesis pathway, a well-known plant specialised metabolic pathway involved in glycoalkaloid production in Solanum species (e.g. potato and tomato).

Anatomy of a SPARQL query

A SPARQL query consists of several elements, which can be thought of as building blocks.

Our PlantMetWiki question

Which PlantCyc reactions are part of the α-solanine / α-chaconine biosynthesis pathway, and how can we validate them in PlantCyc?

We will use this pathway URI throughout the tutorial:

<http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>

SELECT — what do we want to see in the results?

The SELECT clause defines what will be returned as results.

For our question, we want:

• the reaction identifier (?reactionId)
• a clickable PlantCyc link (?plantCycReactionURL)

SELECT ?reactionId ?plantCycReactionURL

SELECT indicates which variables from the SPARQL query you want to see in the results (in other words, which variables are relevant as output for answering our biological question).
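
One way to picture SELECT is as a projection: the query engine may bind many variables while matching, but only the selected ones are returned. The Python sketch below imitates this on a single hand-written solution. The interaction URI is made up for illustration and the `select` helper is hypothetical, not part of any SPARQL library.

```python
# One hypothetical solution, as it might look before SELECT is applied.
# The interaction URI below is invented for illustration.
solutions = [
    {
        "pathway": "http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344",
        "interaction": "http://example.org/Interaction/RXN-10730",
        "reactionId": "RXN-10730",
        "plantCycReactionURL": (
            "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=RXN-10730"
        ),
    },
]


def select(solutions, variables):
    """Keep only the named variables in each solution (like SELECT)."""
    return [{v: row[v] for v in variables if v in row} for row in solutions]


rows = select(solutions, ["reactionId", "plantCycReactionURL"])
print(rows)
```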

WHERE — how do we find that information?

The second element we encounter in a SPARQL query is the query pattern. It starts with the keyword WHERE, and the pattern itself is enclosed in curly brackets: { }.

The WHERE clause defines the graph pattern to match (triples in the form subject–predicate–object).

For PlantMetWiki pathways, we already know the key predicate and two properties of the interactions it returns:

• gpml:hasInteraction links a pathway to its interactions
• some interactions represent real PlantCyc reactions (e.g. RXN-10730)
• some interactions are GPML anchor helper nodes (their identifiers contain _anchor_) and should not be linked to PlantCyc or interpreted as reactions

WHERE {
  VALUES ?pathway { <...> }
  ?pathway gpml:hasInteraction ?interaction .
  ...
}

This is a set of RDF triples (subject–predicate–object), just like in the Wikidata tutorial, but with PlantMetWiki predicates.
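
To make triple pattern matching concrete, here is a small Python sketch that imitates what the engine does for a single pattern. It runs on a toy in-memory list of triples: the interaction and data node URIs are invented for illustration, and `match` is a hypothetical helper, not a real SPARQL implementation.

```python
GPML = "http://vocabularies.wikipathways.org/gpml#"
PATHWAY = "http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344"

# Toy data: the object URIs are made up for illustration.
triples = [
    (PATHWAY, GPML + "hasInteraction", "http://example.org/Interaction/RXN-10730"),
    (PATHWAY, GPML + "hasInteraction", "http://example.org/Interaction/RXN-10730_anchor_1"),
    (PATHWAY, GPML + "hasDataNode", "http://example.org/DataNode/n1"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts like a variable."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Equivalent in spirit to: ?pathway gpml:hasInteraction ?interaction .
interactions = [o for (_, _, o) in match(triples, PATHWAY, GPML + "hasInteraction")]
print(interactions)
```

Fixed positions (the pathway and the predicate) narrow the search, while the open position plays the role of the variable ?interaction.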

Step-by-step interpretation of the query

Line 1 — VALUES (what are we querying about?)

VALUES lets us “pin” the query to one (or multiple) specific items.

VALUES ?pathway {
  <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
}

You can add more pathways inside the braces later (separated by spaces) if you want to compare multiple pathways.
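
When comparing several pathways, the VALUES block can also be generated from a list. The sketch below shows one possible way to do this in Python; `values_clause` is a hypothetical helper, and the second URI is a placeholder that must be replaced with a real PlantMetWiki pathway URI.

```python
def values_clause(variable: str, uris: list[str]) -> str:
    """Build a VALUES block binding `variable` to each URI in turn."""
    body = "\n".join(f"  <{uri}>" for uri in uris)
    return f"VALUES ?{variable} {{\n{body}\n}}"


pathways = [
    "http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344",
    # Placeholder only: replace with a real pathway URI from PlantMetWiki.
    "http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PLACEHOLDER",
]

clause = values_clause("pathway", pathways)
print(clause)
```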

Line 2 — Retrieve interactions from the pathway

This line uses the pathway as the subject and gets all linked interactions:

?pathway gpml:hasInteraction ?interaction .

Line 3 — Turn an interaction URI into a PlantCyc reaction link

PlantMetWiki does not use Wikidata’s label service. Instead, we often extract meaningful identifiers from URIs.

Extract the part after /Interaction/:

BIND(
  STRAFTER(STR(?interaction), "/Interaction/")
  AS ?reactionId
)
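
The behaviour of STRAFTER can be imitated in a few lines of Python, which may help when checking what an identifier should look like. This is a sketch of the function's documented semantics (text after the first occurrence of the marker, or an empty string when the marker is absent). The example URI is invented, and the real shape of PlantMetWiki interaction URIs may differ.

```python
def str_after(text: str, marker: str) -> str:
    """Mimic SPARQL STRAFTER for plain strings."""
    if marker == "":
        # STRAFTER with an empty marker returns the whole string.
        return text
    _, found, rest = text.partition(marker)
    # If the marker does not occur, STRAFTER yields an empty string.
    return rest if found else ""


# Invented URI, used only to illustrate the extraction step.
interaction_uri = "http://example.org/Pathway/PC346/Interaction/RXN-10730"
reaction_id = str_after(interaction_uri, "/Interaction/")
print(reaction_id)  # -> RXN-10730
```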

Keep only “real” reactions and exclude anchor helper nodes:

FILTER(CONTAINS(?reactionId, "RXN-"))
FILTER(!CONTAINS(?reactionId, "_anchor_"))
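
Two FILTER lines in the same pattern combine as a logical AND: a result is kept only if it passes both. The Python sketch below applies the same two tests to a short list of identifiers. Only RXN-10730 comes from this page; the other two identifiers are hypothetical examples of what would be rejected.

```python
# Only "RXN-10730" is taken from the tutorial; the others are hypothetical.
candidate_ids = ["RXN-10730", "RXN-10730_anchor_1", "some-other-edge"]


def is_plantcyc_reaction(reaction_id: str) -> bool:
    """Mirror the two FILTERs: contains "RXN-" AND does not contain "_anchor_"."""
    return "RXN-" in reaction_id and "_anchor_" not in reaction_id


kept = [rid for rid in candidate_ids if is_plantcyc_reaction(rid)]
print(kept)  # -> ['RXN-10730']
```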

Construct a clickable PlantCyc URL:

BIND(
  IRI(CONCAT(
    "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=",
    ?reactionId
  ))
  AS ?plantCycReactionURL
)

IRI() converts the concatenated string into an IRI, so the extracted identifier becomes an external link that most result viewers display as clickable.
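
The URL construction is plain string concatenation, which the following Python sketch reproduces. The function name `plantcyc_reaction_url` is illustrative; the base URL is the one used in the query above.

```python
PLANTCYC_REACTION_BASE = (
    "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object="
)


def plantcyc_reaction_url(reaction_id: str) -> str:
    """Append the identifier to the base URL, as CONCAT does.

    No percent-encoding is applied, matching the SPARQL expression.
    """
    return PLANTCYC_REACTION_BASE + reaction_id


print(plantcyc_reaction_url("RXN-10730"))
# -> https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=RXN-10730
```

If identifiers could ever contain characters that are not URL-safe, SPARQL's ENCODE_FOR_URI (or `urllib.parse.quote` in Python) would be the usual remedy; the identifiers shown in this tutorial do not need it.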

Full query

PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?reactionId ?plantCycReactionURL
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  VALUES ?pathway {
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
  }

  ?pathway gpml:hasInteraction ?interaction .

  BIND(STRAFTER(STR(?interaction), "/Interaction/") AS ?reactionId)

  FILTER(CONTAINS(?reactionId, "RXN-"))
  FILTER(!CONTAINS(?reactionId, "_anchor_"))

  BIND(
    IRI(CONCAT(
      "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=",
      ?reactionId
    ))
    AS ?plantCycReactionURL
  )
}
ORDER BY ?reactionId
LIMIT 200
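
If the full query is run from a script rather than a web form, results are commonly returned in the W3C SPARQL JSON results format. The sketch below shows how such a response could be flattened into simple rows. The sample payload is hand-written to match that format and is not real endpoint output; `rows_from_results` is a hypothetical helper.

```python
import json

# Hand-written sample in the SPARQL JSON results layout; not real output.
sample = """
{
  "head": {"vars": ["reactionId", "plantCycReactionURL"]},
  "results": {"bindings": [
    {
      "reactionId": {"type": "literal", "value": "RXN-10730"},
      "plantCycReactionURL": {
        "type": "uri",
        "value": "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=RXN-10730"
      }
    }
  ]}
}
"""


def rows_from_results(payload: str) -> list[dict]:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(payload)
    variables = data["head"]["vars"]
    return [
        # A variable may be unbound in a given row, so check before reading it.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


for row in rows_from_results(sample):
    print(row["reactionId"], row["plantCycReactionURL"])
```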

Listing pathway components (genes, metabolites)

To see which data nodes (genes, metabolites) are present in the same pathway:

PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?dataNodeId
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  VALUES ?pathway {
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
  }

  ?pathway gpml:hasDataNode ?dataNode .
  BIND(STRAFTER(STR(?dataNode), "/DataNode/") AS ?dataNodeId)
}
ORDER BY ?dataNodeId
LIMIT 200

A note on labels and identifiers

Unlike Wikidata, PlantMetWiki does not provide a dedicated label service (SERVICE wikibase:label).

Instead:

•	some readable information is stored directly (e.g. gpml:name, gpml:textLabel),
•	otherwise, meaningful identifiers are extracted directly from URIs using string functions such as STRAFTER().
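
The fallback described above (use a stored label when there is one, otherwise derive an identifier from the URI) can be sketched in Python as follows. The function name, the example URI and the example label are all hypothetical. Inside a SPARQL query the same idea would typically be expressed with COALESCE.

```python
from typing import Optional


def display_name(node_uri: str, label: Optional[str] = None) -> str:
    """Prefer a stored label; otherwise fall back to the ID in the URI."""
    if label:
        return label
    # Same idea as STRAFTER(STR(?dataNode), "/DataNode/").
    _, found, rest = node_uri.partition("/DataNode/")
    # If the marker is missing, show the full URI rather than nothing.
    return rest if found else node_uri


# Invented URI and label, for illustration only.
uri = "http://example.org/Pathway/PC346/DataNode/abc12"
print(display_name(uri, "α-solanine"))
print(display_name(uri))
```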

This approach is used consistently throughout the tutorial.

Questions

Question 1: Which part of the query selects the pathway we want to investigate?

Answer:
VALUES ?pathway { <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344> }

Question 2: Which line retrieves all interactions that belong to the pathway?

Answer:
?pathway gpml:hasInteraction ?interaction .

Question 3: Why do we filter out _anchor_ interactions?

Answer:
Interactions that contain _anchor_ are GPML helper nodes used for drawing/connecting edges. They are not real PlantCyc reaction identifiers, so PlantCyc will not recognize them.