Understanding SPARQL Queries

Tutorial on SPARQL for PlantMetWiki


This page introduces the basic structure of a SPARQL query using a real example from PlantMetWiki.

Rather than focusing on abstract syntax, we explain how a concrete biological question is translated into a SPARQL query, and how to interpret each part of the query and its results.

By the end of this page, you should be comfortable:

  • reading a SPARQL query used in PlantMetWiki,
  • understanding what biological question it answers,
  • recognizing how pathway content is represented in RDF,
  • following links from PlantMetWiki to external resources such as PlantCyc.

SPARQL endpoint:
https://plantmetwiki.bioinformatics.nl/sparql

Graph used in all queries:
FROM <http://plantmetwiki.bioinformatics.nl/>

We will work with the α-solanine / α-chaconine biosynthesis pathway, a well-known plant specialised metabolic pathway involved in glycoalkaloid production in Solanum species (e.g. potato and tomato).

Anatomy of a SPARQL query

A SPARQL query consists of several elements, which can be thought of as building blocks.

Our PlantMetWiki question

Which PlantCyc reactions are part of the α-solanine / α-chaconine biosynthesis pathway, and how can we validate them in PlantCyc?

We will use this pathway URI throughout the tutorial:

<http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>

SELECT — what do we want to see in the results?

The SELECT clause defines what will be returned as results.

For our question, we want:

  • the reaction identifier (?reactionId),
  • a clickable PlantCyc link (?plantCycReactionURL).

SELECT ?reactionId ?plantCycReactionURL

SELECT indicates which variables from the SPARQL query you want to show in the results (in other words: which variables are relevant as output for answering our biological question).

WHERE — how do we find that information?

The second element we encounter in a SPARQL query is the query pattern, which starts with the keyword WHERE, followed by the pattern itself enclosed in curly brackets: {}.

The WHERE clause defines the graph pattern to match (triples in the form subject–predicate–object).

For PlantMetWiki pathways, the key points are:

  • gpml:hasInteraction links a pathway to its interactions,
  • some interactions represent real PlantCyc reactions (e.g. RXN-10730),
  • some interactions are GPML anchor helper nodes (their identifier contains _anchor_) and should not be linked to PlantCyc or interpreted as reactions.

WHERE {
  VALUES ?pathway { <...> }
  ?pathway gpml:hasInteraction ?interaction .
  ...
}

This is a set of RDF triples (subject–predicate–object), just like in the Wikidata tutorial, but with PlantMetWiki predicates.
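As a minimal sketch, the skeleton above can be completed into a query that runs on its own, using only the prefix, graph, and pathway URI given elsewhere on this page. Note that ?pathway is matched in the WHERE clause but not listed in SELECT, so it does not appear in the results.

```sparql
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

# Only ?interaction is returned; ?pathway is used for matching but not shown.
SELECT ?interaction
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  # Pin the query to the α-solanine / α-chaconine pathway.
  VALUES ?pathway {
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
  }

  # One triple pattern: subject ?pathway, predicate gpml:hasInteraction,
  # object ?interaction.
  ?pathway gpml:hasInteraction ?interaction .
}
LIMIT 10
```

The LIMIT 10 is an arbitrary choice that keeps a first look at the raw interaction URIs short.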

Step-by-step interpretation of the query

Line 1 — VALUES (what are we querying about?)

VALUES lets us “pin” the query to one (or multiple) specific items.

VALUES ?pathway {
  <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
}

You can add more pathways inside the braces later (separated by spaces) if you want to compare multiple pathways.
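A sketch of such a comparison follows. The second URI is a hypothetical placeholder, not a real PlantMetWiki pathway; replace it with an actual pathway URI before running the query. Selecting ?pathway alongside ?interaction shows which pathway each result row belongs to.

```sparql
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?pathway ?interaction
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  VALUES ?pathway {
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
    # Hypothetical placeholder: substitute a real pathway URI here.
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/SECOND_PATHWAY_PLACEHOLDER>
  }

  # Matched once per pathway listed in VALUES.
  ?pathway gpml:hasInteraction ?interaction .
}
ORDER BY ?pathway ?interaction
LIMIT 200
```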

Line 2 — Retrieve interactions from the pathway

This line uses the pathway as the subject and gets all linked interactions:

?pathway gpml:hasInteraction ?interaction .

PlantMetWiki does not use Wikidata’s label service. Instead, we often extract meaningful identifiers from URIs. The remaining lines of the query do this in three steps:

  1. Extract the part after /Interaction/:
BIND(
  STRAFTER(STR(?interaction), "/Interaction/")
  AS ?reactionId
)
  2. Keep only “real” reactions and exclude anchor helper nodes:
FILTER(CONTAINS(?reactionId, "RXN-"))
FILTER(!CONTAINS(?reactionId, "_anchor_"))
  3. Construct a clickable PlantCyc URL:
BIND(
  IRI(CONCAT(
    "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=",
    ?reactionId
  ))
  AS ?plantCycReactionURL
)

This turns the extracted identifier into a clickable external link.
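To see what each step produces, the same functions can be tried on fixed input without touching the PlantMetWiki data. The two URIs below are made up for illustration (their exact shape is an assumption; only the /Interaction/ segment matters), and ?isReaction is a helper variable introduced here to show the outcome of the two filter conditions instead of dropping rows.

```sparql
SELECT ?interaction ?reactionId ?isReaction ?plantCycReactionURL
WHERE {
  # Illustrative URIs only; not taken from PlantMetWiki.
  VALUES ?interaction {
    <http://example.org/Interaction/RXN-10730>
    <http://example.org/Interaction/RXN-10730_anchor_1>
  }

  # Step 1: keep the text after "/Interaction/".
  BIND(STRAFTER(STR(?interaction), "/Interaction/") AS ?reactionId)

  # Step 2: evaluate the two filter conditions as a boolean instead of filtering.
  BIND(CONTAINS(?reactionId, "RXN-") && !CONTAINS(?reactionId, "_anchor_") AS ?isReaction)

  # Step 3: build the PlantCyc link from the identifier.
  BIND(
    IRI(CONCAT(
      "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=",
      ?reactionId
    ))
    AS ?plantCycReactionURL
  )
}
```

The first URI yields the identifier RXN-10730 with ?isReaction true. The second yields RXN-10730_anchor_1 with ?isReaction false, which is the kind of row the two FILTER lines remove in the real query.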

Full query

PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?reactionId ?plantCycReactionURL
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  VALUES ?pathway {
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
  }

  ?pathway gpml:hasInteraction ?interaction .

  BIND(STRAFTER(STR(?interaction), "/Interaction/") AS ?reactionId)

  FILTER(CONTAINS(?reactionId, "RXN-"))
  FILTER(!CONTAINS(?reactionId, "_anchor_"))

  BIND(
    IRI(CONCAT(
      "https://pmn.plantcyc.org/PLANT/NEW-IMAGE?type=REACTION&object=",
      ?reactionId
    ))
    AS ?plantCycReactionURL
  )
}
ORDER BY ?reactionId
LIMIT 200
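As a quick sanity check before validating reactions one by one, a variant of the full query can count how many distinct reaction identifiers pass the two filters. This is a sketch that reuses the same pattern; only the SELECT line differs, and ?reactionCount is a name chosen here.

```sparql
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

# Aggregate instead of listing: one row with the number of distinct reactions.
SELECT (COUNT(DISTINCT ?reactionId) AS ?reactionCount)
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  VALUES ?pathway {
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
  }

  ?pathway gpml:hasInteraction ?interaction .

  BIND(STRAFTER(STR(?interaction), "/Interaction/") AS ?reactionId)

  # Same filters as in the full query.
  FILTER(CONTAINS(?reactionId, "RXN-"))
  FILTER(!CONTAINS(?reactionId, "_anchor_"))
}
```

If the count is larger than the LIMIT used in the full query, the listing is truncated and the LIMIT should be raised.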

Listing pathway components (genes, metabolites)

To see which data nodes (genes, metabolites) are present in the same pathway:

PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?dataNodeId
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  VALUES ?pathway {
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
  }

  ?pathway gpml:hasDataNode ?dataNode .
  BIND(STRAFTER(STR(?dataNode), "/DataNode/") AS ?dataNodeId)
}
ORDER BY ?dataNodeId
LIMIT 200

A note on labels and identifiers

Unlike Wikidata, PlantMetWiki does not provide a dedicated label service (SERVICE wikibase:label).

Instead:

  • some readable information is stored directly (e.g. gpml:name, gpml:textLabel),
  • otherwise, meaningful identifiers are extracted directly from URIs using string functions such as STRAFTER().

This approach is used consistently throughout the tutorial.
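A sketch of how both approaches could be combined for data nodes follows. It assumes that gpml:textLabel is attached directly to the data node resource, which this page does not confirm; the OPTIONAL block keeps data nodes in the results even where no such label is found.

```sparql
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?dataNodeId ?label
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  VALUES ?pathway {
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
  }

  ?pathway gpml:hasDataNode ?dataNode .

  # Fallback identifier taken from the URI.
  BIND(STRAFTER(STR(?dataNode), "/DataNode/") AS ?dataNodeId)

  # Assumption: the label is stored on the data node itself.
  # Without a match, ?label simply stays empty for that row.
  OPTIONAL { ?dataNode gpml:textLabel ?label . }
}
ORDER BY ?dataNodeId
LIMIT 200
```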

Questions

Question 1: Which part of the query selects the pathway we want to investigate?

Answer:
VALUES ?pathway { <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344> }

Question 2: Which line retrieves all interactions that belong to the pathway?

Answer:
?pathway gpml:hasInteraction ?interaction .

Question 3: Why do we filter out _anchor_ interactions?

Answer:
Interactions whose identifier contains _anchor_ are GPML helper nodes used for drawing/connecting edges. They do not correspond to real PlantCyc reactions, so PlantCyc will not recognize their identifiers.
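To inspect what the filter removes, the condition can be inverted so that only the anchor helper nodes are listed. This is a sketch based on the full query; the variable name ?interactionId is chosen here because these identifiers are not reaction identifiers.

```sparql
PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?interactionId
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  VALUES ?pathway {
    <http://rdf-plantmetwiki.bioinformatics.nl/Pathway/PC346_r20251206224344>
  }

  ?pathway gpml:hasInteraction ?interaction .

  BIND(STRAFTER(STR(?interaction), "/Interaction/") AS ?interactionId)

  # Inverted condition: keep only the anchor helper nodes.
  FILTER(CONTAINS(?interactionId, "_anchor_"))
}
ORDER BY ?interactionId
LIMIT 200
```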