Exploring Species and Pathways

Tutorial on SPARQL for PlantMetWiki


On the previous page, we focused on a single plant metabolic pathway and examined how reactions are represented and linked to PlantCyc.

In this section, we take a step back and explore PlantMetWiki as a collection:

  • Which plant species are represented?
  • Which pathways exist per species?
  • How can we navigate across species and pathways using SPARQL?

This page introduces exploratory queries that help you understand the scope of the database before asking more detailed biological questions.

SPARQL endpoint
https://plantmetwiki.bioinformatics.nl/sparql

Graph used in all queries

FROM <http://plantmetwiki.bioinformatics.nl/>
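If you prefer to run the queries on this page from a script instead of a web form, a request could be built roughly as follows. This is only a minimal sketch: it assumes the endpoint follows the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter) and can return JSON results, neither of which is confirmed on this page. The function name `build_sparql_request` is purely illustrative.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://plantmetwiki.bioinformatics.nl/sparql"


def build_sparql_request(query):
    """Build (but do not send) a GET request carrying a SPARQL query.

    Assumption: the endpoint accepts the standard `query` parameter and
    honours the Accept header for SPARQL JSON results.
    """
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


# A generic query against the graph used throughout this tutorial.
EXAMPLE_QUERY = """
SELECT *
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE { ?s ?p ?o }
LIMIT 5
"""

request = build_sparql_request(EXAMPLE_QUERY)

# To actually send the request (requires network access):
#   import json
#   with urllib.request.urlopen(request) as response:
#       data = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the final URL before anything goes over the network.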

How species are represented in PlantMetWiki

Unlike Wikidata, which refers to species by opaque identifiers (Q-numbers), PlantMetWiki stores species names directly as text literals.

This makes it easy to:

  • read queries
  • copy species names into VALUES blocks
  • explore the database interactively

Species information is attached to pathways using the predicate: gpml:organism

So far, we have implicitly focused on a single species by querying a single pathway. To explore pathways from another species, or from several species at once, we change the VALUES line in our query. For example:

{
  VALUES ?organism { "Solanum tuberosum" }
}

This restricts the query to pathways annotated for potato.

Discovering which species are available

Before querying pathways for a specific plant, it is useful to know which species are present at all.

The following query lists all species annotated in PlantMetWiki:

PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT DISTINCT ?organism
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  ?pathway gpml:organism ?organism .
}
ORDER BY ?organism

This gives you a controlled vocabulary of species names. Because SPARQL matches literals exactly, copy the names verbatim (including capitalization and spacing) when reusing them in other queries.

Listing pathways for a given species

Once you know which species exist, you can retrieve the pathways associated with a specific plant.

For example, to list pathways annotated for Solanum tuberosum (potato):

PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?pathway
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  ?pathway gpml:organism "Solanum tuberosum" .
}
LIMIT 200

At this stage, the query returns only pathway identifiers (URIs), and LIMIT 200 caps the output at 200 rows.

Making results more informative: pathway names

To make the output easier to interpret, we can include pathway names when they are available.

We extend the SELECT clause and add an OPTIONAL pattern:

PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?pathway ?pathwayName
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  ?pathway gpml:organism "Solanum tuberosum" .
  OPTIONAL { ?pathway gpml:name ?pathwayName }
}
LIMIT 200

Using OPTIONAL ensures that pathways without a name are still returned; for those rows, ?pathwayName is simply left empty.
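When results are retrieved in the W3C SPARQL JSON results format, a variable that OPTIONAL leaves unbound is simply absent from that row. The sketch below shows one way to flatten such results into plain rows, using None for missing names. The function name `bindings_to_rows` and the sample data (including the example.org URIs) are invented for illustration and are not real PlantMetWiki output.

```python
def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of dicts.

    Variables that are unbound in a row (e.g. an OPTIONAL that did not
    match) are absent from the binding and become None here.
    """
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        row = {
            var: binding[var]["value"] if var in binding else None
            for var in variables
        }
        rows.append(row)
    return rows


# Hypothetical sample: the second pathway has no name.
sample = {
    "head": {"vars": ["pathway", "pathwayName"]},
    "results": {
        "bindings": [
            {
                "pathway": {"type": "uri", "value": "http://example.org/pathway/1"},
                "pathwayName": {"type": "literal", "value": "example pathway"},
            },
            {
                "pathway": {"type": "uri", "value": "http://example.org/pathway/2"},
            },
        ]
    },
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row["pathway"], row["pathwayName"])
```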

Comparing pathways across multiple species

SPARQL allows you to compare species by listing them explicitly using VALUES.

For example, to retrieve pathways for potato and Arabidopsis:

PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?organism ?pathway ?pathwayName
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  VALUES ?organism {
    "Solanum tuberosum"
    "Arabidopsis thaliana"
  }

  ?pathway gpml:organism ?organism .
  OPTIONAL { ?pathway gpml:name ?pathwayName }
}
LIMIT 200

This query makes the species explicit in the results, which is especially useful when comparing model plants with crop species.

Questions

What would the VALUES block look like if we also wanted to include Oryza sativa?

Answer:
VALUES ?organism {
  "Solanum tuberosum"
  "Arabidopsis thaliana"
  "Oryza sativa"
}
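If the species list is long, or comes from the species-listing query, typing the VALUES block by hand becomes tedious. A small helper along these lines could generate it instead. This is a sketch only: the names `sparql_literal` and `build_values_block` are hypothetical, and the escaping covers just backslashes and double quotes.

```python
def sparql_literal(text):
    """Wrap text in double quotes, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_values_block(variable, names):
    """Return a multi-line VALUES block binding `variable` to each name."""
    lines = [f"VALUES ?{variable} {{"]
    lines += [f"  {sparql_literal(name)}" for name in names]
    lines.append("}")
    return "\n".join(lines)


block = build_values_block(
    "organism",
    ["Solanum tuberosum", "Arabidopsis thaliana", "Oryza sativa"],
)
print(block)
```

The printed block can be pasted into the WHERE clause of any query on this page.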

Which species?

Since we are now retrieving pathways from multiple species, it is useful to explicitly show the species in the results. To do this, we modify the SELECT clause so that the organism is visible:

SELECT ?organism ?pathway

If we also want to include the pathway name (when available), we can extend this further:

SELECT ?organism ?pathway ?pathwayName

And add the corresponding OPTIONAL pattern to the WHERE clause:

OPTIONAL { ?pathway gpml:name ?pathwayName }

Updated query with pathway names

PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?organism ?pathway ?pathwayName
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  VALUES ?organism {
    "Solanum tuberosum"
    "Arabidopsis thaliana"
  }

  ?pathway gpml:organism ?organism .
  OPTIONAL { ?pathway gpml:name ?pathwayName }
}
LIMIT 200

Questions

Which variable adds the species name to the results?

Answer:
?organism, which is bound by the triple pattern ?pathway gpml:organism ?organism

Easier querying: discovering species in PlantMetWiki

As noted earlier, PlantMetWiki does not require identifiers such as Wikidata Q-numbers: species names are stored directly as literals.

If you are not sure which species are present in the database, you can list them with the same query as before:

PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT DISTINCT ?organism
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  ?pathway gpml:organism ?organism .
}
ORDER BY ?organism

This query gives you a controlled vocabulary of species that you can copy directly into a VALUES block.

Small expansion: count pathways per species

We can also aggregate results to answer questions such as:

Which species have the most pathways in PlantMetWiki?

PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?organism (COUNT(DISTINCT ?pathway) AS ?nPathways)
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  ?pathway gpml:organism ?organism .
}
GROUP BY ?organism
ORDER BY DESC(?nPathways)
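To see what GROUP BY and COUNT(DISTINCT ...) compute, the same aggregation can be reproduced on the client side from un-aggregated (organism, pathway) rows. The sketch below is only an illustration: the function name `count_pathways_per_species` is hypothetical and the sample rows are invented, so the numbers say nothing about the real database.

```python
from collections import defaultdict


def count_pathways_per_species(rows):
    """Count distinct pathways per organism, largest count first.

    Mirrors GROUP BY ?organism with COUNT(DISTINCT ?pathway).
    Ties are broken alphabetically, which the SPARQL query does not do.
    """
    pathways_by_species = defaultdict(set)
    for organism, pathway in rows:
        # A set ignores repeated pathways, like DISTINCT.
        pathways_by_species[organism].add(pathway)
    counts = [
        (organism, len(pathways))
        for organism, pathways in pathways_by_species.items()
    ]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# Invented sample rows; the second row is a deliberate duplicate.
sample_rows = [
    ("Solanum tuberosum", "http://example.org/pathway/1"),
    ("Solanum tuberosum", "http://example.org/pathway/1"),
    ("Solanum tuberosum", "http://example.org/pathway/2"),
    ("Arabidopsis thaliana", "http://example.org/pathway/3"),
]

counts = count_pathways_per_species(sample_rows)
for organism, n_pathways in counts:
    print(organism, n_pathways)
```

In practice it is usually better to let the endpoint do the counting, as in the query above, since that avoids transferring every row.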

Notes on visualization

Unlike Wikidata, the PlantMetWiki SPARQL endpoint does not provide built-in image visualizations.

However, you can:

•	export results as tables
•	click through to PlantCyc reaction links (as shown in Assignment 1)
•	use external tools (e.g. notebooks, R, Python) to visualize pathway statistics