Federated Queries Across Linked Open Data

Tutorial on SPARQL for PlantMetWiki


One of the main strengths of SPARQL is that it allows federated queries: a single query can combine data from multiple, independent knowledge bases.

PlantMetWiki is designed to work together with existing Linked Open Data resources such as:

  • Wikidata
  • ChEBI
  • PubMed

In this section, we show how to move beyond PlantMetWiki alone and place plant metabolic pathways in a broader biological knowledge graph.

SPARQL endpoint: https://plantmetwiki.bioinformatics.nl/sparql

Graph used in all queries

FROM <http://plantmetwiki.bioinformatics.nl/>

What is a federated SPARQL query?

A federated query uses the SERVICE keyword to send part of the query to a remote SPARQL endpoint.

Conceptually:

•	PlantMetWiki provides pathway context
•	External endpoints provide chemical, biological, or literature metadata
•	SPARQL stitches them together

SERVICE <https://query.wikidata.org/sparql> {
  ...
}

Each SERVICE block is evaluated by the remote endpoint, and its results are joined with the rest of the query on the variables they share.
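As a minimal sketch of how the pieces fit together, the query below joins one hard-coded InChIKey (caffeine, standing in for a value that would normally come from PlantMetWiki) with Wikidata. The choice of key is an illustrative assumption, and the sketch also assumes that the endpoint you run it on permits outgoing SERVICE calls.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?inchiKey ?wikidataItem
WHERE {
  # Local part: a fixed InChIKey (caffeine) stands in for PlantMetWiki data.
  VALUES ?inchiKey { "RYYVLZVUVIJVGH-UHFFFAOYSA-N" }

  # Remote part: evaluated by the Wikidata endpoint.
  # wdt:P235 is Wikidata's InChIKey property.
  SERVICE <https://query.wikidata.org/sparql> {
    ?wikidataItem wdt:P235 ?inchiKey .
  }
}
```

The shared variable ?inchiKey is what ties the local and remote parts together; without a shared variable the two parts would simply be combined as a cross product.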

Why federate from PlantMetWiki?

PlantMetWiki focuses on:

•	pathways
•	species
•	biosynthesis
•	gene clusters

It deliberately does not duplicate:

•	chemical ontologies
•	literature databases
•	encyclopedic metadata

Federation lets you:

•	enrich pathways with chemical identifiers
•	connect metabolites to publications
•	reuse authoritative external resources

Example 1 — Sending metabolites to Wikidata

Many PlantMetWiki pathways contain metabolites with identifiers that are also known to Wikidata.

Using a federated query, we can:

1.	extract metabolite identifiers from PlantMetWiki
2.	send them to Wikidata
3.	retrieve additional metadata

Example (from WikidataTest.rq):

PREFIX gpml: <http://vocabularies.wikipathways.org/gpml#>

SELECT ?metabolite ?wikidataItem
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  ?pathway gpml:hasDataNode ?metabolite .

  SERVICE <https://query.wikidata.org/sparql> {
    ?wikidataItem ?p ?metabolite .
  }
}
LIMIT 100

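This demonstrates the mechanism of federation. Because it passes PlantMetWiki node identifiers to Wikidata unchanged, it is likely to return few matches until the join is made on a shared identifier, as in the following examples.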

Example 2 — Linking metabolites via InChIKeys

Chemical identifiers such as InChIKeys provide a robust bridge between databases.

PlantMetWiki → InChIKey → Wikidata → ChEBI

Example (from WikidataInChiKeys.rq):

PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?metabolite ?inchiKey ?wikidataItem
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  ?metabolite ?p ?inchiKey .
  FILTER(CONTAINS(STR(?p), "InChIKey"))

  SERVICE <https://query.wikidata.org/sparql> {
    ?wikidataItem wdt:P235 ?inchiKey .
  }
}
LIMIT 100

This pattern allows you to:

•	unify chemical identities across resources
•	avoid ambiguous names
•	build reliable cross-database links
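The chain PlantMetWiki → InChIKey → Wikidata → ChEBI can be followed in a single query. The sketch below extends the InChIKey pattern with an optional lookup of Wikidata's ChEBI ID property (P683). It assumes, as the example above does, that PlantMetWiki InChIKey values match Wikidata's plain string literals exactly.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?metabolite ?inchiKey ?wikidataItem ?chebiId
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  # Local part: any property whose IRI mentions "InChIKey".
  ?metabolite ?p ?inchiKey .
  FILTER(CONTAINS(STR(?p), "InChIKey"))

  SERVICE <https://query.wikidata.org/sparql> {
    # P235 = InChIKey
    ?wikidataItem wdt:P235 ?inchiKey .
    # P683 = ChEBI ID; optional because not every compound has one.
    OPTIONAL { ?wikidataItem wdt:P683 ?chebiId . }
  }
}
LIMIT 100
```

Wikidata stores the ChEBI ID as a bare number, so ?chebiId comes back without a "CHEBI:" prefix.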

Example 3 — Federating to ChEBI

ChEBI is the authoritative ontology for chemical entities of biological interest.

Using InChIKeys or ChEBI IDs, you can retrieve:

•	chemical classifications
•	roles (e.g. alkaloid, glycoside)
•	ontology relationships

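Example (adapted from FederatedMetabolitesChEBI.rq; the lookup goes through Wikidata's ChEBI ID property, P683):

PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?metabolite ?chebi ?chebiItem
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  ?metabolite ?p ?chebi .
  FILTER(CONTAINS(STR(?chebi), "CHEBI"))

  # Wikidata stores ChEBI IDs as bare numbers (e.g. "15377").
  # Assumption: the local value contains "CHEBI:" or "CHEBI_" followed
  # by the number, so everything up to that separator is stripped.
  BIND(REPLACE(STR(?chebi), "^.*CHEBI[:_]", "") AS ?chebiId)

  SERVICE <https://query.wikidata.org/sparql> {
    ?chebiItem wdt:P683 ?chebiId .
  }
}
LIMIT 100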

This enables ontology-aware pathway analysis without duplicating ChEBI locally.

Example 4 — Linking pathways to publications (PubMed)

Many pathways and gene clusters are supported by literature evidence.

Using federated queries, you can:

•	extract PubMed IDs
•	query Wikidata for article metadata
•	retrieve titles, journals, and authors

Example (from ListPubMedIDs.rq):

SELECT DISTINCT ?pmid
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  ?pathway ?p ?pmid .
  FILTER(CONTAINS(STR(?pmid), "pubmed"))
}

Extended with federation (from WikidataLookupByInChIKeys.rq):

SERVICE <https://query.wikidata.org/sparql> {
  ?article wdt:P698 ?pmid ;
           rdfs:label ?title .
  FILTER(LANG(?title) = "en")
}

This connects pathway → metabolite → publication, so the biological evidence behind each pathway can be traced.
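Put together, the local PubMed extraction and the Wikidata lookup might look like the sketch below. It assumes that each PlantMetWiki reference ends in the numeric PubMed ID (for example ".../pubmed/12345678"), because Wikidata's PubMed ID property (P698) holds the bare number. The variable names ?pmidRef and ?pmid are introduced here for illustration. Note that Wikidata has been moving scholarly-article items to a separate query endpoint, so the SERVICE URL may need adjusting; check the current Wikidata Query Service documentation.

```sparql
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?pathway ?pmid ?article ?title
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  # Local part: any value that mentions "pubmed".
  ?pathway ?p ?pmidRef .
  FILTER(CONTAINS(STR(?pmidRef), "pubmed"))

  # Assumption: the reference ends in the numeric PubMed ID;
  # keep only those trailing digits.
  BIND(REPLACE(STR(?pmidRef), "^.*[^0-9]([0-9]+)$", "$1") AS ?pmid)

  # Remote part: look up the article and its English label.
  SERVICE <https://query.wikidata.org/sparql> {
    ?article wdt:P698 ?pmid ;
             rdfs:label ?title .
    FILTER(LANG(?title) = "en")
  }
}
LIMIT 100
```

If a reference does not end in digits, REPLACE leaves it unchanged and the row simply finds no match in Wikidata.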

Example 5 — Bidirectional federation

Federation does not have to start from PlantMetWiki.

You can:

•	query Wikidata first
•	then match results against PlantMetWiki

Example (from SendInChiKeysToWikidata.rq):

SERVICE <https://query.wikidata.org/sparql> {
  ?item wdt:P235 ?inchiKey .
}

?metabolite ?p ?inchiKey .

This pattern is useful when you start from literature or chemical knowledge and want to ask whether PlantMetWiki contains related pathways.
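A complete version of the reverse pattern could look like the following sketch. An unrestricted `?item wdt:P235 ?inchiKey` would ask Wikidata for every InChIKey it holds, so the remote block is narrowed here to a short list of known keys. The single key shown (caffeine) is an illustrative assumption; replace it with the compounds you are interested in.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?item ?inchiKey ?metabolite
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  # Start in Wikidata, restricted to a small set of known compounds.
  SERVICE <https://query.wikidata.org/sparql> {
    VALUES ?inchiKey { "RYYVLZVUVIJVGH-UHFFFAOYSA-N" }   # caffeine; add more keys here
    ?item wdt:P235 ?inchiKey .
  }

  # Then check whether PlantMetWiki knows a metabolite with the same key.
  ?metabolite ?p ?inchiKey .
  FILTER(CONTAINS(STR(?p), "InChIKey"))
}
```

An empty result means that none of the listed compounds has a matching InChIKey in PlantMetWiki.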

Practical considerations

Performance

•	Federated queries are slower than local queries
•	Limit result sizes (LIMIT)
•	Select only the variables you need; every extra variable adds to the data exchanged between endpoints
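One way to keep a federated query small is to bound the local result set in a subquery before the SERVICE block is reached. The sketch below illustrates this with the InChIKey pattern from Example 2. The limit of 50 is an arbitrary choice, and how bindings are actually passed to the remote endpoint depends on the query engine.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?metabolite ?inchiKey ?wikidataItem
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  # Subquery: select a bounded set of local bindings first.
  {
    SELECT DISTINCT ?metabolite ?inchiKey
    WHERE {
      ?metabolite ?p ?inchiKey .
      FILTER(CONTAINS(STR(?p), "InChIKey"))
    }
    LIMIT 50
  }

  # Only the shared variable ?inchiKey takes part in the remote join.
  SERVICE <https://query.wikidata.org/sparql> {
    ?wikidataItem wdt:P235 ?inchiKey .
  }
}
```

The helper variable ?p stays inside the subquery and never reaches the outer query, which keeps the intermediate results narrow.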

Stability

•	External endpoints may change
•	Wikidata enforces rate limits
•	Queries should be robust to partial results
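To make a query tolerant of a failing or unreachable endpoint, SPARQL 1.1 offers SERVICE SILENT, which turns a remote error into an empty result instead of aborting the whole query. Wrapping the block in OPTIONAL then keeps the local rows even when nothing comes back. The sketch below applies this to the InChIKey pattern from Example 2.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?metabolite ?inchiKey ?wikidataItem
FROM <http://plantmetwiki.bioinformatics.nl/>
WHERE {
  ?metabolite ?p ?inchiKey .
  FILTER(CONTAINS(STR(?p), "InChIKey"))

  # OPTIONAL keeps metabolites that have no Wikidata match;
  # SILENT keeps the query alive if the remote call fails.
  OPTIONAL {
    SERVICE SILENT <https://query.wikidata.org/sparql> {
      ?wikidataItem wdt:P235 ?inchiKey .
    }
  }
}
LIMIT 100
```

Rows with an unbound ?wikidataItem then mean either "no match" or "endpoint unavailable", so a silent failure cannot be told apart from a genuine miss without re-running the query.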

Design philosophy

PlantMetWiki intentionally stays lightweight:

•	no chemical ontology duplication
•	no literature mirroring
•	no monolithic data model

Federation keeps the ecosystem modular and sustainable.

What you can do with federated queries

By combining PlantMetWiki with external resources, you can:

•	trace metabolites from genome → pathway → chemistry → literature
•	enrich pathway analyses with ontology information
•	integrate PlantMetWiki into larger knowledge graphs
•	support FAIR, reusable, interoperable workflows

Summary

Federated SPARQL queries allow PlantMetWiki to function as:

•	a hub for plant metabolic pathways
•	a connector between genomics, chemistry, and literature
•	a first-class citizen of the Linked Open Data ecosystem

This closes the loop from: genes → pathways → metabolites → publications → knowledge

Tutorial section      Query file
Wikidata basics       WikidataTest.rq
InChIKey federation   WikidataInChiKeys.rq
ChEBI federation      FederatedMetabolitesChEBI.rq
PubMed links          ListPubMedIDs.rq
Reverse federation    SendInChiKeysToWikidata.rq
Advanced lookups      WikidataLookupByInChIKeys.rq