TETTRIs MS3
TETTRIs MS3 Taxonomic Resolution Engine is Available and Linked to Backbone Services
Date: 28/11/2024
- Funded by the European Union
- Document: Milestone MS3
Vision of the TRE
The taxonomic classification of life on Earth is one of the foundations of biological research, whether the aim is to understand or document life in all its diversity, or to study how organisms interact with each other, with their environment, and therefore also with human societies. Taxonomists have been building a massive corpus of binomial scientific names for centuries, following the formal introduction of the system by Linnaeus. These names have since been used throughout the biological sciences as a universal reference for taxa, i.e., conceptual subsets of organisms that are considered sufficiently similar to be grouped together.
The meaning of those names, i.e., their relation to organisms in the living world, is set down by taxonomic treatments connected to them, which describe exactly what traits need to be observed in order for an organism to be determined as a member of a certain taxon. These treatments are substantiated by material citations, which list (preserved) specimens of organisms on which the criteria set out in the treatment are based.
A particularly important group of these cited specimens are the types or original material. These specimens are cited when a new name is minted and are therefore the main link by which these names connect to organisms in real life. Type specimens have been preserved for centuries and are treasured pieces in natural history collections today. In the digital age, numerous taxonomic backbone services have been developed for different spatial scales, high-level taxon ranks and different core subjects, such as distinctions between logged observations, preserved specimens and ecological surveys. These databases do not always use the same names for overlapping taxa, or use different taxonomic interpretations of the same names, or use small spelling variants for the same name, or all of the above to varying extent.
This problem makes it very difficult to harmonise biological data at the taxonomic level: the meaning of the names used in datasets may not be identical, there is no quick way to determine this, and there is no easy way to align all observations to a single taxonomic backbone. One way of facilitating this process is by establishing unambiguous links from names to the type material they are defined with (Müller et al. 2024). This type material is scattered across the world in numerous natural history collections, if it still exists at all. These collections have increasingly been digitised in recent decades, and the resulting digital specimen records have been made available online. However, while GBIF, the largest specimen data aggregator in the world, currently has more than 5 million records with an indication that these may be types, there is no direct validated link from these records to the names they may be types for.
One of the earliest mass digitisation projects for herbaria was the Global Plants Initiative, which among other things focused on type and original material. The digitised images and their metadata were published to the JSTOR digital library, where they could be consulted by scientists from subscribing institutions. The data in JSTOR are not openly available, nor can they be queried and analysed in bulk. Most of the data have not been updated for years, in some cases for up to a decade. While the JSTOR systems still hold some valuable added information concerning typification, the inaccessibility and general outdatedness of the data led us to abandon any attempt to leverage them. A blog post by Rod Page describes these problems in more detail (Page 2023).
A type registry that connects names to the evidence of what they describe could therefore lead to novel methods of aligning taxonomies, as it would allow names used in taxonomic backbones to be connected to the material they were based on, possibly even leveraging trait and molecular data extracted from those specimens (Hardisty et al. 2022).
Type specimens and names have a long history and, although subject to the rules of the International Code of Nomenclature for algae, fungi, and plants, their documentation, digitisation and publication have been carried out in a considerably decentralised manner. Persistently identifying either a name or a specimen is still not trivial. This jeopardises any attempt to connect a name to a specimen typifying it in a persistent manner. Efforts to connect them may not always be accurate, or may bring to light nomenclatural rule violations, such as multiple holotypes for a single name.
For these reasons, we envision a key role for the Taxonomic Resolution Engine, as a custom Wikibase, to start modelling these links. A Wikibase is an openly curatable triple store that offers an off-the-shelf, user-friendly interface, a query engine and a flexible data model designed for storing linked data. The technology has proven itself mainly through its implementation in Wikidata, but custom Wikibases have since become more prominent because they allow more flexibility in designing new data models and can be specialised for a particular scope.
Wikidata already contains millions of taxon items. These items generally do not distinguish between names and taxa. It also contains a little over 1,000 type specimen items, but these do not distinguish between the specimen and its assignment as a type. For the TRE, we opted for a different model, distinguishing names, specimens and the typification assertions that link a specimen to a name. This way, assertions can be more easily curated and queried for their metadata. This would be more difficult if assertions were modelled solely as claims on specimen and name objects, as qualifiers and references of claims would need to be used, and those are not as efficiently accessible through the Wikibase APIs.
Technical Setup
The TRE Wikibase was set up using the wikibase.cloud platform. This service is offered and maintained by Wikimedia Germany, a national chapter of the Wikimedia Foundation. By using this cloud platform, we avoid having to maintain and sustain hosting a Wikibase using the resources of the project and project partners. However, by relinquishing some of the technical oversight, we are dependent on the wikibase.cloud dev team for addressing any downtimes or other technical problems and limited in how much we can customize the Wikibase. More importantly, we can focus on how to use the Wikibase and deal less with technical maintenance of the server and hardware scaling, which can become extensive with a Wikibase.
Wikibase.cloud keeps backups of its hosted Wikibases, and the Special:Export service can be used to make XML dumps of the Wikibase's content in order to move it to another Wikibase if needed. The data we import and curate in the TRE are therefore not at risk of being lost if wikibase.cloud support were to come to an end.
Initial Trials
During the Biohackathon Europe 2023, project 3 initiated the development to build a virtual reference collection for pollinators. One of the potential backends for this effort was the TRE Wikibase, and so scripts were developed to populate the TRE with taxonomic data for pollinators from the GBIF Taxonomic Backbone. The efforts in TETTRIs to mobilize virtual reference collections are still ongoing, and the TRE's eventual role in them is still not settled, but the process of populating the Wikibase with taxonomic names during the Biohackathon was later built on for importing the data to support the type specimen linkages.
Type specimen model
The model to use for typification links in the TRE was discussed in more detail in a workshop in Prague on April 29 2024, as part of the annual CETAF ISTC working group meeting. As described earlier, it was decided to model typification as triplets of Wikibase items, that is, as links between items for taxon names, for the typifying specimens and for the typification assertions between them. Only limited metadata were added to the different items, so as not to duplicate too much information, which would lead to synchronization complications. The most important properties are those that link the items to stable identifiers for what they represent, and to the sources from which they were acquired. For names, these are the links to records in nomenclatural databases such as IPNI for plants.
Table 1: Model for Taxon Names
PID | Field | Data Type | Description | Examples |
---|---|---|---|---|
- | Label | string | equal to scientific name | Quercus robur L. |
- | Description | string | "a <botanical/zoological> taxonomic name of rank <rank>" | "a botanical taxonomic name of rank species" "a zoological taxonomic name of rank family" |
P1 | instance of | item | taxon name (Q1) | |
P5 | taxon rank | item | any instance of taxon rank (Q7), see full list | species (Q16) family (Q14) form (Q19) |
P14 | scientific name | string | usually but not always identical to the label. In case of species names, it can contain the author and the year, depending on the relevant nomenclatural code | Tetralonia lorenzicola Strand, 1910 |
P15 | scientific name authorship | string | the author and year of the scientific name, usually not given for higher taxon ranks | Strand, 1910 |
P16 | kingdom | string | the kingdom of the taxonomic name | Plantae |
P32 | IPNI plant name ID | external identifier | can be multiple IDs for the same name | 77138633-1 |
P7 | GBIF Species ID | external identifier | GBIF Species ID, which links to the GBIF Taxonomic Backbone. For imports from other sources, other identifiers could be used in the future. | 7620341 |
For specimens, the record contains the (internal) GBIF ID from which the information was taken. These IDs are not fully stable (see section 5 for more info), but have improved significantly since policies were enacted in 2022 to minimize the level of link breakage between the GBIF ID and the ID as used by the data provider to identify the specimen.
Acquiring the IDs minted by the data provider is less straightforward. Despite some efforts in this domain, there are still no universal methods to identify specimens (Agosti et al. 2022). The CETAF persistent identifier specification is used by several CETAF member collections, but these cover only a fraction of the specimens around the world. We attempted to derive a URI or URL from the specimen data, based on the usage of the http(s) protocol, as these should be resolvable at least for some time. These are linked using Property P35.
The Darwin Core occurrenceID also often contains a relatively unique and persistent way to identify the specimen, although most of these are not resolvable. We have used these as the label for the specimen. Specimens may have no occurrenceID, as not all pipelines to GBIF enforce this as a requirement. In this case, the external URI is used as the label. If that one is also not available, we re-use the GBIF ID (prefixed with "gbif:") as effectively the only identifier available for this specimen.
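The fallback order for specimen labels described above can be sketched in a few lines of Python. This is only an illustration: the function name `specimen_label` and its argument names are hypothetical and are not taken from the project's import scripts.

```python
from typing import Optional


def specimen_label(occurrence_id: Optional[str],
                   external_uri: Optional[str],
                   gbif_id: str) -> str:
    """Pick a label: occurrenceID first, then the external URI, then the GBIF ID."""
    for candidate in (occurrence_id, external_uri):
        # Treat missing values and whitespace-only strings as absent.
        if candidate and candidate.strip():
            return candidate.strip()
    # Last resort: re-use the GBIF ID with the "gbif:" prefix.
    return f"gbif:{gbif_id}"


print(specimen_label("urn:catalog:MO:Tropicos:811069",
                     "http://www.tropicos.org/Specimen/811069",
                     "1261536081"))
print(specimen_label(None, None, "1261536081"))
```

The first call prints the occurrenceID and the second falls back to the prefixed GBIF ID.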
Table 2: Model for specimens
PID | Field | Data Type | Description | Examples |
---|---|---|---|---|
- | Label | string | the occurrenceID | urn:catalog:MO:Tropicos:811069 |
- | Description | string | a concatenation of a few key properties, including the name, the collector, the collection date, and the locality. If the locality makes the description too long, it is omitted. | a plant specimen known as Aspidium trichophorum Fée collected by F. L'Herminier in 1861 |
P1 | instance of | item | specimen (Q4) | |
P34 | gbifID | external identifier | ID for the occurrence in the GBIF infrastructure (an integer, NOT the occurrenceID) | 1261536081 |
P35 | External specimen ID | external identifier | A URI or URL that ideally identifies the specimen persistently or at least refers to more data about it | http://www.tropicos.org/Specimen/811069 |
P8 | source | item | An item for the dataset the specimen data was taken from, typically a GBIF download with a DOI | GBIF type records download (Q52455) |
P29 | taxon name | item | The taxon name as a link to an item that is an instance of Taxon Name (Q1) | Anthurium oneillii Croat (Q49605) |
P26 | collectionID | string | The code of the collection (or institution), commonly used to link specimens to their names | MO |
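As a rough sketch of how the specimen description in Table 2 could be assembled, the snippet below concatenates the key properties and drops the locality when the result would become too long. The function name, the wording used to append the locality and the 250-character limit are assumptions made for illustration only; the actual scripts may differ.

```python
from typing import Optional

MAX_DESCRIPTION_LENGTH = 250  # assumed limit, not taken from the TRE configuration


def specimen_description(kind: str, name: str, collector: str, year: str,
                         locality: Optional[str] = None,
                         max_len: int = MAX_DESCRIPTION_LENGTH) -> str:
    """Build a description; omit the locality if it makes the text too long."""
    base = f"a {kind} specimen known as {name} collected by {collector} in {year}"
    if locality:
        # The "at <locality>" wording is a hypothetical choice for this sketch.
        with_locality = f"{base} at {locality}"
        if len(with_locality) <= max_len:
            return with_locality
    return base


print(specimen_description("plant", "Aspidium trichophorum Fée",
                           "F. L'Herminier", "1861"))
```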
Table 3: Model for typification assertions
PID | Field | Data Type | Description | Examples |
---|---|---|---|---|
- | Label | string | [occurrenceID]: [typestatus] of [scientificName] | urn:catalog:MO:Tropicos:101539113: holotype of Aspidosperma crypticum J.F.Morales & N.Zamora |
- | Description | string | link between a specimen and the name it is a type for | |
P1 | instance of | item | typification assertion (Q47338) | |
P30 | type status | item | Item that is an instance of type status (Q47296) | Lectotype |
P36 | specimen | item | Specimen that is asserted to be a certain type | UPS:BOT:V-175663 |
P29 | taxon name | item | Name it is that type for | Quercus robur L. |
P8 | source | item | Sources used for both the names and the specimens | International Plant Names Index (IPNI), version 2024-10-06 (Q62776) |
The data model can still be extended in the future if additional features are desired (cf. future plans at the end of this document).
Type specimen import
Imports into the TRE can easily be performed using the wikibaseintegrator (wbi) Python package. An OAuth 1.0a consumer is set up to authenticate against the MediaWiki API of the Wikibase, which can then be accessed programmatically in Python through the wbi package. This enables easy batch additions, edits and deletions of data in the Wikibase at a throttled rate of ca. 10,000 items added or edited per hour, so as not to overwhelm the API and the backend services processing the requests.
For a new import, the batch scripts need to be run in series. First, the names data are imported, and the names are then retrieved along with their item IDs through a SPARQL query. The queries can be found below and can be run using the query service. These names are then joined into the specimen dataset, which is in turn imported and its item IDs also retrieved through SPARQL. Finally, both name and specimen item IDs are matched into the typification assertion data, and these are imported. This three-step approach ensures that links are only made to items that have already been created.
The current version of the Python scripts can be found here. They are based on earlier work which originated from the Biohackathon Europe 2023 project (see section 3).
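The final step of the three-step import, matching name and specimen item IDs into the typification assertion data, can be illustrated with plain Python dictionaries. This is a minimal sketch and not the project's script: the function name, the row keys and the placeholder item IDs (`Q_NAME`, `Q_SPECIMEN`, `Q_HOLOTYPE`) are hypothetical. Only the property numbers, the item Q47338 and the label pattern come from Table 3.

```python
TYPIFICATION_ASSERTION = "Q47338"  # typification assertion item (Table 3)


def build_assertions(rows, name_qids, specimen_qids, status_qids):
    """Join item IDs into assertion rows; keep unresolved rows aside.

    rows: dicts with 'occurrenceID', 'typeStatus' and 'scientificName'.
    The *_qids arguments map labels to item IDs retrieved through SPARQL.
    """
    assertions, unresolved = [], []
    for row in rows:
        name_q = name_qids.get(row["scientificName"])
        specimen_q = specimen_qids.get(row["occurrenceID"])
        status_q = status_qids.get(row["typeStatus"].lower())
        if None in (name_q, specimen_q, status_q):
            # Never link to an item that has not been created yet.
            unresolved.append(row)
            continue
        assertions.append({
            "label": f'{row["occurrenceID"]}: {row["typeStatus"]} '
                     f'of {row["scientificName"]}',
            "P1": TYPIFICATION_ASSERTION,  # instance of
            "P30": status_q,               # type status
            "P36": specimen_q,             # specimen
            "P29": name_q,                 # taxon name
        })
    return assertions, unresolved


rows = [{"occurrenceID": "urn:catalog:MO:Tropicos:101539113",
         "typeStatus": "holotype",
         "scientificName": "Aspidosperma crypticum J.F.Morales & N.Zamora"}]
done, todo = build_assertions(
    rows,
    name_qids={"Aspidosperma crypticum J.F.Morales & N.Zamora": "Q_NAME"},
    specimen_qids={"urn:catalog:MO:Tropicos:101539113": "Q_SPECIMEN"},
    status_qids={"holotype": "Q_HOLOTYPE"},
)
print(done[0]["label"])
```

Rows that cannot be resolved are returned separately, so they can be retried after the missing names or specimens have been imported.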
Query 1: SPARQL query to find (instances of) typification assertions (Q47338).
The following query uses these:
- Properties: instance of (P1)
PREFIX tre: <https://tre-test.wikibase.cloud/entity/>
PREFIX tred: <https://tre-test.wikibase.cloud/prop/direct/>

SELECT ?typification ?typificationLabel WHERE {
  ?typification tred:P1 tre:Q47338.  # P1 = instance of; Q47338 = typification assertion
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Query 2: SPARQL query to find (instances of) specimens (Q4).
The following query uses these:
- Items: specimen (Q4)
- Properties: instance of (P1)
PREFIX tre: <https://tre-test.wikibase.cloud/entity/>
PREFIX tred: <https://tre-test.wikibase.cloud/prop/direct/>

SELECT ?typeSpecimen ?typeSpecimenLabel WHERE {
  ?typeSpecimen tred:P1 tre:Q4.  # P1 = instance of; Q4 = specimen
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Query 3: SPARQL query to find names. The IPNI plant name ID (P32) is used in this case to limit the number of unwanted results, as still a lot of other taxon names are available from previous trials.
The following query uses these:
- Properties: IPNI plant name ID (P32)
PREFIX tre: <https://tre-test.wikibase.cloud/entity/>
PREFIX tred: <https://tre-test.wikibase.cloud/prop/direct/>

SELECT DISTINCT ?item ?itemLabel WHERE {
  ?item tred:P32 ?value.  # P32 = IPNI plant name ID
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Downloads from GBIF were requested on multiple dates using the predicate API, to acquire more up-to-date data and to investigate the turnover of type specimens, or of specimens in general, over time. Some statistics for these downloads can be seen in Table 4, with the turnover of GBIF IDs listed from one snapshot to the next. Records may be absent from later snapshots either because they were deleted or because they were updated and are no longer annotated as a type. Records that are deleted may be re-uploaded under another GBIF ID. This could be checked by comparing dwc:occurrenceID turnover, but these values may also change and are not unique across datasets.
To estimate the fractions of deleted specimens versus specimens no longer marked as types, predicate requests were also made to investigate the status of GBIF IDs no longer included in newer downloads. These requests can be made with a maximum of 101,000 IDs per request, and GBIF supports a maximum of three concurrent downloads being generated by its systems. Based on these numbers, about 64% of the records that are no longer represented in the type downloads are still present in GBIF (and hence presumably just no longer labelled as types through dwc:typeStatus). As seen in Table 4, most of those changes occurred between the first and second download, which was the longest period (ca. 7 months instead of 3). There have been policy changes at GBIF in the last few years which aim to limit the number of GBIF IDs getting tombstoned. While these changes have been implemented since 2022, a blog post in November 2023 may have renewed attention to the issue, or highlighted increased efforts or successes in GBIF's approach to identifier matching with data providers.
Query 4: JSON payload for a GBIF predicate query to download all records with a value for typeStatus
{
"creator": "[gbif username]",
"notificationAddresses": [
"[your_email_address]"
],
"sendNotification": true,
"format": "DWCA",
"predicate": {
"type": "isNotNull",
"parameter": "TYPE_STATUS"
}
}
Query 5: JSON payload for a GBIF predicate query to download a list of records with certain GBIF IDs (tombstoned records will not be included). A limit of 100,000 IDs in the values array is used (technically up to 101,000 are accepted).
{
"creator": "[gbif username]",
"notificationAddresses": [
"[your_email_address]"
],
"sendNotification": true,
"format": "DWCA",
"predicate": {
"type": "in",
"key": "GBIF_ID",
"values": [
"4173969305",
"3750353301",
[...]
]
}
}
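Because a single request accepts only a limited number of IDs, a longer list of GBIF IDs has to be split over several payloads of the form shown in Query 5. The sketch below shows one possible way to do this in Python. The function name and the placeholder credentials are hypothetical, and the code only builds the payloads: submitting them to GBIF (with at most three downloads running concurrently) is left out.

```python
import json

MAX_IDS_PER_REQUEST = 100_000  # limit used in this document (up to 101,000 accepted)


def gbif_id_payloads(gbif_ids, creator, email, chunk_size=MAX_IDS_PER_REQUEST):
    """Split a list of GBIF IDs into predicate payloads of at most chunk_size IDs."""
    ids = [str(i) for i in gbif_ids]  # IDs are sent as strings, as in Query 5
    payloads = []
    for start in range(0, len(ids), chunk_size):
        payloads.append({
            "creator": creator,
            "notificationAddresses": [email],
            "sendNotification": True,
            "format": "DWCA",
            "predicate": {
                "type": "in",
                "key": "GBIF_ID",
                "values": ids[start:start + chunk_size],
            },
        })
    return payloads


# Small demonstration with a reduced chunk size.
demo = gbif_id_payloads(["4173969305", "3750353301", "1261536081"],
                        creator="[gbif username]",
                        email="[your_email_address]",
                        chunk_size=2)
print(len(demo))
print(json.dumps(demo[1]["predicate"]))
```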
Table 4: Summary statistics of the type specimen downloads. Note that the first download from 2023 also excludes records with basisOfRecord equal to "Observation", "Machine_Observation" or "Human_Observation".
Download DOI | Date | Record Count | Retained | New | Missing | Tombstoned GBIF IDs |
---|---|---|---|---|---|---|
doi:10.15468/dl.m69qux | 2023-08-29 | 4,693,711 | 4,693,711 | 214,176 | 85,417 | |
doi:10.15468/dl.z785vt | 2024-04-11 | 4,781,603 | 4,479,535 | 295,568 | 109,538 | 14,239 |
doi:10.15468/dl.32nt3p | 2024-07-10 | 4,941,521 | 4,665,565 | 275,956 | 12,225 | 4,322 |
doi:10.15468/dl.3x3rq2 | 2024-10-13 | 4,987,244 | 4,929,296 | 57,948 | | |
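The turnover between two snapshots can be computed by comparing the sets of GBIF IDs they contain. The following is a minimal sketch under that assumption: the function name is hypothetical and the IDs in the example are made up. By construction, retained plus new equals the record count of the newer snapshot, which is the case for the July and October 2024 rows in Table 4.

```python
def turnover(previous_ids, current_ids):
    """Compare two snapshots of GBIF IDs and count retained, new and missing IDs."""
    previous, current = set(previous_ids), set(current_ids)
    return {
        "retained": len(previous & current),  # present in both snapshots
        "new": len(current - previous),       # only in the newer snapshot
        "missing": len(previous - current),   # only in the older snapshot
    }


# Made-up IDs, for illustration only.
older = ["1", "2", "3", "4"]
newer = ["3", "4", "5"]
print(turnover(older, newer))
```

The missing IDs are the ones that would subsequently be checked with a request like Query 5 to distinguish tombstoned records from records that are merely no longer labelled as types.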
Matching scripts
Scripts were developed to match names in specimen data to names in datasets that include some metadata concerning typification. For data processing, R scripts are used, which validate and harmonise the source data and then attempt to link them up following multiple conditions. The current basic scripts look for exact matches on taxon name string, collection/institution code and type status, but these will be extended to take into account literature and name variations. Specimen data on GBIF may have multiple names listed:
- Scientific name as matched to the GBIF backbone.
- Typified name as interpreted from dwc:typeStatus.
- Scientific name as provided by the data provider (verbatimScientificName).
- Scientific name as listed in the Identification History extension.
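The exact-match condition on name string, collection code and type status can be illustrated as a simple keyed lookup. The matching scripts themselves are written in R; the Python below is only a sketch of the idea, with hypothetical field names and made-up identifiers. The light normalisation (collapsing whitespace, ignoring case in codes and type status) is an assumption and not necessarily what the R scripts do.

```python
def match_key(name, collection_code, type_status):
    """Build a comparison key; the normalisation applied here is an assumption."""
    return (
        " ".join(name.split()),           # collapse repeated whitespace
        collection_code.strip().upper(),  # collection codes compared case-insensitively
        type_status.strip().lower(),      # e.g. "Holotype" and "holotype" are equal
    )


def match_exact(specimens, name_records):
    """Return (gbif_id, name_id) pairs where all three fields match exactly."""
    index = {}
    for rec in name_records:
        key = match_key(rec["name"], rec["collection_code"], rec["type_status"])
        index.setdefault(key, []).append(rec["name_id"])
    matches = []
    for spec in specimens:
        key = match_key(spec["name"], spec["collection_code"], spec["type_status"])
        for name_id in index.get(key, []):
            matches.append((spec["gbif_id"], name_id))
    return matches


# Made-up records, for illustration only.
specimens = [{"gbif_id": "gbif-example-1", "name": "Anthurium oneillii  Croat",
              "collection_code": "mo", "type_status": "Holotype"}]
names = [{"name_id": "name-example-1", "name": "Anthurium oneillii Croat",
          "collection_code": "MO", "type_status": "holotype"}]
print(match_exact(specimens, names))
```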
Ideally, typified names should be provided using the not yet ratified dwc:typifiedName term, to avoid any ambiguity as to which of the names the specimen is supposedly a type for. For this reason, the ratification of this term into Darwin Core was highlighted as a key development, and it is now scheduled to be taken up in the next public review of outstanding Darwin Core issues. See also https://github.com/tdwg/dwc/issues/28.
External identifiers for the specimens, which connect them to the source (for example by following the CETAF specimen identifier specification), were derived where possible from the occurrenceID, references or bibliographicCitation terms, based on whether these were formatted as URLs (using the http(s) protocol). They were then classified by the domain name used in those URLs. For the IPNI matching, the largest group of specimens could be identified by Tropicos URLs (almost a third), whereas a quarter of the specimens had no determinable URI or URL. Most of the other domains also represented aggregator portals which publish data for multiple herbaria, although some CETAF members were also represented, such as Meise Botanic Garden, the Natural History Museum Vienna and the National Museum of Natural History Paris. Some issues were also detected with these identifiers, such as http://herbariovirtualreflora.jbrj.gov.br/ not resolving (the site is now at https://reflora.jbrj.gov.br) and intermountainbiota.org being represented both with and without the www subdomain.
We opted to include these external identifiers, if any, as well as a link to the GBIF download DOI and to the GBIF record ID (at the time), to try to introduce as much stability in the links as possible. It is clear, however, that the current lack of a universal and persistent specimen identification protocol still weakens any linking method.
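The derivation of an external identifier and its classification by domain can be sketched with the Python standard library. This is an illustration only: the function names are hypothetical, and the order in which the three Darwin Core terms are checked is an assumption.

```python
from collections import Counter
from urllib.parse import urlparse

NO_DOMAIN = "(No domain specified)"


def first_url(*values):
    """Return the first value that is formatted as an http(s) URL, if any."""
    for value in values:
        if value and value.strip().lower().startswith(("http://", "https://")):
            return value.strip()
    return None


def domain_counts(records):
    """Count specimens per domain of their derived external identifier."""
    counts = Counter()
    for rec in records:
        url = first_url(rec.get("occurrenceID"),
                        rec.get("references"),
                        rec.get("bibliographicCitation"))
        # Domains are kept as-is, so "www." variants are counted separately.
        counts[urlparse(url).netloc.lower() if url else NO_DOMAIN] += 1
    return counts


records = [
    {"occurrenceID": "urn:catalog:MO:Tropicos:811069",
     "references": "http://www.tropicos.org/Specimen/811069"},
    {"occurrenceID": "urn:catalog:MO:Tropicos:101539113"},
]
for domain, n in domain_counts(records).most_common():
    print(domain, n)
```

Keeping the domains unnormalised makes inconsistencies such as the same portal appearing with and without the www subdomain visible in the counts.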
Table 5: Twenty most commonly used domains for external identifiers of type specimens matched for botanical names. There were 53 different domains in total and 594 different collection codes.
Domain | Count (n) | Percentage (%) |
---|---|---|
www.tropicos.org | 25,637 | 31.13 |
(No domain specified) | 21,164 | 25.70 |
herbariovirtualreflora.jbrj.gov.br | 7,014 | 8.52 |
n2t.net | 6,473 | 7.86 |
sweetgum.nybg.org | 6,344 | 7.70 |
data.huh.harvard.edu | 3,526 | 4.28 |
coldb.mnhn.fr | 3,062 | 3.72 |
specimens.kew.org | 2,894 | 3.51 |
specieslink.net | 1,751 | 2.13 |
www.botanicalcollections.be | 734 | 0.89 |
w.jacq.org | 554 | 0.67 |
cch2.org | 526 | 0.64 |
www.intermountainbiota.org | 340 | 0.41 |
sernecportal.org | 279 | 0.34 |
wu.jacq.org | 235 | 0.29 |
scd.landcareresearch.co.nz | 220 | 0.27 |
swbiodiversity.org | 194 | 0.24 |
id.snsb.info | 151 | 0.18 |
intermountainbiota.org | 127 | 0.15 |
specifyportal.flmnh.ufl.edu | 110 | 0.13 |
How to use the TRE
The TRE can be queried using the regular Wikibase search functionality, but its main power lies in accessing its linked data through the SPARQL query service. As the specimens, names and their typification links are all modeled as separate entities, they can easily be retrieved using various filters, only limited by the currently supported elements of the data model and any lingering interoperability issues with the data, such as unclear specimen identifiers.
Examples of access using the SPARQL endpoint (https://tre-test.wikibase.cloud/query) can be seen in the following queries. We will publish this milestone as a living document as well on the TETTRIs WP2 wiki, where we will continue to update it with additional query options.
Query 6: Retrieve assertions for a certain name (with id Q49445)
The following query uses these:
- Properties: instance of (P1), taxon name (P29)
PREFIX tre: <https://tre-test.wikibase.cloud/entity/>
PREFIX tred: <https://tre-test.wikibase.cloud/prop/direct/>

SELECT ?typification ?typificationLabel WHERE {
  ?typification tred:P1 tre:Q47338;   # P1 = instance of; Q47338 = typification assertion
                tred:P29 tre:Q49445.  # P29 = taxon name
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Query 7: Also include the GBIF ids for the specimens in those assertions
The following query uses these:
- Properties: instance of (P1), taxon name (P29), specimen (P36), gbifID (P34)
PREFIX tre: <https://tre-test.wikibase.cloud/entity/>
PREFIX tred: <https://tre-test.wikibase.cloud/prop/direct/>

SELECT ?typification ?typificationLabel ?linkedItem ?linkedItemLabel ?gbifID WHERE {
  ?typification tred:P1 tre:Q47338;   # P1 = instance of; Q47338 = typification assertion
                tred:P29 tre:Q49445.  # P29 = taxon name
  OPTIONAL {
    ?typification tred:P36 ?linkedItem.  # P36 = specimen
    ?linkedItem tred:P34 ?gbifID.        # P34 = GBIF Occurrence ID
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Query 8: Find the item for a taxon name with a specific IPNI name id (P32).
The following query uses these:
- Properties: IPNI plant name ID (P32)
PREFIX tre: <https://tre-test.wikibase.cloud/entity/>
PREFIX tred: <https://tre-test.wikibase.cloud/prop/direct/>

SELECT ?item ?itemLabel WHERE {
  ?item tred:P32 "77200924-1".
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
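Queries such as these can also be sent from a script instead of the web interface. The sketch below uses only the Python standard library. The endpoint path `/query/sparql` is an assumption and should be checked against the TRE instance, and the helper names are hypothetical. The result parsing follows the standard SPARQL JSON results layout (`results` / `bindings`).

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint path for the query service; verify before use.
SPARQL_ENDPOINT = "https://tre-test.wikibase.cloud/query/sparql"


def build_request(query, endpoint=SPARQL_ENDPOINT):
    """Prepare a GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "tre-query-example/0.1",  # hypothetical client name
    })


def rows_from_results(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in results["results"]["bindings"]]


def run_query(query):
    """Send the query and return the flattened rows (requires network access)."""
    with urllib.request.urlopen(build_request(query), timeout=60) as response:
        return rows_from_results(json.load(response))


QUERY_8 = """
PREFIX tred: <https://tre-test.wikibase.cloud/prop/direct/>
SELECT ?item WHERE { ?item tred:P32 "77200924-1". }
"""

# rows = run_query(QUERY_8)  # uncomment to query the live service
```

The label service is left out of this version of Query 8 to keep the request independent of the prefixes predefined by the query interface.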
Sustainability of the TRE
The sustainability of the TRE cannot easily be guaranteed beyond the TETTRIs project's funding period. However, by choosing a free platform populated and curated with open-source scripts, maintenance is not up to a single party. Still, there are some potential avenues for continued support beyond the TETTRIs project should the Wikibase somehow lose support:
- Migrate end results into Wikidata: We have adopted a model that is as similar as possible to the Wikidata model. As such, migrating the triples made in the TRE to Wikidata should be possible and mainly obstructed by finding support for the typification model to be accepted by the Wikidata community.
- CETAF service: Similar to other TETTRIs products, the TRE could be maintained by CETAF as a service to the taxonomic community. CETAF can continue to use the platform on wikibase.cloud or migrate the data to a Wikibase of their own hosting.
- GBIF type registry: With the planned migration away from the GBIF Taxonomic Backbone to support the CoL Extended Release as GBIF's backbone, links to types could be published to ChecklistBank and be incorporated from there into the Catalogue of Life.
- Worst case scenario, if no replacement can be found: The dumps or alternative exports are archived in a public repository, such as Zenodo, so that they might be useful to some other project in the future.
Future plans
Include image links for the specimens. Many specimens have images, and these constitute perhaps the most stable representation of them in the absence of reliable specimen PIDs. However, the URLs to the servers where these images currently reside may (and do) also break. A possible alternative could be to archive the images by uploading them to Zenodo, if the license allows it. This way, they will remain accessible even if all other links break, which today is still not an unlikely scenario.
Incorporate links to the literature. GBIF now contains many Material Citations of specimens, some of which are part of typifications. These could be utilised, in particular in conjunction with GBIF's data clustering service, to connect names to specimens and to the literature.
References
- Agosti D, Benichou L, Addink W, Arvanitidis C, Catapano T, Cochrane G, Dillen M, Döring M, Georgiev T, Gérard I, Groom Q, Kishor P, Kroh A, Kvaček J, Mergen P, Mietchen D, Pauperio J, Sautter G, Penev L (2022) Recommendations for use of annotations and persistent identifiers in taxonomy and biodiversity publishing. Research Ideas and Outcomes 8: e97374. https://doi.org/10.3897/rio.8.e97374
- GBIF.org (29 August 2023) GBIF Occurrence Download https://doi.org/10.15468/dl.m69qux
- GBIF.org (11 April 2024) GBIF Occurrence Download https://doi.org/10.15468/dl.z785vt
- GBIF.org (10 July 2024) GBIF Occurrence Download https://doi.org/10.15468/dl.32nt3p
- GBIF.org (13 October 2024) GBIF Occurrence Download https://doi.org/10.15468/dl.3x3rq2
- Hardisty AR, Ellwood ER, Nelson G, Zimkus B, Buschbom J, Addink W, Rabeler RK, Bates J, Bentley A, Fortes JAB, Hansen S, Macklin JA, Mast AR, Miller JT, Monfils AK, Paul DL, Wallis E, Webster M (2022) Digital Extended Specimens: Enabling an Extensible Network of Biodiversity Data Records as Integrated Digital Objects on the Internet. BioScience 72(10): 978–987. https://doi.org/10.1093/biosci/biac060
- Müller A, von Raab-Straube E, Berendsohn WG (2024) A Taxonomic Concept Mapping Service for Taxonomic Information Aggregators. Biodiversity Information Science and Standards 8: e136016. https://doi.org/10.3897/biss.8.136016
- Page R (2023) Where are the plant type specimens? Mapping JSTOR Global Plants to GBIF. https://doi.org/10.59350/m59qn-22v52