Case study: Linking across identifiers

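There are three important uses for linking persistent identifiers, each covered in a section below:

  • Versions and related works
  • Alternate identifiers
  • Linking creators and contributors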

These links are important for navigating the research landscape, and they also let us look at the far-reaching impact of each individual contributor, paper, dataset and grant.

Here we will give examples of how data repositories have linked identifiers to give us a much better view of research networks and dependencies.

Versions and related works

Many repositories using DataCite to assign DOIs to their content use their metadata to make links between resources. The DataCite metadata schema contains the field RelatedIdentifier which makes this possible. This field explains the relationship and gives the identifier locating the related item.

Versions

The UK's Archaeology Data Service uses the RelatedIdentifier to show the relationship between different versions of a dataset. More information on versioning data can be found in Examples of versioning with identifiers.

<relatedIdentifiers>
	<relatedIdentifier relatedIdentifierType="DOI" relationType="IsNewVersionOf">10.5284/1000417</relatedIdentifier>
	<relatedIdentifier relatedIdentifierType="DOI" relationType="IsNewVersionOf">10.5284/1000341</relatedIdentifier>
</relatedIdentifiers>
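As a rough illustration of how a consumer might read these version links, the following Python sketch parses the fragment above with the standard library. The function names are hypothetical, and the sketch assumes a fragment without an XML namespace, as shown here; a complete DataCite record would normally declare one.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the example above. Assumption: no XML namespace,
# unlike a full DataCite record.
FRAGMENT = """
<relatedIdentifiers>
  <relatedIdentifier relatedIdentifierType="DOI" relationType="IsNewVersionOf">10.5284/1000417</relatedIdentifier>
  <relatedIdentifier relatedIdentifierType="DOI" relationType="IsNewVersionOf">10.5284/1000341</relatedIdentifier>
</relatedIdentifiers>
"""


def related_identifiers(xml_text):
    """Return (relationType, relatedIdentifierType, identifier) tuples."""
    root = ET.fromstring(xml_text.strip())
    return [
        (el.get("relationType"), el.get("relatedIdentifierType"), (el.text or "").strip())
        for el in root.iter("relatedIdentifier")
    ]


def previous_versions(xml_text):
    """Hypothetical helper: identifiers that this record is a new version of."""
    return [
        ident
        for relation, _id_type, ident in related_identifiers(xml_text)
        if relation == "IsNewVersionOf"
    ]


print(previous_versions(FRAGMENT))
```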

Related works

This dataset from Durham University links to the paper that has cited the data, as well as the software that was used during creation of the data.

<relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType='URL' relationType='IsCitedBy'>http://dx.doi.org/10.1016/j.ssnmr.2015.05.003</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType='URL' relationType='Cites'>https://www.dur.ac.uk/solids.nmr/software/pnmrsim/</relatedIdentifier>
</relatedIdentifiers>

In order to make this a bi-directional link, the persistent identifier, expressed as a URI, should also be used by papers citing data and code.
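One way to picture the bi-directional link is as a pair of reciprocal statements, each using the identifier in URI form. The sketch below is illustrative only: the helper names are hypothetical, the inverse table covers just a few pairs from the DataCite relationType vocabulary, and the dataset DOI is a made-up placeholder rather than the Durham record's real identifier.

```python
# A small subset of reciprocal relationType pairs from the DataCite vocabulary.
INVERSE = {
    "IsCitedBy": "Cites",
    "Cites": "IsCitedBy",
    "IsNewVersionOf": "IsPreviousVersionOf",
    "IsPreviousVersionOf": "IsNewVersionOf",
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
}

# Common ways a DOI is written; all are reduced to the bare DOI first.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_uri(doi):
    """Express a DOI (bare or already prefixed) as an https://doi.org/ URI."""
    doi = doi.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


def reciprocal(subject_uri, relation, object_uri):
    """Return the statement the other side would need to assert."""
    return (object_uri, INVERSE[relation], subject_uri)


# Placeholder DOI, purely hypothetical.
dataset = doi_to_uri("10.1234/example-dataset")
paper = doi_to_uri("http://dx.doi.org/10.1016/j.ssnmr.2015.05.003")
print(reciprocal(dataset, "IsCitedBy", paper))
```

The dataset record states that it is cited by the paper; the reciprocal tuple is what the paper's metadata would need to state for the link to be navigable from both ends.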

Outside of DOIs, the EBI provides a number of life science data resources. Many of these link to related data across the EBI services, as shown in the image below.

Cross-linking of major life science resources at EMBL-EBI (source: EMBL-EBI)

Alternate identifiers

The DataCite metadata schema also allows for secondary identifiers that relate to the exact same object. These alternate identifiers tend to be internal IDs or local accession numbers, which are not necessarily globally unique but may provide additional context. For instance, the InChI for data relating to a specific chemical compound:

<alternateIdentifiers>
  <alternateIdentifier alternateIdentifierType="InChI">InChI=1S/C6H10NO2.2H2O/c1-3-4(2)6(9)7-5(3)8;;/h3-4H,1-2H3,(H2,7,8,9);2*1H2/t3-,4-;;/m0../s1</alternateIdentifier>
  <alternateIdentifier alternateIdentifierType="InChIKey">UQPNWVSPCKRRTR-RGVONZFCSA-N</alternateIdentifier>
  <alternateIdentifier alternateIdentifierType="Handle">10042/132754</alternateIdentifier>
</alternateIdentifiers>
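To show how such secondary identifiers might be picked out of a record, here is a minimal Python sketch that reads the fragment above into a dictionary keyed by alternateIdentifierType. The function name is hypothetical, the fragment is assumed to carry no XML namespace, and the regular expression checks only the general shape of a standard InChIKey (14, 10 and 1 uppercase letters); it does not verify that the key matches the InChI.

```python
import re
import xml.etree.ElementTree as ET

# Fragment copied from the example above (no namespace assumed).
FRAGMENT = """
<alternateIdentifiers>
  <alternateIdentifier alternateIdentifierType="InChI">InChI=1S/C6H10NO2.2H2O/c1-3-4(2)6(9)7-5(3)8;;/h3-4H,1-2H3,(H2,7,8,9);2*1H2/t3-,4-;;/m0../s1</alternateIdentifier>
  <alternateIdentifier alternateIdentifierType="InChIKey">UQPNWVSPCKRRTR-RGVONZFCSA-N</alternateIdentifier>
  <alternateIdentifier alternateIdentifierType="Handle">10042/132754</alternateIdentifier>
</alternateIdentifiers>
"""

# Shape check only: three blocks of uppercase letters separated by hyphens.
INCHIKEY_SHAPE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def alternate_identifiers(xml_text):
    """Map alternateIdentifierType to its value.

    If a type appears more than once, the last value wins in this sketch.
    """
    root = ET.fromstring(xml_text.strip())
    return {
        el.get("alternateIdentifierType"): (el.text or "").strip()
        for el in root.iter("alternateIdentifier")
    }


ids = alternate_identifiers(FRAGMENT)
print(ids["Handle"])
print(bool(INCHIKEY_SHAPE.fullmatch(ids["InChIKey"])))
```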

For more on how to integrate DataCite into your repository, see Examples of DataCite integration.

Across the EBI, many services have multiple identifiers, which can reflect the evolution of a resource. A small group of services have DOIs alongside their original accession numbers, although this practice is not widespread.

Linking creators and contributors

The metadata for an item can also be used to link to the identity of its creators and contributors, and the DataCite metadata schema supports name identifiers for this purpose. The example below is for data held in PANGAEA.

<creator>
  <creatorName>Dengler, Marcus</creatorName>
  <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org">0000-0001-5993-9088</nameIdentifier>
</creator>
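A repository accepting ORCID iDs in this field may want to check them before storing them. The final character of an ORCID iD is a check digit computed with the ISO 7064 MOD 11-2 algorithm; the Python sketch below applies that calculation to the iD in the example above. The function names are illustrative, and a valid checksum only shows that the iD is well formed, not that it is registered.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the length, digits and check character of a hyphenated or compact iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0001-5993-9088"))  # True
print(is_valid_orcid("0000-0001-5993-9089"))  # False: wrong check digit
```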

In many cases, contributors are individuals, but the vocabulary for the Contributor field also allows organisational relationships to be explored. Both Creators and Contributors can be organisations, and some contributorTypes relate specifically to organisations. These are:

  • Distributor
  • Funder
  • HostingInstitution
  • RegistrationAgency
  • RegistrationAuthority
  • ResearchGroup

The question of appropriate identifiers for organisations is an open one, but there are examples of organisation identifiers in DataCite:

<creators>
    <creator>
      <creatorName>UCD Archives</creatorName>
      <nameIdentifier nameIdentifierScheme="ISNI">0000000404462544</nameIdentifier>
    </creator>
</creators>
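The same element structure is used whether the name identifier belongs to a person or an organisation, so a single routine can collect both. The Python sketch below, with a hypothetical function name and assuming a namespace-free fragment as shown above, lists each creator together with the scheme and value of any name identifiers.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the example above (no namespace assumed).
FRAGMENT = """
<creators>
    <creator>
      <creatorName>UCD Archives</creatorName>
      <nameIdentifier nameIdentifierScheme="ISNI">0000000404462544</nameIdentifier>
    </creator>
</creators>
"""


def creator_identifiers(xml_text):
    """Return (creatorName, nameIdentifierScheme, identifier) tuples."""
    root = ET.fromstring(xml_text.strip())
    found = []
    # iter() also yields the root itself, so a lone <creator> fragment works too.
    for creator in root.iter("creator"):
        name = (creator.findtext("creatorName") or "").strip()
        for ident in creator.iter("nameIdentifier"):
            scheme = ident.get("nameIdentifierScheme")
            found.append((name, scheme, (ident.text or "").strip()))
    return found


print(creator_identifiers(FRAGMENT))
```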

ORCID has been using ISNIs as organisation identifiers for some time, so it is possible to link datasets that carry no organisational information to relevant organisations via the ORCID records of their individual creators and contributors. For instance, for our ORCID example in PANGAEA above, we can see that the creator is employed by the Helmholtz Centre for Ocean Research Kiel, and so the data is related to that organisation (ISNI:0000-0000-9056-9663).

Versions of identities

There are disciplines where people may wish to create separate identities and have separate identifiers and records for them. In such cases, ORCID does not link between identifiers. However, unintentional duplicate ORCID iDs do occur. To deal with this, ORCID deprecates one iD and ensures that it points to the primary record. This ensures the persistence of the deprecated ORCID iD, but allows users to navigate to the appropriate record.
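The effect of this pointer can be sketched as a lookup that follows a retired iD to its primary record. The Python below is a conceptual illustration only: the mapping and the placeholder labels are hypothetical, and it does not represent how ORCID stores or serves this information.

```python
def resolve_primary(orcid_id, deprecated_to_primary):
    """Follow deprecation pointers until an iD with no pointer is reached.

    `deprecated_to_primary` is a hypothetical mapping of retired iD -> primary iD.
    """
    seen = set()
    while orcid_id in deprecated_to_primary:
        if orcid_id in seen:
            raise ValueError("deprecation pointers form a cycle")
        seen.add(orcid_id)
        orcid_id = deprecated_to_primary[orcid_id]
    return orcid_id


# Placeholder labels, not real ORCID iDs.
pointers = {"DUPLICATE-ID": "PRIMARY-ID"}
print(resolve_primary("DUPLICATE-ID", pointers))  # PRIMARY-ID
print(resolve_primary("PRIMARY-ID", pointers))    # PRIMARY-ID
```

Metadata that still cites the retired iD therefore remains resolvable, which is the persistence property described above.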

Content for this article was drawn from:

Fenner, Martin et al. (2016). THOR: Conceptual Model of Persistent Identifier Linking. Zenodo. http://doi.org/10.5281/zenodo.48705