PID: What is a persistent identifier for publications and datasets?

What is a persistent identifier?

A persistent identifier or PID, such as a DOI or Handle, is a permanent, unique reference to a digital object. Not all identifiers guarantee persistence and uniqueness the way a PID does (see examples below). Moreover, when a PID is created for a digital object, descriptions of this object (i.e. metadata) are registered with it. This makes PIDs essential for producing FAIR research output.

  • A PID ensures that access to the digital object is persistent. PIDs avoid broken links and difficulties locating a digital object (e.g. a dataset, a publication), even if its web address (URL) changes. A central registry ensures that following the PID will point you to the digital object’s current location.
  • A PID uniquely identifies a digital object. In addition to being persistent, the identifier is often also globally or universally unique (also referred to as a GUID or UUID): no two identical identifiers point to different digital objects, and no two identifiers of the same type point to the same digital object.

In this research tip, we give examples of PIDs for publications and datasets, as well as some other identifiers related to publications (i.e. journals and books). Besides the PIDs described here, PIDs also exist for identification of, for instance, researchers (ORCID) or research organizations (ROR).

 

What is a Digital Object Identifier (DOI)?

The Digital Object Identifier or DOI is a commonly used identifier for research output. It is registered through a central registry such as DataCite or Crossref.

A DOI always comprises:

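  • The characters ‘10.’, which open every DOI.
  • 4 or more digits, identifying the organization that registered the DOI at DataCite or Crossref. Together with ‘10.’ they form the DOI prefix (e.g. 10.5061).
  • A suffix, separated from the prefix by a slash, that identifies the digital object (e.g. dryad.4h16331).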

Appending a DOI to the resolver system https://doi.org/ takes you to the location or landing page of the digital object in question. An example of a dataset held at the Dryad repository is https://doi.org/10.5061/dryad.4h16331. An example for a publication is https://doi.org/10.1371/journal.pbio.3002114.
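The DOI structure and resolver described above can be sketched in a few lines of Python. This is a minimal illustration, not an official validator: the regular expression and the function names `split_doi` and `doi_url` are assumptions made for this example, and real DOIs may use prefix or suffix forms that this simple pattern does not cover.

```python
import re

# Assumed pattern, based on the structure described above: "10.",
# then 4 or more digits, a slash, and a non-empty suffix.
DOI_PATTERN = re.compile(r"^(10\.\d{4,})/(\S+)$")

RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> tuple[str, str]:
    """Return (prefix, suffix) for a bare DOI or a doi.org URL."""
    bare = doi.strip()
    if bare.lower().startswith(RESOLVER):
        bare = bare[len(RESOLVER):]
    match = DOI_PATTERN.match(bare)
    if match is None:
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return match.group(1), match.group(2)


def doi_url(doi: str) -> str:
    """Build the resolver URL by appending the DOI to https://doi.org/."""
    prefix, suffix = split_doi(doi)
    return f"{RESOLVER}{prefix}/{suffix}"


print(split_doi("10.5061/dryad.4h16331"))       # ('10.5061', 'dryad.4h16331')
print(doi_url("10.1371/journal.pbio.3002114"))  # https://doi.org/10.1371/journal.pbio.3002114
```

Accepting both the bare DOI and the full resolver URL is a convenience choice, since both forms appear in reference lists.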

 

Alternative (persistent) identifiers

In addition to DOIs, various other (persistent) identifiers exist for both publications and datasets. Accession numbers are also used to uniquely identify digital objects. The type of (persistent) identifier depends on the organization or platform that creates it. Some examples of (persistent) identifiers are given below.

Handle

  • A Handle is a unique and persistent identifier. A central registry resolves each Handle to the current location (URL) of the digital object.
  • Handles are used for both publications and datasets and are widely adopted in, for instance, the cultural heritage sector and humanities (e.g. CLARIN).
  • A Handle identifier consists of a prefix issued by the Handle.net registry and a suffix that uniquely identifies the Handle under that prefix (e.g. 11509/142).
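The prefix/suffix structure of a Handle can be illustrated with a short Python sketch. The function names are hypothetical, and the resolver address https://hdl.handle.net/ is the one used in the Biblio overview table further down; treat this as an illustration rather than a complete Handle parser.

```python
HANDLE_RESOLVER = "https://hdl.handle.net/"


def split_handle(handle: str) -> tuple[str, str]:
    """Split a Handle into (prefix, suffix) at the first slash."""
    prefix, slash, suffix = handle.strip().partition("/")
    if not slash or not prefix or not suffix:
        raise ValueError(f"expected 'prefix/suffix', got {handle!r}")
    return prefix, suffix


def handle_url(handle: str) -> str:
    """Build a resolvable link for a Handle."""
    prefix, suffix = split_handle(handle)
    return f"{HANDLE_RESOLVER}{prefix}/{suffix}"


print(split_handle("11509/142"))  # ('11509', '142')
print(handle_url("11509/142"))    # https://hdl.handle.net/11509/142
```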

WoS

  • A Web of Science (WoS) ID is a unique identifier assigned to publications that are part of the Web of Science collection.
  • An example identifier is WOS:000334976000012, which you will find in the URL of the publication on Web of Science.

arXiv

  • An arXiv ID is assigned by arXiv, a curated research-sharing platform that is open to anyone and is used to share articles.
  • The canonical form of arXiv identifiers is arXiv:YYMM.NNNNN, with 5 digits for the sequence number within the month (since January 2015). An example of an arXiv ID is arXiv:1501.00001.
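As a sketch of how the canonical form arXiv:YYMM.NNNNN breaks down, the following Python snippet extracts year, month and sequence number. It only handles the 5-digit form described above; the function name is hypothetical, and mapping YY to 2000 + YY is an assumption that holds for identifiers in this form.

```python
import re

# Canonical form since January 2015: optional "arXiv:" label, YYMM, dot, 5 digits.
ARXIV_PATTERN = re.compile(r"^(?:arXiv:)?(\d{2})(\d{2})\.(\d{5})$", re.IGNORECASE)


def parse_arxiv_id(identifier: str) -> dict[str, int]:
    """Return year, month and sequence number of a canonical arXiv ID."""
    match = ARXIV_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not in the form arXiv:YYMM.NNNNN: {identifier!r}")
    yy, mm, sequence = match.groups()
    month = int(mm)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {identifier!r}")
    # Assumption: two-digit years refer to the 2000s.
    return {"year": 2000 + int(yy), "month": month, "sequence": int(sequence)}


print(parse_arxiv_id("arXiv:1501.00001"))  # {'year': 2015, 'month': 1, 'sequence': 1}
```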

PubMed ID

  • PubMed covers citations of biomedical literature. Citations may include links to full text content from PubMed Central and publisher websites.
  • A PMID or PubMed ID is a unique identifier composed of digits only. An example of a PubMed ID is 26360422.

ISSN

  • An ISSN is an International Standard Serial Number and uniquely refers to a serial publication, like a journal, magazine or newspaper.
  • It is an 8-digit id. An example is ISSN:1360-1385, the ISSN for the journal Trends in Plant Science.
  • In addition to the ISSN, an eISSN exists for the electronic (online) version of a serial (e.g. eISSN:1878-4372).
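The last character of an ISSN is a check character, which makes it possible to catch typing errors. The sketch below verifies it in Python. The rule used (weights 8 down to 2 on the first seven digits, modulo 11, with X standing for ten) comes from the ISSN standard and is not stated in this tip; the function names are hypothetical.

```python
import re

# Accepts an optional "ISSN:" or "eISSN:" label and an optional hyphen.
ISSN_PATTERN = re.compile(r"(?:e?ISSN:)?\s*(\d{4})-?(\d{3})([\dX])", re.IGNORECASE)


def issn_check_character(first_seven: str) -> str:
    """Compute the check character for the first seven ISSN digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(text: str) -> bool:
    """Return True if the text is a well-formed ISSN with a correct check character."""
    match = ISSN_PATTERN.fullmatch(text.strip())
    if match is None:
        return False
    first_seven = match.group(1) + match.group(2)
    return issn_check_character(first_seven) == match.group(3).upper()


print(is_valid_issn("ISSN:1360-1385"))   # True
print(is_valid_issn("eISSN:1878-4372"))  # True
print(is_valid_issn("1360-1384"))        # False
```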

ISBN

  • An ISBN or International Standard Book Number is a code referring to a monograph or book. This can refer to a digital or a physical object.
  • It consists of 10 or 13 digits (e.g. 9789463939447).
  • Note that multiple ISBNs may exist for the same resource.
  • An ISBN is requested by the publisher of the monograph. Do you want to request an ISBN? Go to the research tip on ISBN for more information.
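For 13-digit ISBNs, the final digit is a check digit. A minimal Python sketch of the check follows, assuming the rule from the ISBN standard (alternating weights 1 and 3 on the first twelve digits, modulo 10), which is not described in this tip. The function name is hypothetical and 10-digit ISBNs are deliberately not covered.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Return True if the text is a 13-digit ISBN with a correct check digit."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... over the first twelve digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    expected = (10 - total % 10) % 10
    return expected == int(digits[12])


print(is_valid_isbn13("9789463939447"))  # True
print(is_valid_isbn13("9789463939448"))  # False
```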

ARK

  • An Archival Resource Key (ARK) is an identifier scheme developed by the California Digital Library, to identify digital objects in a persistent way.
  • In contrast to other PIDs, ARKs are decentralized (i.e. each organisation that creates ARKs is responsible for content management, hosting, monitoring and forwarding).
  • Moreover, ARKs do not necessarily resolve to a landing page for the digital object; they can take you directly to the object itself. ARKs do not have metadata requirements.
  • An example of an ARK identifier is http://ark.bnf.fr/ark:/12148/btv1b8449691v/f29.

PIDs and accession numbers for Life Sciences data

  • Many databases and repositories in life sciences work with their own identifier system. As these databases are considered trustworthy data repositories by many institutions, their accession numbers are widely adopted and accepted for use in citations.
  • The European Nucleotide Archive (ENA) assigns accession numbers to digital entities at various levels of granularity. Example ENA accession numbers are PRJEB1788 (BioProject) and LR796137 (Sequence).
  • The European Genome-Phenome Archive (EGA) assigns IDs to studies and datasets. An example dataset ID is EGAD00000000001.
  • EMBL-EBI’s BioStudies database links to or manages diverse biological data from EMBL-EBI databases as well as data that do not fit in the structured EMBL-EBI databases. Once data are submitted successfully, an accession number is automatically assigned. Example accession numbers are S-BSST7 (BioStudies), S-BIAD712 (BioImage Archive) and E-MTAB-12919 (ArrayExpress).

PIDs in Biblio

DOIs as well as alternative (persistent) identifiers can be used in the Biblio registration process of publications (DOI, arXiv ID, PubMed ID, WoS exports) and datasets (DOI, Handle, ENA, BioProject, BioStudies). An overview of identifiers used in Biblio can be found in the table below.

Identifier type | Identifier example | Link | Digital object
DOI | 10.5061/dryad.4h16331 | https://doi.org/10.5061/dryad.4h16331 | publication, dataset
Handle | 11509/142 | https://hdl.handle.net/11509/142 | publication, dataset
WoS | 000334976000012 | https://www.webofscience.com/wos/woscc/full-record/WOS:000334976000012 | publication
PubMed ID | 26360422 | https://pubmed.ncbi.nlm.nih.gov/26360422/ | publication
arXiv | 1501.00001 | https://arxiv.org/abs/1501.00001 | publication
ENA | LR796137 | https://www.ebi.ac.uk/ena/browser/view/LR796137 | dataset
EGA | EGAD00000000001 | https://ega-archive.org/datasets/EGAD00000000001 | dataset
BioProject | PRJEB1788 | https://ebi.ac.uk/ena/browser/view/PRJEB1788 | dataset
BioStudies | S-BSST7 | https://www.ebi.ac.uk/biostudies/studies/S-BSST7 | dataset
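The links in the table above all follow a fixed pattern per identifier type, so they can be generated from the identifier alone. The Python sketch below copies the URL patterns from the table; the dictionary and function names are hypothetical, and this is not a description of how Biblio itself builds its links.

```python
# URL patterns taken from the overview table; "{id}" marks where the identifier goes.
URL_TEMPLATES = {
    "DOI": "https://doi.org/{id}",
    "Handle": "https://hdl.handle.net/{id}",
    "WoS": "https://www.webofscience.com/wos/woscc/full-record/WOS:{id}",
    "PubMed ID": "https://pubmed.ncbi.nlm.nih.gov/{id}/",
    "arXiv": "https://arxiv.org/abs/{id}",
    "ENA": "https://www.ebi.ac.uk/ena/browser/view/{id}",
    "EGA": "https://ega-archive.org/datasets/{id}",
    "BioProject": "https://ebi.ac.uk/ena/browser/view/{id}",
    "BioStudies": "https://www.ebi.ac.uk/biostudies/studies/{id}",
}


def identifier_link(id_type: str, identifier: str) -> str:
    """Return the link for an identifier of a known type."""
    try:
        template = URL_TEMPLATES[id_type]
    except KeyError:
        raise ValueError(f"unknown identifier type: {id_type!r}") from None
    return template.format(id=identifier)


print(identifier_link("PubMed ID", "26360422"))
print(identifier_link("EGA", "EGAD00000000001"))
```

Keeping the patterns in one dictionary means that a changed resolver address only needs to be updated in a single place.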

 

How to get a persistent identifier for your research output?

Publications

You can typically get a PID for your publication by submitting it to a publishing venue or repository, which will then create the PID. The most commonly used PID for publications is the Digital Object Identifier (DOI). In many cases, the DOI is requested from Crossref, the main central registry for publications.

Datasets

You can typically get a PID for your dataset by depositing it in a data repository. The most commonly used PID for datasets is the Digital Object Identifier (DOI).

Note that some community-wide accepted databases or repositories do not generate DOIs for the datasets that they manage, but rather make use of alternative PIDs or accession numbers (see examples above).

 

How to cite publications or datasets?

Research data can be cited in the same way as publications. Including the persistent identifier in the reference ensures that the publication or dataset can be located easily. Whereas a reference to a publication names the journal or other venue in which it was published, a reference to a dataset names the data repository. Publishers’ websites and repositories often let you copy or export a citation for use in a reference list. Proper citation acknowledges the authors of a publication or the creators of a dataset, and allows citations to be monitored.
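To make the elements of a dataset reference concrete, here is a small Python sketch that assembles one. The layout (creators, year, title, repository, DOI link) is only an illustrative assumption, not a prescribed citation style, and every value in the example is a placeholder rather than a real dataset.

```python
from dataclasses import dataclass


@dataclass
class DatasetReference:
    """Elements of a dataset reference; the field names are hypothetical."""
    creators: list[str]
    year: int
    title: str
    repository: str
    doi: str

    def format(self) -> str:
        # Assumed layout: Creators (Year). Title. Repository. DOI link.
        names = "; ".join(self.creators)
        return (
            f"{names} ({self.year}). {self.title}. "
            f"{self.repository}. https://doi.org/{self.doi}"
        )


# Placeholder values only; none of these refer to an existing dataset.
reference = DatasetReference(
    creators=["Lastname, A.", "Othername, B."],
    year=2024,
    title="Placeholder dataset title",
    repository="Example Repository",
    doi="10.1234/placeholder",
)
print(reference.format())
```

In practice, the export function of the repository or publisher remains the more reliable source for a correctly formatted citation.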

Want to know more?

For more information on PIDs and data citation, check out this video. For more information and guidelines on identifier types, check https://rdmkit.elixir-europe.org/identifiers and https://www.pidwijzer.nl/en.


Last modified April 4, 2024, 4 p.m.