How to preserve and share data in data repositories?
Preservation and sharing of data in a data repository
A data repository is an online platform that is used to:
- Publish completed datasets.
- Share datasets externally with different access levels and reuse conditions.
- Preserve datasets in the long term.
A data repository is a database infrastructure that:
- Compiles data from (many) different providers/researchers.
- Manages data, typically in line with the FAIR data principles.
- Gives access to data and associated metadata and documentation.
Personal websites and databases as well as cloud storage services (Dropbox, Google Drive, etc.) are not considered repositories.
How to select a suitable data repository?
There are hundreds of data repositories or archives to choose from. Keep in mind, however, that not all repositories are equivalent. Some repositories focus more on disseminating and making your data visible than on ensuring their preservation in the long term.
- Check the list of repositories recommended by your journal/publisher. Many journals and publishers with data sharing policies recommend, and for some data types even require, the use of specific repositories. For example, see the list of recommended repositories from Springer Nature or PLOS. Funders might also recommend data repositories (e.g. Open Research Europe).
- Check best practices within your community by reaching out to peers, by reading data availability statements in publications or by identifying research data management initiatives (e.g. Research Data Alliance (RDA)) in your scientific domain.
- Check data portals that bring together data from different data repositories (e.g. EOSC Portal).
- Look for information about a specific repository or identify a repository suitable for your research data via the re3data.org and FAIRsharing.org registries.
- Select a general-purpose repository, such as Zenodo or Open Science Framework, if no established repository exists for your research domain.
When assessing a candidate repository, consider the following questions:
- Does the repository match your data needs (e.g. accepted data types and formats, access levels, licenses, legal requirements for data protection)? Read the data submission guidelines (see below) on the repository's own website to check its scope.
- Does it charge for its services?
- Does it have an explicit commitment to long-term preservation?
- Does it provide a landing page for each dataset, with publicly available metadata?
- Does it assign persistent and globally unique identifiers?
- Does it provide clarity about access levels and conditions?
- Does it provide information about usage licenses?
- Is it a trusted repository?
- Is it certified?
- Is it community-based, or a commercial solution?
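The questions above can be recorded as a simple checklist so that several candidate repositories are compared on the same terms. The following is a minimal sketch in Python, not a prescribed method: the class name, the field names and the example answers are all hypothetical, and the open question of community-based versus commercial is left out because it is not a yes/no answer. A "no" is not necessarily disqualifying (a repository may charge a reasonable fee, for example), so the output is a list of points to look into rather than a verdict.

```python
from dataclasses import dataclass, fields


@dataclass
class RepositoryChecklist:
    """Yes/no answers for one candidate repository (hypothetical field names)."""
    matches_data_needs: bool
    free_of_charge: bool
    long_term_preservation: bool
    landing_page_with_public_metadata: bool
    persistent_identifiers: bool
    clear_access_conditions: bool
    license_information: bool
    certified: bool


def unmet_criteria(checklist):
    """Return the names of all criteria answered 'no', in checklist order."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


# Example answers for an imaginary repository (assumed values, not real data).
candidate = RepositoryChecklist(
    matches_data_needs=True,
    free_of_charge=True,
    long_term_preservation=True,
    landing_page_with_public_metadata=True,
    persistent_identifiers=True,
    clear_access_conditions=True,
    license_information=False,
    certified=False,
)

print(unmet_criteria(candidate))  # prints ['license_information', 'certified']
```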
How to prepare data for preservation and sharing in a data repository?
Once you have identified a suitable data repository or archive, check its data submission guidelines in advance, so you can adequately prepare your data for deposit. The guidelines describe how to build a data package and typically cover:
- Accepted data formats.
- Required documentation.
- Required and recommended metadata.
- Recommended controlled vocabularies or ontologies.
- Access level of the data.
- Permission and conditions for re-use through licensing.
- For long-term preservation, additional actions might be needed to ensure that the data remain usable.
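Some of these checks can be run locally before deposit. The sketch below, in Python with the standard library only, checks a folder against a set of accepted file formats and required metadata fields, and builds a checksum manifest that can help verify later that files are unchanged. Everything specific in it is an assumption for illustration: the accepted extensions, the metadata field names and the README.txt convention are hypothetical placeholders, and the real values must come from the submission guidelines of the repository you selected.

```python
import hashlib
from pathlib import Path

# Hypothetical guideline values; replace with those of your repository.
ACCEPTED_EXTENSIONS = {".csv", ".txt", ".json"}
REQUIRED_METADATA = {"title", "creators", "description", "license", "access_level"}


def check_package(package_dir, metadata):
    """Return a list of problems found in a data package before deposit."""
    root = Path(package_dir)
    problems = []

    # Required metadata fields that are absent from the metadata record.
    for name in sorted(REQUIRED_METADATA - set(metadata)):
        problems.append(f"missing metadata field: {name}")

    # Files whose format is not in the accepted list.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() not in ACCEPTED_EXTENSIONS:
            problems.append(f"format not accepted: {path.relative_to(root).as_posix()}")

    # Documentation file (assumed convention: a README.txt at the top level).
    if not (root / "README.txt").is_file():
        problems.append("missing documentation: README.txt")

    return problems


def checksum_manifest(package_dir):
    """Map each file's relative path to its SHA-256 checksum."""
    root = Path(package_dir)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Calling `check_package("my_dataset", {"title": "Example"})` on a hypothetical folder `my_dataset` returns one message per missing metadata field or unaccepted file, and `checksum_manifest("my_dataset")` returns a dictionary that can be saved alongside the data.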
Last modified June 6, 2023, 4:46 p.m.