Find and use data
Data are valuable. Acquiring them is usually time-consuming and costly (experimental data, derived data), and they are often impossible or difficult to reproduce (observational data). For reasons of research economics alone, the re-use of data is therefore becoming increasingly important.
Data re-use means that research data which have already been collected are used for later research in other projects and/or to answer other questions.
FAQ
Because funders, publishers and institutions increasingly require or recommend that data be made accessible, more and more research data are available for subsequent use. To find suitable research data for one's own research area, the relevant offerings in one's own field are often the first point of contact. These can be institutional or subject-specific repositories or data journals. Repositories can be searched by subject area using the Repository Finder. A list of data journals, far from exhaustive, can be found here.
It is also possible to search for data across multiple repositories using generic search services. A major drawback of these search services is that they often cannot adequately map the detailed metadata schemas of their sources. In addition, the metadata differ greatly in what they describe: individual data, data sets or whole collections.
The best-known portals include:
Retrieves metadata from repositories and databases via OAI-PMH. Research data can be found via the document type "primary data".
Searches metadata from various sources such as CLARIN or Global GBIF.
Searches metadata of information objects, including research data (object type 'Dataset'), which are registered with DOIs at DataCite. The metadata is also partly queried by the other two services.
Contains freely accessible research results from EU-funded projects.
- Google Dataset Search (proprietary!)
- gesisDataSearch - search of social and economic research data in data repositories and metadata services
- VerbundFDB - Search of studies, research data and instruments of empirical educational research
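One of the portals above harvests repository metadata via OAI-PMH. As a rough illustration of what such a harvesting request looks like, the following sketch builds the URL for an OAI-PMH `ListRecords` request. The verb and parameter names (`verb`, `metadataPrefix`, `set`, `from`) come from the OAI-PMH protocol; the function name and the repository endpoint are hypothetical placeholders and do not belong to any of the services listed.

```python
from typing import Optional
from urllib.parse import urlencode


def build_list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    # 'verb' and 'metadataPrefix' are required for a ListRecords request.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional selective harvesting: restrict to a set and/or a start date.
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only for illustration.
url = build_list_records_url(
    "https://repository.example.org/oai",
    set_spec="datasets",
    from_date="2020-01-01",
)
print(url)
```

Which sets and metadata formats a repository actually offers varies. They can usually be discovered with the protocol's `ListSets` and `ListMetadataFormats` verbs before harvesting.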
Subsequent use itself is governed by the applicable rights (licenses and, where applicable, user agreements), which are binding. Among other things, they can specify who may use the data, for what purpose and for how long.
For research data to be reusable, data quality is crucial above all. Data quality in research data management includes the following areas in particular:
- Data format (special storage formats of scientific data, such as vector format, raster format, and property format).
- Data completeness and data correctness
Leibniz Data Manager is a free prototype that serves as an example of similar tools:
Leibniz Data Manager allows visualization of different research data formats, enabling 'screening' of datasets for their potential usefulness. As a visualization and management tool, it supports the management of and access to heterogeneous research data publications, and thus helps researchers select relevant datasets for their respective disciplines.
Currently, a prototype of the Leibniz Data Manager is available and offers numerous functions for the visualization of research data.
To adequately document the (subsequent) use of one's own and external research data in line with good scientific practice, correct data citation is essential.
In the case of third-party data, this also acknowledges the scientific achievement of its 'originator'. As with the citation of other publications, the conventions for citing data may differ formally. In terms of content, however, they share the requirement that the data source be unambiguously identifiable. The FORCE11 Data Citation Synthesis Group has developed recommendations for data citation. According to them, a complete data citation includes:
Author(s), year, title of research data, data repository or archive, version, worldwide Persistent Identifier.
Other optional details that may be useful as part of a citation include Edition, Feature name and URI, Resource type, Publisher, Unique numeric fingerprint (UNF), and Location (see Alex Ball & Monica Duke (2015). How to Cite Datasets and Link to Publications).
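The required elements listed above can be assembled into a citation string programmatically. The following is a minimal sketch: the class name, the punctuation and the ordering of the output are assumptions chosen for illustration, not a format prescribed by FORCE11, and the example record (authors, title, archive, DOI) is an invented placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Holds the required elements of a data citation."""
    authors: List[str]
    year: int
    title: str
    repository: str   # data repository or archive
    version: str
    identifier: str   # persistent identifier, e.g. a DOI

    def format(self) -> str:
        # The separator and punctuation style here are assumptions;
        # journals and repositories may require a different layout.
        author_part = "; ".join(self.authors)
        return (
            f"{author_part} ({self.year}): {self.title}. "
            f"{self.repository}. Version {self.version}. {self.identifier}"
        )


# Placeholder record, not a real dataset.
citation = DataCitation(
    authors=["Doe, Jane", "Roe, Richard"],
    year=2021,
    title="Example Survey Dataset",
    repository="Example Data Archive",
    version="1.0",
    identifier="https://doi.org/10.1234/example",
)
print(citation.format())
```

Keeping the elements in separate fields rather than in one free-text string makes it easier to check that none of them is missing and to render the same record in different citation styles.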