Today, research data is widely available in digital form and datasets are easily accessible online, which dataset creators should consider an advantage because it leads to greater uptake. The downside is that digital datasets can be easily copied, duplicated in multiple places and re-published through more than one repository or service. Mirroring resources is common practice with web services, especially in online GIS packages and dashboards. ‘Copy WMS link’ buttons are widespread, but they often provide access only to the service endpoint: information about the owner, licence, accreditation and citation is not carried with the data. Increasingly, poor practices in re-publication and duplication mean that identical versions of data and metadata are available in multiple places, and some replicas are assigned new DOIs without cross-referencing the original DOI.
Funders increasingly ask for information about the usage and impact of the datasets and data acquisition campaigns they have funded. Journal publishers now require publications to give appropriate credit to whoever collected, curated and/or preserved the data. Best practices are currently poorly defined: researchers are raising ethical concerns and asking whether data licensing needs to be rethought.
There is a clear need for community-agreed documentation of best practices for identifying data aggregations, re-publishing data and mirroring data to multiple sites. Ethical scientific research urgently requires the ability to identify the authoritative, or canonical, version of a dataset and to ensure correct attribution and citation of every data source.
Lesley Wyborn (1), Kirsten Elger (2), Jens Klump (3), Kerstin Lehnert (4), Mingfang Wu (5)
(1) Australian National University, Australia; (2) GeoForschungsZentrum, Germany; (3) CSIRO, Australia; (4) Columbia University, USA; (5) Australian Research Data Commons, Australia