The GEOROC database (Geochemistry of Rocks of the Oceans and Continents) was established in 1999 at the Max Planck Institute in Mainz (Germany) under the leadership of A. W. Hofmann and B. Sarbas. For 25 years, GEOROC has served the geochemical community, and it now holds more than 685,000 sample records of whole-rock, glass and mineral analyses. Since 2021, the database has been curated and further developed as part of the DIGIS project from its new home at the University of Göttingen, and it now also offers a data repository service.
In the era of AI, chatbots and scraping algorithms, geochemical data can be scraped from the internet into uncurated piles and fed to generalized data analysis models. In this context, the greatest strength of our well-curated, expert-reviewed data products is that they give users the critical metadata needed to scrutinize analytical data and their quality when tracing geochemical transport processes on global to micron scales. In this talk, we dive deep into the lessons we have learned and the tools we are developing, using GEOROC as a case study: Where do errors occur during compositional data publication, and how can we identify their type and source? Quality control of long-tail analytical data is of general concern, and our approaches may be applicable to other fields of Earth System Science as well.
Keywords: database management, quality assessment, geochemistry, metadata, machine learning, AI.