Products and Services
The RDM team develops and operates tools and software services for all parts of the data management lifecycle at the UFZ. An ongoing activity is the modernization and modularization of the infrastructure for the acquisition, processing, and use of research data. The objectives include:
- Support of FAIR principles
- Advancement of digitalisation and data science methods
- Use of modern technologies and their flexible further development
- Integration of established applications
- Automation of data collection, processing and visualisation processes
- Assignment of Digital Object Identifiers (DOI) to research data deposited in the archiving component of the Data Management Portal (available internally only)
In addition to the core products and specific services listed below, we also offer general training and support. For basic RDM knowledge and our RDM trainings, please see our RDM Guidelines.
Cross-domain products
These products are domain-agnostic and can be used by scientists and technicians across domains:
The DataHub of the Research Field Earth and Environment is a joint initiative of all centers of the Helmholtz Association participating in the research program 'Changing Earth'. Three SubHubs (ATMO, MARE and TERRA) assigned to the compartments of the Earth system jointly form the DataHub.
The Data Management Portal (DMP, available internally only) offers UFZ employees the possibility to manage research data, to ensure its quality, and to describe it comprehensively with metadata. Operation and functionality are oriented towards the needs of data collectors. Data projects are used for structuring: they combine data sets of different data types and allow the assignment of access and editing rights. The data collected in the Data Management Portal can be searched in the Data Investigation Portal (DRP) and presented publicly or made available as required.
The Data Management Portal supports the administration of the following types of data:
- Time series data
- Sample and analysis data
- Field management data
- File-based archive data
User manuals and further information can be found in the RDM guidelines.
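The grouping of data sets into projects with access and editing rights can be illustrated with a minimal sketch. This is a hypothetical model for illustration only, not the DMP's actual data structures; all class and field names are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical sketch (not the DMP's actual data model): a data project
# groups data sets of different types and enforces per-user editing rights.
@dataclass
class Dataset:
    name: str
    data_type: str  # e.g. "time series", "sample", "field management", "archive"

@dataclass
class DataProject:
    title: str
    datasets: list = field(default_factory=list)
    editors: set = field(default_factory=set)

    def add_dataset(self, ds: Dataset, user: str) -> None:
        # Only users with editing rights may add data sets to the project.
        if user not in self.editors:
            raise PermissionError(f"{user} may not edit {self.title}")
        self.datasets.append(ds)

project = DataProject(title="Soil survey 2024", editors={"alice"})
project.add_dataset(Dataset("plot_1_moisture", "time series"), user="alice")
```

The point of the sketch is simply that a project is the unit at which both heterogeneous data types and rights management come together.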
The Data Investigation Portal (DRP) provides public access to the data administered in the Data Management Portal and allows searching them. The presentation here is limited to metadata and non-restricted information. DRP users can thus gain an overview of the data sets and, if necessary, contact the author to request access to the data.
Quality control of numerical data is a profoundly knowledge- and experience-based activity. Finding a robust setup is typically a time-consuming and dynamic endeavor, even for an experienced data expert.
Our System for automated Quality Control (SaQC) addresses the iterative and explorative character of quality control with its extensive setup and configuration possibilities and a Python-based extension language. Beneath its user interfaces, SaQC is highly customizable and extensible.
SaQC is an open-source RDM project, publicly available on our GitLab page for SaQC, where the documentation can also be found.
Cite SaQC using: https://doi.org/10.5281/zenodo.5888547 or https://doi.org/10.1016/j.envsoft.2023.105809
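To make the idea of configurable, deterministic quality tests concrete, here is a minimal plain-Python sketch of a range check of the kind SaQC lets users set up and chain. The function name and flag labels are hypothetical and do not reflect SaQC's actual API.

```python
import math

def flag_range(values, lo, hi):
    """Flag values that are missing (None/NaN) or outside [lo, hi] --
    the kind of deterministic test a QC system lets users configure
    and chain (hypothetical, not SaQC's real interface)."""
    flags = []
    for v in values:
        bad = (
            v is None
            or (isinstance(v, float) and math.isnan(v))
            or v < lo
            or v > hi
        )
        flags.append("BAD" if bad else "OK")
    return flags

# Soil-temperature-like series with one implausible spike and one gap:
series = [12.1, 11.8, 95.0, float("nan"), 12.3]
print(flag_range(series, lo=-20.0, hi=60.0))
# → ['OK', 'OK', 'BAD', 'BAD', 'OK']
```

In practice such single tests are only the starting point; the iterative work lies in combining and tuning many of them per data stream.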
Domain specific data management workflows and infrastructures
These products and services are intended to complement the core RDM infrastructure and enable the integration of data from different areas of environmental research.
The field of earth system sciences relies heavily on the collection, analysis, and interpretation of data to understand complex spatiotemporal environmental processes and predict future trends. Time series data, which capture measurements or observations at regular intervals over time, play a crucial role in elucidating patterns, detecting changes, and informing decision-making in various environmental domains. time.IO, our time series management system, is a data infrastructure that enables efficient data collection through automated sensing technologies, standardized data exchange protocols, and quality control procedures. Moreover, it facilitates the integration of data from diverse sources into distributed data infrastructures on national and/or continental scales and advances the dissemination of data to a wide range of stakeholders.
Cite time.IO by using: https://doi.org/10.5281/zenodo.8354840
SaQC is described above.
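The step from raw logger output to timestamped observation records can be sketched in a few lines. The payload format, column names, and sensor identifier below are hypothetical illustrations, not time.IO's actual exchange protocol.

```python
import csv
import io
from datetime import datetime

# Hypothetical logger payload; time.IO's real exchange formats may differ.
raw = """timestamp,sensor_id,value
2024-05-01T00:00:00Z,st_01,12.4
2024-05-01T00:10:00Z,st_01,12.6
"""

def parse_payload(text):
    """Turn a CSV logger payload into timestamped observation records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            # Normalize the trailing 'Z' so fromisoformat accepts it:
            "time": datetime.fromisoformat(row["timestamp"].replace("Z", "+00:00")),
            "sensor": row["sensor_id"],
            "value": float(row["value"]),
        })
    return records

obs = parse_payload(raw)
print(len(obs), obs[0]["value"])  # → 2 12.4
```

A real ingestion pipeline adds authentication, schema validation, and quality flags on top of this parsing step.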
BioMe focuses on the development of a new modular web platform that supports the operation of citizen science projects in the field of biodiversity.
There are already many projects of this kind that BioMe aims to technically modernize and make more attractive to the participating community, for example:
- TMD - Butterfly Monitoring Germany
- BiolFlor - Database of biological-ecological Characteristics of the Flora of Germany
- LEGATO - Rice Ecosystem Services
- ALARM - Assessing LArge scale Risks for biodiversity with tested Methods
BioMe is a contribution to NFDI4Biodiversity.
Cite BioMe using: https://doi.org/10.5281/zenodo.11190783
In this project, we aim to improve the existing geodata infrastructure at the UFZ. In addition to the already established services, which mainly use ESRI products, we want to refine the capabilities and promote FAIR management of geodata with an open-source software stack, e.g.:
- GeoNetwork,
- Thredds,
- GeoNode,
- GeoServer,
- MinIO,
- WIGO.
This includes the provision of a state-of-the-art storage infrastructure, tools to support metadata management and cataloguing, and automated workflows for data processing, visualization, and publishing.
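Because GeoServer publishes data through standard OGC web services, a map rendering is just a parameterized HTTP request. The following sketch builds a standard WMS 1.3.0 GetMap URL; the endpoint and layer name are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and layer -- GeoServer exposes maps via the
# standard OGC WMS interface, so a GetMap request is just a URL.
base = "https://geoserver.example.org/geoserver/wms"
params = {
    "service": "WMS",
    "version": "1.3.0",
    "request": "GetMap",
    "layers": "ufz:soil_moisture",   # hypothetical layer name
    "crs": "EPSG:4326",
    "bbox": "47.0,5.0,55.0,15.0",    # lat/lon axis order for EPSG:4326 in WMS 1.3.0
    "width": "800",
    "height": "600",
    "format": "image/png",
}
url = f"{base}?{urlencode(params)}"
print(url)
```

Any WMS-capable client (QGIS, a web map, or a plain browser) can consume such a URL, which is what makes the open-source stack interoperable.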
Cite spatial.IO using: https://doi.org/10.5281/zenodo.12663321
Chromeleon is a chromatography data system software that combines data collection and processing for measuring devices from different manufacturers. We provide Chromeleon as a service, aiming to simplify the backup and archiving of measurement data as well as update and upgrade procedures.
An electronic lab notebook (also known as electronic laboratory notebook, or ELN) is a software to replace paper laboratory notebooks to document research, experiments, and procedures performed in a laboratory.
- eLabFTW Instance at UFZ (available internally only)
Grafana is a software solution for visualizing time series data with dashboards. It offers many different chart types and can also be used to provide public access to selected data. At the UFZ, we provide Grafana as a service that can display data from the logger component of the Data Management Portal and can also be used in combination with PostgreSQL databases.
With INTOB-DB, we designed a database for recording toxicological effects of various chemicals on organisms. A web application supports the planning and execution of experiments. The centrally stored, consistently structured data can be reused by analysis tools and for machine learning and pattern recognition.
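The value of a central, consistently structured store lies in relating organisms, chemicals, and observed effects. The following is a minimal relational sketch using SQLite; the tables and columns are hypothetical and do not reflect INTOB-DB's actual schema.

```python
import sqlite3

# Minimal relational sketch of toxicological effect records --
# hypothetical tables, not INTOB-DB's actual schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (id INTEGER PRIMARY KEY, species TEXT NOT NULL);
CREATE TABLE chemical (id INTEGER PRIMARY KEY, name TEXT NOT NULL, cas_number TEXT);
CREATE TABLE effect (
    id INTEGER PRIMARY KEY,
    organism_id INTEGER REFERENCES organism(id),
    chemical_id INTEGER REFERENCES chemical(id),
    concentration_mg_l REAL,   -- exposure concentration
    mortality_pct REAL         -- observed effect
);
""")
conn.execute("INSERT INTO organism (id, species) VALUES (1, 'Daphnia magna')")
conn.execute("INSERT INTO chemical (id, name, cas_number) VALUES (1, 'Diuron', '330-54-1')")
conn.execute(
    "INSERT INTO effect (organism_id, chemical_id, concentration_mg_l, mortality_pct) "
    "VALUES (1, 1, 0.5, 40.0)"
)
# Joining the tables yields analysis-ready records:
row = conn.execute("""
    SELECT o.species, c.name, e.mortality_pct
    FROM effect e
    JOIN organism o ON o.id = e.organism_id
    JOIN chemical c ON c.id = e.chemical_id
""").fetchone()
print(row)  # → ('Daphnia magna', 'Diuron', 40.0)
```

Once effects are stored in this normalized form, downstream tools can query dose-response records uniformly instead of parsing per-experiment files.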
KNIME Analytics Platform is an open-source software for creating data science workflows. KNIME Server is the enterprise software for team-based collaboration, automation, management, and deployment of data science workflows as analytical applications and services.
- KNIME server at UFZ (available internally only)
OMERO (Open Microscopy Environment Remote Objects) is an open-source client/server system written in Java for visualizing, managing, and annotating microscope images and metadata. Together with the departments phyDiv, BIOTOX, and ZELLTOX as well as the iCyt platform, we established workflows for data acquisition by high-content, high-throughput microscopes, storage of image data and metadata in the repository, and usage of the (meta)data in data-analytic workflows based on KNIME and other machine learning tool chains.
The Sensor Management System (SMS) is developed collaboratively by the UFZ and the German Research Centre for Geosciences (GFZ). The system includes a web client and a RESTful API service for managing sensor metadata, as well as a web application and API for managing the controlled vocabularies of the sensor metadata management system, based on a modified ODM2 schema.
Cite SMS using: https://doi.org/10.5281/zenodo.13329926
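A RESTful metadata service of this kind typically exchanges JSON documents describing devices and their measured quantities. The sketch below only illustrates that pattern; the payload shape, field names, and vocabulary entries are assumptions, not the actual SMS API schema.

```python
import json

# Hypothetical payload shape -- the real SMS API (based on a modified
# ODM2 schema) defines its own fields, vocabularies, and endpoints.
device = {
    "short_name": "soil-moisture-probe-01",
    "manufacturer": "ExampleSensors",  # hypothetical vocabulary entry
    "properties": [
        {"name": "soil_moisture", "unit": "percent"},
        {"name": "soil_temperature", "unit": "degC"},
    ],
}
payload = json.dumps({"data": {"type": "device", "attributes": device}})
print(json.loads(payload)["data"]["attributes"]["short_name"])
# → soil-moisture-probe-01
```

Keeping device descriptions in a controlled, machine-readable form is what lets downstream systems such as time series infrastructures resolve which sensor produced which stream.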
Software and workflows for data pipelines
These tools and services focus on supporting quality control and assurance for data that will be integrated into the broader RDM infrastructure at UFZ and beyond.
For the main field observatories, e.g. TERENO and MOSES, and other observation and monitoring facilities of the UFZ, the RDM team has built quality control pipelines that address the individual requirements and data flows of each project. Examples include pipelines for:
- Global Change Experimental Facility (GCEF)
- Talsperren Observatorium Rappbode (TOR) (German only)
- Data streams for observatories of the Department of Computational Hydrosystems
- Cosmic-Ray Neutron Sensing (CRNS)
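Structurally, such project-specific pipelines chain a sequence of processing steps over each data stream. The following is a generic sketch of that composition pattern; the individual checks are hypothetical, not the actual GCEF, TOR, or CRNS pipeline code.

```python
# Generic sketch of chaining per-project QC steps -- the concrete
# checks and thresholds here are hypothetical examples.
def drop_physical_outliers(values, lo=0.0, hi=100.0):
    """Remove values outside a physically plausible range."""
    return [v for v in values if lo <= v <= hi]

def smooth(values, window=3):
    """Centered moving average, shrinking at the series edges."""
    out = []
    for i in range(len(values)):
        start = max(0, i - window // 2)
        stop = min(len(values), i + window // 2 + 1)
        chunk = values[start:stop]
        out.append(sum(chunk) / len(chunk))
    return out

def run_pipeline(values, steps):
    """Apply each processing step to the output of the previous one."""
    for step in steps:
        values = step(values)
    return values

raw = [10.0, 11.0, 250.0, 12.0, 13.0]  # 250.0: sensor spike
clean = run_pipeline(raw, [drop_physical_outliers, smooth])
print(clean)
```

The per-project work lies in choosing and parameterizing the steps for each observatory's data flow, while the chaining skeleton stays the same.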
In parallel to the development of SaQC, which enables quality control by facilitating the implementation of deterministic tests, the RDM team also explores the usability of machine-learning (ML) algorithms to perform automatic quality control (QC) of data. Soil moisture data from several UFZ observatories are utilized for this use case.