Binary Code and Earth. Source: Gerd Altman,

Digital Science, Research Data Management and Infrastructures

Helmholtz Information & Data Science - Platforms for the Digitalisation in Research

Contact: Prof. Sabine Attinger
Link: Helmholtz Platforms

The Helmholtz Association is creating five innovative platforms to digitalize research. To this end, Germany’s largest research organization will invest a total of 35 million euros per year. "One of the greatest challenges of our time is the digital transformation of science, business, and society," says Otmar D. Wiestler, President of the Helmholtz Association. "It opens up unimagined options in almost all areas of life, including innovative forms of work and coexistence, novel platforms for trade and exchange, as well as unique opportunities in all scientific disciplines."

Ambitious and courageous steps are necessary for Germany to be able to shape and be a part of this development. "The Helmholtz Association has a high level of expertise in the field of information technologies and information processing, and generates enormous amounts of big data in all research fields. It has therefore decided to resolutely press ahead with the digitalization of research," continued Wiestler. "Germany's largest research organization has established four new platforms that together form the new Helmholtz Information & Data Science Framework. The Helmholtz Association will use these platforms to strengthen its six research fields and make an important contribution to keeping Germany at the forefront of world research and innovation."
Each platform is located at one or more Helmholtz Centers and creates an active network that also includes other researchers. Specific funding lines will be established for this purpose. Helmholtz will build the platforms over the next few months and open a wide range of opportunities to interact with other research organizations and universities. The largest German research organization will recruit leading scientists from around the world and initiate future-oriented projects.

The five new platforms are:

  1. Helmholtz Information & Data Science Academy (HIDA): Training young scientists at newly designed graduate schools within a national consortium
    UFZ contact person: Lennart Schmidt
  2. Helmholtz Artificial Intelligence Cooperation Unit (HAICU/Helmholtz.AI): Artificial intelligence and machine learning
    UFZ contact person: Lennart Schmidt
  3. Helmholtz Imaging Platform (HIP): Imaging procedures & analysis methods
    UFZ contact person: Dr. Hendrik Paasche
  4. Helmholtz Infrastructure for Federated ICT Services (HIFIS): Basic technologies and services for large-scale data-driven research
    UFZ contact person: Norman Ziegner
  5. Helmholtz Metadata Collaboration Center (HMC): Metadata-related research and networking platform
    UFZ contact person: Helen Kollai

The establishment of the platforms is a forward-looking intermediate result of a long-term, bottom-up process taking place throughout the Association (see "Helmholtz Data Science Incubator"). The UFZ played a major part in shaping the program and the establishment of the Helmholtz Metadata Collaboration Platform. Furthermore, it is part of the local HAICU unit within the Helmholtz Research Field "Earth and Environment" and actively shapes the HIFIS goals.

Note! All platforms offer flexible funding opportunities for your digital science projects. If you are interested in news about these opportunities, please check the platforms' webpages or contact the UFZ contact persons listed above.

Helmholtz Information & Data Science Incubator Projects

Sparse2Big - Imputation and fusion for large, sparse data

Contact: Dr. Jörg Hackermüller

The pilot project "Sparse2Big" lays the methodological and technical foundations for handling big data. The aim is to create genuinely usable "Big Data" from sparsely observed, large data sets through imputation (completion) and robust modelling of the observation processes.

The project initially concentrates on data sets from single cell genomics, which uses modern genome sequencing techniques to analyse individual cells. This gives researchers a "molecular microscope" with a wide range of applications, for example in developmental biology, cancer diagnostics or stem cell therapy. Sparse2Big's innovative techniques are intended to significantly improve observations in single cell genomics and thus in biomedical research. Building on this, a transfer of these methods to other research areas is already being prepared.

Pilot Lab Exascale Earth System Modelling (PL-EESM)

Contact: Prof. Olaf Kolditz
Link: (in progress)

The Pilot Lab Exascale Earth System Modelling (PL-EESM) explores specific concepts to enable exascale readiness of Earth System models and associated workflows in Earth System science. The work is organized in five collaborative work packages, leveraging co-design between domain and computer scientists to address the computational and data challenges posed by future supercomputers.

PL-EESM provides a new platform for scientists of the Helmholtz Association to develop scientific and technological concepts for future generation Earth System models and data analysis systems.

Even though extreme events can lead to disruptive changes in society and the environment, current generation models have limited skill, particularly with respect to the simulation of these events. Reliable quantification of extreme events requires models with unprecedentedly high resolution and timely analysis of huge volumes of observational and simulation data, which drastically increases the demand on computing power as well as on data storage and analysis capacities.

At the same time, the unprecedented complexity and heterogeneity of exascale systems will require new software paradigms for next generation Earth System models as well as fundamentally new concepts for the integration of models and data. Specifically, PL-EESM will develop novel solutions for the parallelisation and scheduling of model components, the handling and staging of huge data volumes, and a seamless integration of information management strategies throughout the entire process-value chain from global Earth System simulations to local scale impact models. The potential of machine learning to optimize these tasks will be investigated.

At the end of the project, several program libraries and workflows will be available, which provide the basis for the development of next generation Earth System models. PL-EESM will therefore act as an incubator for the Joint Lab EESM in Helmholtz's new research program. It will enhance collaboration among research fields and centres of the Helmholtz Association, and it will contribute to positioning the Helmholtz Association as a major player in European flagship activities such as Extreme Earth and other relevant opportunities.

Uncertainty Quantification – From Data to Reliable Knowledge

Contact: Dr. Hendrik Paasche
Link: (in progress)

How will the climate develop, how secure is our energy supply, and what chances does molecular medicine offer?

The rapidly increasing amount of data offers radically new opportunities to address today's most pressing questions from society, science, and the economy, but it also requires novel mathematical and statistical methods to handle it. However, such data and methods are subject to uncertainty, which is often regarded as an unavoidable burden in real-world applications.

By employing probabilistic data science techniques, uncertainty can be turned into a valuable source of information and a powerful enrichment of black-box approaches from artificial intelligence. To harness this source of information, in this project we identify common challenges between several Helmholtz use cases and foster translational research at the interface of disciplinary and mathematical research.

Our goal is to enable more reliable knowledge sourcing from data by developing tools and methods within the field of Uncertainty Quantification (UQ) based on the applications.

Current Data Science Projects

AI Hub Sachsen

Contact: Dr. Jan Bumberger

Artificial intelligence is currently finding its way more and more from research to everyday life. Together with participants from science, industry and public administration, the Institute for Applied Computer Science (InfAI) has launched the initiative "KI-Hub - wir bringen KI in die Anwendung" ("AI Hub - we bring AI into application"), which enables Artificial Intelligence to be more strongly integrated into the respective departments.

In the figurative sense, "hub" means "center" or "focal point". Andreas Heinecke, Managing Director of InfAI, defines artificial intelligence as "an imitation of human intelligence". He further explains: "Humans are still a long way from being imitated 100%. Today, however, AI technologies are capable of analyzing large amounts of data, making decisions and supporting people in individual activities, such as analyzing data protection declarations, controlling gas networks or maintaining machines."

The AI Hub Saxony aims to simplify processes and make them more efficient through artificial intelligence technologies, especially for companies, research institutions and public administration. The Hub bundles the competencies of supporters from science and industry and advances the development of artificial intelligence. The Hub is therefore also of special significance for Saxony as a research and business location.

The initiative is supported by: University of Leipzig, HTWK Hochschule für Technik, Wirtschaft und Kultur Leipzig, HHL Leipzig Graduate School of Management, HTW Dresden, Staatsbetrieb sächsische Informatik Dienste, Fraunhofer-Zentrum für Internationales Management und Wissensökonomie, Helmholtz Centre for Environmental Research UFZ, Kompetenzzentrum Mittelstand 4.0 Chemnitz, AOK PLUS, ACOD GmbH, Mitteldeutsche Flughafen AG, IT Sonix GmbH, Avantgarde Labs, SpinLab Accelerator GmbH, Smart Infrastructure Hub Leipzig, Salt Solutions AG, Institute for Applied Computer Science (InfAI) e.V.

Competence Center for Scalable Data Services and Solutions Dresden/Leipzig - ScaDS

Contact: Prof. Olaf Kolditz

ScaDS Dresden/Leipzig (Competence Center for Scalable Data Services and Solutions) is one of two German Big Data Competence Centers that the Federal Ministry of Education and Research (BMBF) has been funding since October 2014.

ScaDS Dresden/Leipzig conducts cooperative research on big data technologies and their interdisciplinary use in a wide range of applications in science and industry. After a successful four-year first phase, ScaDS Dresden/Leipzig was extended in October 2018 for a second phase of three years with the goal of further expansion and long-term continuation.

The research is carried out at two locations, Dresden and Leipzig, by the partners Dresden University of Technology, Leipzig University, the Max Planck Institute for Molecular Cell Biology and Genetics, the Leibniz Institute for Ecological Spatial Planning, the Helmholtz Center for Environmental Research, Leipzig, and the Helmholtz Center Dresden Rossendorf.

Within ScaDS, the UFZ currently works on the following use cases: analysis of mass spectrometry data, parallelisation of scientific software codes, and in-situ visualisation.

Current Research Data Infrastructure Projects

National Research Data Infrastructure Initiatives (NFDI)

Contact: Dr. Jan Bumberger (NFDI4Earth), Dr. Mark Frenzel (NFDI4BioDiv), Dr. Tobias Schulze (NFDI4Chem)

The National Research Data Infrastructure (NFDI) is an initiative launched by the Council for Information Infrastructures (RfII), initiated by the Joint Science Conference (GWK), and funded by the Federal Government and the Länder. It is to provide the German science system with a "nationwide, distributed and growing network" of services and advice for research data management.

The corresponding federal-state agreement was concluded in November 2018. The agreed funding volume amounts to up to 90 million euros per year in the period 2019-2028.

The NFDI is intended to systematically open up, sustainably secure and make accessible the data holdings of science and research and to network them nationally and internationally. It will be set up in a science-driven process as a networked structure of consortia acting on their own initiative. The objectives of funding the consortia are: (1) establishing rules for the standardised handling of data in close consultation with the respective research community, (2) developing cross-disciplinary metadata standards, (3) developing reliable and interoperable data management measures and services tailored to the needs of the research community, (4) increasing the reusability of existing data, also beyond disciplinary boundaries, (5) connecting and networking with partners in the science systems of other countries who have expertise in research data management, and (6) collaborating in the development and establishment of generic, cross-consortium services and standards for research data management.

The UFZ is engaged in the consortia for Earth System Sciences, Biodiversity and Chemistry.
The binding declarations of intent for the 2019 proposal round and the non-binding declarations of intent for 2020 and 2021 are listed on a separate page at the DFG: NFDI Consortia Declarations

The final decision on the funding of consortia will be made at the beginning of 2020.

Helmholtz Research Field Earth and Environment - Hub Terra

Contact: Dr. Jan Bumberger, Thomas Schnicke
Link: in progress

In 2019, Helmholtz set a new focus on strengthening the development of interoperable research data infrastructures in the Helmholtz Research Field Earth and Environment. To this end, the research field established a data hub initiative, which is divided into three sub-hubs for the marine, atmospheric, and terrestrial data infrastructures. The hubs support and shape the data management strategies of the current and next Helmholtz research program. The data hub for terrestrial data is coordinated by the UFZ.

The historically separated data repositories of marine, terrestrial and atmospheric research, including their cross-sectional fields in climate and biodiversity research, will be merged into an open, networked information infrastructure. This is an essential step towards integrated Earth system knowledge for science and society. As part of the use of Pact funds, measures will be taken over the next two years to harmonise metadata and their collection, data curation, data flows and data management methods in the research area. The basis for this is the identification and subsequent expansion of existing, already functioning data infrastructures for the entire research area.

The focus will be on the following areas: (1) citable data publication (DOI), (2) digitized sample management (e.g. International Geo Sample Number IGSN), (3) interoperable sensor metadata for uniform management of sensors and sensor data, (4) making metadata available for data in interoperable form. The long-term vision aims at the cooperation of the centres in the field of research, distributed over three hubs. These hubs jointly establish a nucleus for research data management, which not only offers jointly structured and harmonized interfaces for communication with national and international initiatives (NFDI, EOSC), but also enables thematic sections and portal views.

Findable, Accessible, Interoperable, and Reusable – FAIR Data Management Projects

FAIRsFAIR

Contact: Dr. Robert Günther

FAIRsFAIR - Fostering FAIR Data Practices in Europe - aims to supply practical solutions for the use of the FAIR data principles throughout the research data life cycle. Emphasis is on fostering a FAIR data culture and the uptake of good practices in making data FAIR.

FAIRsFAIR will play a key role in the development of global standards for FAIR certification of repositories and the data within them, contributing to those policies and practices that will turn the EOSC programme into a functioning infrastructure. In the end, FAIRsFAIR will provide a platform for using and implementing the FAIR principles in the day-to-day work of European research data providers and repositories. FAIRsFAIR will also deliver essential FAIR dimensions of the Rules of Participation (RoP) and regulatory compliance for participation in the EOSC.

The EOSC governance structure will use these FAIR-aligned RoPs to establish whether components of the infrastructure function in a FAIR manner. The UFZ is involved in the project to develop interoperable layers between its data repository and repositories of other Helmholtz Centers (e.g. GFZ).

GO FAIR

Contact: Dr. Maren Göhler, Thomas Schnicke

GO FAIR is a bottom-up, stakeholder-driven and self-governed initiative that aims to implement the FAIR data principles, making data Findable, Accessible, Interoperable and Reusable. It offers an open and inclusive ecosystem for individuals, institutions and organisations working together through Implementation Networks (INs). The INs are active in three activity pillars: GO CHANGE, GO TRAIN and GO BUILD. The UFZ has been involved in the IN GO BUILD since 2017 and incorporates the acquired expertise into the development of its data management services.

Linked Open Data

Contact: Dr. Jörg Hackermüller, Thomas Schnicke
Link: in progress

Various scientific questions at the UFZ require the integration of complex data as well as heterogeneously distributed data sources. Typically, this integration problem is solved by transferring entire data sets and setting up a local database that integrates these data sets.

To answer a research question, often only a fraction of the integrated and transferred data is needed. With the increasing growth of many data sets, this approach becomes inefficient and for practical reasons, the number of integrated data sources is severely limited. Linked Open Data (LOD) extends the concept of the World Wide Web from linked human-readable web pages to linked machine-readable data sets.

By encoding the data in RDF (Resource Description Framework), using ontologies and the SPARQL query language based on them, data integration tasks can be performed by a distributed query of multiple linked data sources.
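
The idea of answering a question by querying several linked sources, instead of first copying them into one local database, can be illustrated with a minimal sketch. It uses plain Python tuples in place of real RDF and a simple pattern matcher in place of SPARQL. All URIs, predicates, values and function names are hypothetical and chosen only for illustration; they do not describe UFZ data or the project's actual implementation.

```python
from typing import Iterable, Iterator, Optional

# An RDF-like statement: (subject, predicate, object).
Triple = tuple[str, str, str]

# Two independently maintained sources. All identifiers and values are made up.
SOURCE_A: list[Triple] = [
    ("ex:site1", "ex:locatedIn", "ex:Leipzig"),
    ("ex:site2", "ex:locatedIn", "ex:Halle"),
]
SOURCE_B: list[Triple] = [
    ("ex:site1", "ex:meanTempC", "9.8"),
    ("ex:site2", "ex:meanTempC", "9.5"),
    ("ex:site3", "ex:meanTempC", "8.1"),
]


def match(
    source: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield the triples of one source that fit the pattern; None is a wildcard."""
    for triple in source:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def temperatures_in(city: str) -> dict[str, float]:
    """Join both sources on the shared subject identifier.

    Only the triples that match the question are read from each source,
    rather than transferring and merging the complete data sets.
    """
    result: dict[str, float] = {}
    for site, _, _ in match(SOURCE_A, p="ex:locatedIn", o=city):
        for _, _, value in match(SOURCE_B, s=site, p="ex:meanTempC"):
            result[site] = float(value)
    return result


print(temperatures_in("ex:Leipzig"))
```

In a real LOD setting, each source would be a remote endpoint and the join would be expressed in SPARQL; the sketch only shows why shared identifiers make such a distributed join possible.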

The aims of the pilot project are: (1) to estimate the effort required for a broader use and application of LOD/Semantic Web approaches at the UFZ, (2) to evaluate the prospects for integrating UFZ data with public data via LOD, (3) to be able to refer to experience in LOD in future calls for proposals, for example from the Data Science Incubator, and (4) to explore the possibility of integrating data from different subject areas via LOD.

Current Research Data Management Projects

Research Data Management in Saxony - SaxFDM

Contact: Dr. Robert Günther

Data are an essential basis of research. The handling of digital research data - in its various forms, with the increasing speed of its generation as well as the complexity of management and analysis - places high demands on scientists.

Support and advice, as well as technical infrastructures and services, are urgently needed so that researchers can concentrate on their subject-specific questions. SaxFDM is an initiative of Saxon universities and research institutions for the networking, cooperation and coordination of activities related to research data management.

With the initiative for a National Research Data Infrastructure (NFDI), the Federal Government promotes the exchange and re-use of research data as well as the establishment of a state-of-the-art infrastructure for the management, processing and analysis of research data. SaxFDM will promote and coordinate Saxon activities within the NFDI.

Helmholtz Open Science Working Group

Contact: Thomas Schnicke

The term open science denotes a cultural shift in the way scholars work and communicate. Computer-supported work and digital communication enable a more effective and more open exchange of information within academia and foster the transfer of results into society.

The development of open science in the research areas of the Helmholtz Association is at different stages, depending on the discipline and publication culture. The Open Science Working Group, in close cooperation with the Helmholtz Open Science Coordination Office, aims to support researchers in developing lines of orientation. The Helmholtz Association is called upon to help shape the ongoing cultural shift from closed to open in concert with academia.

The Open Science Working Group makes the following fields of action the focus of its work: (1) open access - access to and reuse of textual publications, (2) open research data - access to and reuse of research data, (3) open research software - access to and reuse of research software, and (4) national and international networking on open science.