Software - Helmholtz Centre for Environmental Research (UFZ)

uap – Robust, consistent, and reproducible data analysis

Authors

Christoph Kämpf, Michael Specht, Sven-Holger Puppel, Alexander Scholz, Gero Doose, Kristin Reiche, Sebastian Canzler, Jana Schor, Jörg Hackermüller

Description

uap is a workflow management tool for controlling the analysis of large datasets and keeping it repeatable. It enables robust, consistent and reproducible data analysis. uap wraps (bioinformatics) programs and manages the data flow and its processing throughout an analysis. Users can combine predefined or self-written steps into a custom analysis. uap is mainly used for high-throughput sequencing data, but thanks to its plug-in architecture it can readily be used in other projects as well.

uap supports the cluster job schedulers SLURM and UGE (Univa Grid Engine) for running analyses on HPC clusters.
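The core idea behind uap is that an analysis consists of steps whose outputs feed later steps. The following minimal Python sketch illustrates that data-flow model under our own assumptions. The step names, file names and the `run_pipeline` function are hypothetical; they are not uap's configuration format or API. Steps run in dependency order using the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a step maps the outputs of its dependencies to its own output.
Step = Callable[[Dict[str, object]], object]
Steps = Dict[str, Tuple[List[str], Step]]


def run_pipeline(steps: Steps) -> Dict[str, object]:
    """Run every step after its dependencies and return all step outputs."""
    graph = {name: set(deps) for name, (deps, _) in steps.items()}
    outputs: Dict[str, object] = {}
    # static_order() yields a dependency-respecting order or raises CycleError.
    for name in TopologicalSorter(graph).static_order():
        deps, func = steps[name]
        # Each step only sees the outputs it declared as inputs.
        outputs[name] = func({d: outputs[d] for d in deps})
    return outputs


# Illustrative three-step chain; file names are placeholders, nothing is executed.
steps: Steps = {
    "raw_reads": ([], lambda _: ["sample_1.fastq", "sample_2.fastq"]),
    "trimmed": (["raw_reads"],
                lambda inp: [f.replace(".fastq", ".trimmed.fastq") for f in inp["raw_reads"]]),
    "aligned": (["trimmed"],
                lambda inp: [f.replace(".trimmed.fastq", ".bam") for f in inp["trimmed"]]),
}
result = run_pipeline(steps)
```

The execution order follows from the declared dependencies rather than being written by hand. A cyclic dependency is rejected with `graphlib.CycleError`, a subclass of `ValueError`.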

Important links

Software download	https://github.com/yigbt/uap
Documentation	https://uap.readthedocs.io/en/master/index.html
Docker build context	https://github.com/yigbt/uap-docker
Travis CI	https://travis-ci.org/yigbt/uap
Singularity Container	https://cloud.sylabs.io/library/bioinf_ufz/uap/uap.sif

MOD-Finder - Tool for finding toxicological multi-omics datasets

Authors

Sebastian Canzler, Jörg Hackermüller, Jana Schor

Description

Collecting omics datasets from different molecular levels, such as the transcriptome, proteome and metabolome, for use in a multi-omics data analysis is a lengthy and tedious task. This is mainly due to the large number of different databases, which are not uniformly structured and therefore require a relatively high amount of manual querying.

To overcome these obstacles, we developed the Multi-Omics Dataset Finder (MOD-Finder) as part of the CEFIC LRI-C5-XomeTox project. MOD-Finder is an R Shiny app for searching composite omics datasets efficiently and automatically. It queries several publicly available databases for datasets related to a user-specified chemical or toxin and presents the results in a simple data table. In addition, it provides chemical-related information such as IDs, synonyms and descriptions, as well as visualizations of chemical-gene interactions and KEGG pathway enrichment. MOD-Finder is designed as a user-friendly web service.
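The source does not state which statistic underlies MOD-Finder's KEGG pathway enrichment. A common choice for this kind of over-representation analysis is a one-sided hypergeometric test. The Python sketch below computes that tail probability under this assumption. It is not MOD-Finder's code, and the function and parameter names are ours.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value, P(X >= k).

    X ~ Hypergeometric: draw n genes (e.g. genes linked to the chemical)
    from a universe of N genes, K of which belong to the pathway; k of the
    drawn genes are observed in the pathway.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum P(X = i) over every overlap i at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```

For example, drawing one gene from a universe of 10 genes, 5 of which are in the pathway, gives `hypergeom_enrichment_p(1, 1, 5, 10) == 0.5`. Exact integer arithmetic via `math.comb` avoids overflow for realistic gene counts.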

Important links

Web service	https://web.app.ufz.de/mod_finder
Source Code	https://github.com/yigbt/MOD-Finder

multiGSEA: A GSEA-based pathway enrichment method for multi-omics data

Authors

Sebastian Canzler, Jörg Hackermüller

Description

Biological insights into molecular responses to treatments or diseases can be gained from omics data with gene set or pathway enrichment methods. A plethora of tools and algorithms have been developed for this purpose. Among them, gene set enrichment analysis (GSEA) has been shown to control both type I and type II errors well.

In recent years, the call for a combined analysis of multiple omics layers has become prominent, giving rise to a few multi-omics enrichment tools, each of which has its own drawbacks and restrictions regarding universal application.

Here, we present the multiGSEA package, which calculates a combined GSEA-based pathway enrichment across multiple omics layers. The package queries 8 different pathway databases and applies the robust GSEA algorithm to each omics layer separately. In a final step, the resulting scores are combined into a robust composite multi-omics pathway enrichment measure. multiGSEA supports 11 different organisms and includes a comprehensive mapping of transcript, protein and metabolite IDs.
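The final combination step can be illustrated with two classic methods for merging independent p-values: Stouffer's Z method and Fisher's method. This minimal Python sketch assumes that each omics layer contributes one p-value per pathway and that the layers are independent. It is not multiGSEA's code; see the package documentation for its actual method choices and defaults.

```python
from math import exp, log, sqrt
from statistics import NormalDist
from typing import Sequence

_EPS = 1e-15  # keeps p-values strictly inside (0, 1) for log() and inv_cdf()
_NORMAL = NormalDist()  # standard normal, mean 0 and sigma 1


def _clip(p: float) -> float:
    return min(max(p, _EPS), 1.0 - _EPS)


def stouffer(pvalues: Sequence[float]) -> float:
    """Stouffer's Z method: z_i = Phi^-1(1 - p_i), Z = sum(z_i) / sqrt(k)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    # Phi^-1(1 - p) == -Phi^-1(p); the second form stays accurate for tiny p.
    z_scores = [-_NORMAL.inv_cdf(_clip(p)) for p in pvalues]
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return _NORMAL.cdf(-combined_z)  # upper-tail probability of combined_z


def fisher(pvalues: Sequence[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i), chi-square with 2k d.o.f."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    half_x = -sum(log(_clip(p)) for p in pvalues)  # X / 2
    # Closed-form chi-square survival function for 2k d.o.f.:
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, series = 1.0, 1.0
    for j in range(1, len(pvalues)):
        term *= half_x / j
        series += term
    return min(1.0, exp(-half_x) * series)
```

Both functions return one combined p-value per pathway. For example, `stouffer([0.01, 0.01])` is smaller than either input, reflecting consistent evidence across layers.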

Important links

Software download	https://github.com/yigbt/multiGSEA
Documentation	http://bioconductor.org/packages/release/bioc/vignettes/multiGSEA/inst/doc/multiGSEA.html
Bioconductor package	https://bioconductor.org/packages/multiGSEA/
Citation	Sebastian Canzler, Jörg Hackermüller. multiGSEA: A GSEA-based pathway enrichment analysis for multi-omics data. BMC Bioinformatics 21, 561 (2020). https://doi.org/10.1186/s12859-020-03910-x

ProteinPrompt: Prediction of protein-protein interactions

Authors

Sebastian Canzler, Markus Fischer, David Ulbricht, Nikola Ristic, Peter W. Hildebrand, René Staritzbichler

Description

ProteinPrompt is a web server and stand-alone tool that uses machine-learning algorithms to predict specific, previously unknown protein-protein interactions from the amino acid sequence alone. It is designed to predict contacts for an input sequence quickly and reliably, so that large sequence libraries can be scanned for potential binding partners. The goal is to accelerate the laborious process of drug target identification and to assure its quality.

Important links

Webserver	https://proteinformatics.uni-leipzig.de/protein_prompt/
Gitlab	https://gitlab.hzdr.de/proteinprompt/ProteinPrompt
Docker Container	https://gitlab.hzdr.de/proteinprompt/ProteinPrompt/container_registry/4590
Citation	Sebastian Canzler, David Ulbricht, Markus Fischer, Nikola Ristic, Peter W. Hildebrand, René Staritzbichler. bioRxiv, https://doi.org/10.1101/2021.09.03.458859

deepFPlearn - AI for predicting chemical-effect associations at the universe level

Authors

Jana Schor, Patrick Scheibe, Matthias Bernt, Wibke Busch, Chih Lai, Jörg Hackermüller

Summary

deepFPlearn is an AI tool that predicts associations between chemicals and gene targets. Owing to their molecular structure, chemicals often interfere with biomolecules, leading to adverse effects in the respective organism. deepFPlearn is a ready-to-use deep learning (DL) tool that combines feature reduction with a deep autoencoder and subsequent classification with a deep feed-forward neural network. We reduced the discrepancy between the large descriptor size (the molecular structure of a chemical) and the limited amount of labeled training data in two ways: (i) by using a simple representation of the chemical's structure, the binary fingerprint, and (ii) by compressing the features before classifying them with respect to an effect. We provide trained models for endocrine disruption (ED), i.e., for chemicals that mimic or interfere with the body's hormones. However, the tool is highly flexible and can be trained on other datasets.
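A binary fingerprint is a fixed-length bit vector in which each set bit records that a particular structural feature occurs in the molecule. The toy Python sketch below folds a list of feature strings into such a vector by hashing. It is an illustrative assumption, not deepFPlearn's featurization. Real cheminformatics fingerprints derive their features from the molecular graph (e.g., with a toolkit such as RDKit), and both the feature strings and the 2048-bit default here are hypothetical choices.

```python
import hashlib
from typing import Iterable, List


def binary_fingerprint(features: Iterable[str], n_bits: int = 2048) -> List[int]:
    """Fold hypothetical substructure identifiers into a fixed-length 0/1 vector."""
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    bits = [0] * n_bits
    for feature in features:
        # Stable hash; Python's built-in hash() of str is salted per process.
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:8], "big") % n_bits] = 1  # collisions just share a bit
    return bits


fp = binary_fingerprint(["C-C", "C=O", "c1ccccc1"])
```

Every chemical maps to a vector of the same length regardless of its size, so such vectors can be fed directly to a neural network. These long, sparse vectors are what an autoencoder can then compress before classification.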

Availability

Code repository	https://github.com/yigbt/deepFPlearn
Preprint	Jana Schor, Patrick Scheibe, Matthias Bernt, Wibke Busch, Chih Lai, Jörg Hackermüller. AI for predicting chemical-effect associations at the universe level - deepFPlearn. bioRxiv 2021.06.24.449697; doi: https://doi.org/10.1101/2021.06.24.449697

Container for the analysis of transcriptomics data

Description

We built a Docker container specifically designed for transcriptomics data analysis. It uses the rocker/verse image as a base and extends it with several R packages from CRAN and Bioconductor to ensure a reproducible working environment.

An RStudio Server runs inside the container and enables remote access through a web browser.

Availability

Download the Docker container from Docker Hub: https://hub.docker.com/r/boll3/rocker_transcriptomics

Author

Sebastian Canzler

rocker/verse

The Rocker project offers version-stable R images with RStudio Server. The rocker/verse images additionally include the tidyverse packages as well as TeX and publishing-related packages.

Current rocker/verse version: 4.1.0

Usage

How to use the Docker container is described in detail in the Rocker documentation.
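For quick orientation, here is a hedged Python sketch that assembles and runs the usual `docker run` invocation for rocker-based images. These images serve RStudio Server on container port 8787 and read the login password for the default user `rstudio` from the `PASSWORD` environment variable. The helper names are hypothetical. No image tag is given, so Docker falls back to `latest`; if that tag is not published, append one listed on Docker Hub. The Rocker documentation remains the authoritative reference.

```python
import subprocess
from typing import List


def rstudio_docker_command(image: str, password: str, host_port: int = 8787) -> List[str]:
    """Build a `docker run` command for a rocker-based RStudio Server image."""
    return [
        "docker", "run", "--rm",          # remove the container when it stops
        "-p", f"{host_port}:8787",        # RStudio Server listens on 8787 inside
        "-e", f"PASSWORD={password}",     # password for the default user "rstudio"
        image,
    ]


def start_rstudio(image: str, password: str, host_port: int = 8787) -> None:
    # Requires a local Docker installation; blocks until the container exits.
    subprocess.run(rstudio_docker_command(image, password, host_port), check=True)
```

For example, `start_rstudio("boll3/rocker_transcriptomics", "choose-a-password")` should make RStudio Server reachable at http://localhost:8787. Because of `--rm`, files created inside the container are lost when it stops unless a host directory is mounted (e.g. with `-v`).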

Additional R packages

To be able to run transcriptomics analyses, we extended the rocker/verse image with several R packages from CRAN and Bioconductor.

Plotting and visuals

EnhancedVolcano
karyoploteR
enrichplot

Differential gene expression analysis

DESeq2
IHW
sva
RUVSeq

Functional characterization

fgsea
multiGSEA
clusterProfiler
EGSEA

Annotation

org.Rn.eg.db
org.Hs.eg.db
org.Mm.eg.db
org.Dr.eg.db
biomaRt
AnnotationHub
metaboliteIDmapping
BSgenome.Rnorvegicus.UCSC.rn6

Further CRAN and Bioconductor packages

Rcpp
BiocParallel
hexbin
apeglm
ashr
glmpca
pheatmap
eulerr
PoiClaClu
msigdbr
gtools
DT
proj4
WGCNA
bookdown
gridExtra
xtable
ggnewscale
ggupset
ggridges

Container for the analysis of multi-omics data

Description

We published a Docker container specifically designed for multi-omics data analysis. It uses the rocker/verse image as a base and extends it with several R packages from CRAN and Bioconductor to ensure a reproducible working environment.

Availability

Download the Docker container from Docker Hub: https://hub.docker.com/r/boll3/rocker_multiomics

Author

Sebastian Canzler

rocker/verse

The Rocker project offers version-stable R images with RStudio Server. The rocker/verse images additionally include the tidyverse packages as well as TeX and publishing-related packages.

Current rocker/verse version: 4.1.2

Usage

How to use the Docker container is described in detail in the Rocker documentation.

Additional R packages

To be able to run multi-omics analyses, we extended the rocker/verse image with several R packages from CRAN and Bioconductor.

Multi-omics analysis

MOFA2
mixOmics

Plotting and visuals

EnhancedVolcano
enrichplot

Tools for single-omics analysis and data preparation

DESeq2
limma
DEP

Functional characterization

fgsea
multiGSEA
clusterProfiler
EGSEA

Annotation

biomaRt
AnnotationDbi
AnnotationHub
org.Rn.eg.db
org.Hs.eg.db
org.Mm.eg.db
org.Dr.eg.db
metaboliteIDmapping
BSgenome.Rnorvegicus.UCSC.rn6

Further CRAN and Bioconductor packages

Rcpp
BiocParallel
hexbin
eulerr
pheatmap
msigdbr
gtools
DT
proj4
bookdown
gridExtra
xtable
ggnewscale
ggupset
ggridges
reticulate

metaboliteIDmapping R package

Description

The R package metaboliteIDmapping provides a comprehensive mapping table of nine different metabolite ID formats and their common names. The data were collected and merged from four publicly available sources: HMDB, the CompTox Chemicals Dashboard, ChEBI, and the graphite Bioconductor R package.
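Assuming the mapping table has one column per ID format, converting IDs amounts to a lookup from one column to another. The Python sketch below illustrates that pattern only; the package itself provides the table in R. The column names (`HMDB`, `KEGG`, `Name`) and all values are placeholders, not real package content.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def map_ids(table: Sequence[Row], from_col: str, to_col: str,
            ids: Sequence[str]) -> Dict[str, List[str]]:
    """Map IDs from one column to another; one input may match several rows."""
    wanted = set(ids)
    result: Dict[str, List[str]] = {i: [] for i in ids}
    for row in table:
        key, value = row.get(from_col), row.get(to_col)
        # Skip rows with a missing target ID and avoid duplicate targets.
        if key in wanted and value and value not in result[key]:
            result[key].append(value)
    return result


# Placeholder rows standing in for the real mapping table.
table: List[Row] = [
    {"HMDB": "HMDB_A", "KEGG": "KEGG_A", "Name": "metabolite A"},
    {"HMDB": "HMDB_B", "KEGG": "", "Name": "metabolite B"},
]
```

IDs without a match, or whose target ID is empty, map to an empty list rather than raising an error. This mirrors the fact that not every metabolite is covered by every ID format.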

Availability

Bioconductor package	https://bioconductor.org/packages/metaboliteIDmapping/
Documentation	http://bioconductor.org/packages/release/data/annotation/vignettes/metaboliteIDmapping/inst/doc/metaboliteIDmapping.html
Github	https://github.com/yigbt/metaboliteIDmapping

Author

Sebastian Canzler
