Tools

uap

Authors

Christoph Kämpf, Michael Specht, Sven-Holger Puppel, Alexander Scholz, Gero Doose, Kristin Reiche, Jana Schor, Jörg Hackermüller

Summary

uap executes, controls, and keeps track of the analysis of large data sets. It enables users to perform robust, consistent, and reproducible data analysis. uap encapsulates the usage of (bioinformatic) tools and handles data flow and processing during an analysis. Users can combine predefined or self-made analysis steps to create custom analyses. Analysis steps encapsulate best-practice usage of bioinformatic software tools. uap focuses on the analysis of high-throughput sequencing (HTS) data, but its plugin architecture allows users to add functionality so that it can be used for any kind of large-scale data analysis.

uap is a command-line tool, implemented in Python. It requires a user-defined configuration file, which describes the analysis, as input.

uap supports grid engines such as SLURM and UGE for connecting to HPC clusters.

Important Links

Software download https://github.com/yigbt/uap
Documentation https://uap.readthedocs.io/en/master/index.html
Docker build context https://github.com/yigbt/uap-docker
Travis CI https://travis-ci.org/yigbt/uap
Singularity Container https://cloud.sylabs.io/library/bioinf_ufz/uap/uap.sif

MOD-Finder

Authors

Sebastian Canzler, Jörg Hackermüller, Jana Schor

Summary

Collecting omics data sets from different molecular levels, such as transcriptome, proteome, and metabolome, for use in a multi-omics data analysis is a highly tedious task. This is mainly because of the large number of potential databases to search and their non-unified query systems, which result in a considerable amount of manual work.

To surmount these obstacles, we developed the Multi-Omics Data set Finder (MOD-Finder), an R Shiny application, as part of the CEFIC LRI-C5 XomeTox project. It searches for compound-related omics data sets in an automated manner: several publicly available databases are queried for data sets related to a user-specified compound or toxicant. The results are presented in a plain data table. Additionally, compound-related information such as distinct IDs, synonyms, and descriptions, as well as visualizations of chemical-gene interactions and KEGG pathway enrichments, is provided. The MOD-Finder application works as an easy-to-use web service.

Important Links

Webservice https://web.app.ufz.de/mod_finder
Source code https://github.com/yigbt/MOD-Finder
Citation Canzler, S, Hackermüller, J, Schor, J (2019): MOD-Finder: Identify multi-omics data sets related to defined chemical exposure; arxiv.org (preprint); https://doi.org/10.48550/arXiv.1907.06346

multiGSEA

Authors

Sebastian Canzler, Jörg Hackermüller

Summary

Gaining biological insights into molecular responses to treatments or diseases from omics data can be accomplished by gene set or pathway enrichment methods. A plethora of tools and algorithms has been developed so far. Among those, gene set enrichment analysis (GSEA) has proved to control both type I and type II errors well.

In recent years, the call for a combined analysis of multiple omics layers has become prominent, giving rise to a few multi-omics enrichment tools. Each of them has its own drawbacks and restrictions regarding universal application.

Here, we present the multiGSEA package, which calculates a combined GSEA-based pathway enrichment across multiple omics layers. The package queries 8 different pathway databases and relies on the robust GSEA algorithm for the enrichment analysis of each single omics layer. In a final step, the single-omics scores are combined into a robust composite multi-omics pathway enrichment measure. multiGSEA supports 11 different organisms and includes a comprehensive mapping of transcript, protein, and metabolite IDs.
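The final combination step can be illustrated with a short sketch. It assumes that each omics layer yields one p-value per pathway and combines them with Stouffer's Z-score method or Fisher's method. Both are standard p-value combination techniques chosen here for illustration. The function names and example values are hypothetical and are not the multiGSEA API.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)
EPS = 1e-12                # keeps p-values strictly inside (0, 1)


def _clip(p):
    """Clamp a p-value so that log() and inv_cdf() stay defined."""
    return min(max(p, EPS), 1.0 - EPS)


def combine_stouffer(p_values, weights=None):
    """Combine one-sided p-values with Stouffer's (weighted) Z-score method."""
    if weights is None:
        weights = [1.0] * len(p_values)
    # Convert each p-value to a Z-score via the inverse normal CDF.
    z_scores = [STD_NORMAL.inv_cdf(1.0 - _clip(p)) for p in p_values]
    weighted_sum = sum(w * z for w, z in zip(weights, z_scores))
    combined_z = weighted_sum / math.sqrt(sum(w * w for w in weights))
    return 1.0 - STD_NORMAL.cdf(combined_z)


def combine_fisher(p_values):
    """Combine p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom. For even degrees of freedom the survival
    function has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(_clip(p)) for p in p_values)  # X / 2
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        series += term
    return math.exp(-half_stat) * series


# Hypothetical single-omics p-values for one pathway.
layer_p_values = {"transcriptome": 0.01, "proteome": 0.04, "metabolome": 0.20}
p_list = list(layer_p_values.values())
print("Stouffer:", combine_stouffer(p_list))
print("Fisher:  ", combine_fisher(p_list))
```

Stouffer's method accepts optional weights, which is convenient when one omics layer should contribute more than the others; Fisher's method is more sensitive to a single very small p-value.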

Important Links

Software download https://github.com/yigbt/multiGSEA
Documentation http://bioconductor.org/packages/release/bioc/vignettes/multiGSEA/inst/doc/multiGSEA.html
Bioconductor devel package https://bioconductor.org/packages/devel/bioc/html/multiGSEA.html
Citation Sebastian Canzler, Jörg Hackermüller. multiGSEA: A GSEA-based pathway enrichment analysis for multi-omics data. BMC Bioinformatics 21, 561 (2020). https://doi.org/10.1186/s12859-020-03910-x

ProteinPrompt

Authors

Sebastian Canzler, Markus Fischer, David Ulbricht, Nikola Ristic, Peter W. Hildebrand, René Staritzbichler

Summary

ProteinPrompt is a webserver and stand-alone tool that uses machine-learning algorithms to predict specific, currently unknown protein-protein interactions from the amino acid sequence alone. It is designed to quickly and reliably predict contacts based on an input sequence in order to scan large sequence libraries for potential binding partners, with the goal of accelerating the laborious process of drug target identification and assuring its quality.

Availability 

Webserver https://proteinformatics.uni-leipzig.de/protein_prompt/
GitLab https://gitlab.hzdr.de/proteinprompt/ProteinPrompt
Docker Container https://gitlab.hzdr.de/proteinprompt/ProteinPrompt/container_registry/4590
Citation Sebastian Canzler, David Ulbricht, Markus Fischer, Nikola Ristic, Peter W. Hildebrand, René Staritzbichler. bioRxiv, https://doi.org/10.1101/2021.09.03.458859


deepFPlearn

Authors

Jana Schor, Patrick Scheibe, Matthias Bernt, Wibke Busch, Chih Lai, Jörg Hackermüller

Summary

deepFPlearn is an AI tool that predicts associations between chemicals and gene targets. Based on their molecular structure, chemicals often interfere with biomolecules, leading to adverse effects in the respective organism. deepFPlearn is a ready-to-use deep learning (DL) tool that combines feature reduction by a deep autoencoder with subsequent classification by a deep feed-forward neural network. We decreased the discrepancy between the large descriptor size (the molecular structure of a chemical) and the limited amount of labeled training data by i) using a simple representation of the chemical's structure, the binary fingerprint, and ii) applying feature compression before classifying the association with an effect. We provide trained models for endocrine disruption (ED), i.e., for chemicals that mimic or interfere with the body's hormones. However, the tool is highly flexible and can be trained with other data sets.
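The idea of a binary fingerprint can be illustrated with a toy sketch. Real fingerprints are computed from the molecular graph by cheminformatics toolkits; here, purely for illustration, substrings of a SMILES string are hashed into a fixed-length bit vector. The function names, the fingerprint size, and the example molecules are hypothetical and are not part of deepFPlearn.

```python
import hashlib


def toy_fingerprint(smiles, n_bits=64, max_len=3):
    """Hash all substrings of up to max_len characters into n_bits bits.

    This is a toy stand-in for a structural fingerprint: every fragment
    sets exactly one bit, so the result is a fixed-length 0/1 vector
    regardless of the size of the molecule.
    """
    bits = [0] * n_bits
    for length in range(1, max_len + 1):
        for start in range(len(smiles) - length + 1):
            fragment = smiles[start:start + length]
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % n_bits
            bits[index] = 1
    return bits


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: shared set bits divided by all set bits."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    return both / either if either else 0.0


ethanol = toy_fingerprint("CCO")    # SMILES for ethanol
propanol = toy_fingerprint("CCCO")  # SMILES for 1-propanol
print("bits set (ethanol): ", sum(ethanol))
print("bits set (propanol):", sum(propanol))
print("similarity:", tanimoto(ethanol, propanol))
```

A fixed-length 0/1 vector of this kind is the sort of input that a compression step, such as an autoencoder, can reduce to fewer features before classification.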

Availability

Code repository https://github.com/yigbt/deepFPlearn
Preprint Jana Schor, Patrick Scheibe, Matthias Bernt, Wibke Busch, Chih Lai, Jörg Hackermüller. AI for predicting chemical-effect associations at the universe level - deepFPlearn. bioRxiv 2021.06.24.449697; https://doi.org/10.1101/2021.06.24.449697

Containers for Reproducible Research

rocker_transcriptomics

Description

We built a Docker container specifically designed for transcriptomics data analysis. We use the rocker/verse container and extend it with several R packages from CRAN and Bioconductor to ensure a reproducible working environment.

Within the container, an RStudio Server instance runs and enables remote access through a web browser.

Availability

Download the docker container from DockerHub: https://hub.docker.com/r/boll3/rocker_transcriptomics

Author

Sebastian Canzler

rocker/verse

The rocker project offers version-stable Docker images with RStudio Server. The rocker/verse images additionally include the tidyverse packages as well as TeX and publishing-related packages.

Current rocker/verse version: 4.1.0

Usage

The rocker manual describes how to use the Docker container.


Additional Packages


To support transcriptomics analysis, we extended the rocker/verse container with several R packages from CRAN and Bioconductor.

Plotting and visuals
  • EnhancedVolcano
  • karyoploteR
  • enrichplot

Differential gene expression analysis
  • DESeq2
  • IHW
  • sva
  • RUVSeq

Functional characterization
  • fgsea
  • multiGSEA
  • clusterProfiler
  • EGSEA

Annotation
  • org.Rn.eg.db
  • org.Hs.eg.db
  • org.Mm.eg.db
  • org.Dr.eg.db
  • biomaRt
  • AnnotationHub
  • metaboliteIDmapping
  • BSgenome.Rnorvegicus.UCSC.rn6

CRAN packages
  • Rcpp
  • BiocParallel
  • hexbin
  • apeglm
  • ashr
  • glmpca
  • pheatmap
  • eulerr
  • PoiClaClu
  • msigdbr
  • gtools
  • DT
  • proj4
  • WGCNA
  • bookdown
  • gridExtra
  • xtable
  • ggnewscale
  • ggupset
  • ggridges

rocker_multiomics

Description

We published a Docker container specifically designed for multi-omics data analysis. We use the rocker/verse container and extend it with several R packages from CRAN and Bioconductor to ensure a reproducible working environment.

Availability

Download the docker container from DockerHub: https://hub.docker.com/r/boll3/rocker_multiomics

Author

Sebastian Canzler

rocker/verse

The rocker project offers version-stable Docker images with RStudio Server. The rocker/verse images additionally include the tidyverse packages as well as TeX and publishing-related packages.

Current rocker/verse version: 4.1.2

Usage

The rocker manual describes how to use the Docker container.


Additional Packages

To support multi-omics analysis, we extended the rocker/verse container with several R packages from CRAN and Bioconductor.

Multi-omics analysis

  • MOFA2
  • mixOmics


Plotting and visuals

  • EnhancedVolcano
  • enrichplot


Tools for single-omics analysis and data preparation

  • DESeq2
  • limma
  • DEP


Functional characterization

  • fgsea
  • multiGSEA
  • clusterProfiler
  • EGSEA


Annotation

  • biomaRt
  • AnnotationDbi
  • AnnotationHub
  • org.Rn.eg.db
  • org.Hs.eg.db
  • org.Mm.eg.db
  • org.Dr.eg.db
  • metaboliteIDmapping
  • BSgenome.Rnorvegicus.UCSC.rn6


CRAN packages

  • Rcpp
  • BiocParallel
  • hexbin
  • eulerr
  • pheatmap
  • msigdbr
  • gtools
  • DT
  • proj4
  • bookdown
  • gridExtra
  • xtable
  • ggnewscale
  • ggupset
  • ggridges
  • reticulate



Packages

metaboliteIDmapping

Description

The R package 'metaboliteIDmapping' provides a comprehensive mapping table of nine different metabolite ID formats and their common names. The data has been collected and merged from four publicly available sources: HMDB, Comptox Dashboard, ChEBI, and the graphite Bioconductor R package.

Availability

Bioconductor package https://bioconductor.org/packages/metaboliteIDmapping/
Documentation http://bioconductor.org/packages/release/data/annotation/vignettes/metaboliteIDmapping/inst/doc/metaboliteIDmapping.html
GitHub https://github.com/yigbt/metaboliteIDmapping

Author

Sebastian Canzler