Tools
Authors
Christoph Kämpf, Michael Specht, Sven-Holger Puppel, Alexander Scholz, Gero Doose, Kristin Reiche, Jana Schor, Jörg Hackermüller
Summary
uap executes, controls, and keeps track of the analysis of large data sets. It enables users to perform robust, consistent, and reproducible data analysis. uap encapsulates the usage of (bioinformatic) tools and handles data flow and processing during an analysis. Users can combine predefined or self-made analysis steps to create custom analyses. Analysis steps encapsulate best-practice usage of bioinformatic software tools. uap focuses on the analysis of high-throughput sequencing (HTS) data, but its plugin architecture allows users to add functionality so that it can be used for any kind of large-scale data analysis.
uap is a command-line tool, implemented in Python. It requires a user-defined configuration file, which describes the analysis, as input.
uap supports grid engines such as SLURM and UGE for connecting to HPC clusters.
Important Links
Software download | https://github.com/yigbt/uap |
Documentation | https://uap.readthedocs.io/en/master/index.html |
Docker build's context | https://github.com/yigbt/uap-docker |
Travis CI | https://travis-ci.org/yigbt/uap |
Singularity Container | https://cloud.sylabs.io/library/bioinf_ufz/uap/uap.sif |
Authors
Jana Schor, Patrick Scheibe, Matthias Bernt, Wibke Busch, Chih Lai, Jörg Hackermüller
Summary
deepFPlearn is an AI tool that predicts associations between chemicals and gene targets. Based on their molecular structure, chemicals often interfere with biomolecules, leading to adverse effects in the respective organism. deepFPlearn is a ready-to-use deep learning (DL) tool that combines feature reduction with a deep autoencoder and subsequent classification with a deep feed-forward neural network. We decreased the discrepancy between the large descriptor size (the molecular structure of a chemical) and the limited amount of labeled training data by i) using a simple representation of the chemical's structure, the binary fingerprint; and ii) applying feature compression before classification with respect to an effect. We provide trained models for endocrine disruption (ED), i.e., for chemicals that mimic or interfere with the body's hormones. However, the tool is highly flexible and can be trained with other datasets.
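The two-stage design described above (compress the binary fingerprint, then classify the compressed vector) can be illustrated with a forward-pass-only sketch. This is not the deepFPlearn implementation: the layer sizes, the helper names (`dense`, `random_layer`, `predict`), and the untrained random weights are all assumptions chosen to keep the example small, and the fingerprint is a made-up bit vector rather than one derived from a real chemical.

```python
import math
import random


def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(W @ x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_layer(n_in, n_out, rng):
    """Untrained placeholder weights; a real model would learn these."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)
# Hypothetical sizes, far smaller than real fingerprints and networks.
FP_BITS, LATENT, HIDDEN = 64, 8, 4

# Stage 1: the encoder half of an autoencoder compresses the fingerprint.
enc_w, enc_b = random_layer(FP_BITS, LATENT, rng)
# Stage 2: a feed-forward classifier works on the compressed representation.
hid_w, hid_b = random_layer(LATENT, HIDDEN, rng)
out_w, out_b = random_layer(HIDDEN, 1, rng)


def predict(fingerprint):
    """Return the compressed features and a score between 0 and 1."""
    compressed = dense(fingerprint, enc_w, enc_b, relu)
    hidden = dense(compressed, hid_w, hid_b, relu)
    (probability,) = dense(hidden, out_w, out_b, sigmoid)
    return compressed, probability


fingerprint = [rng.randint(0, 1) for _ in range(FP_BITS)]  # made-up bit vector
compressed, probability = predict(fingerprint)
print(len(fingerprint), "bits ->", len(compressed), "features -> score", probability)
```

Because the weights are random, the score carries no meaning; the sketch only shows how the classifier sees a much shorter input than the raw fingerprint.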
Availability
Code repository | https://github.com/yigbt/deepFPlearn |
Preprint | Jana Schor, Patrick Scheibe, Matthias Bernt, Wibke Busch, Chih Lai, Jörg Hackermüller. AI for predicting chemical-effect associations at the universe level - deepFPlearn. bioRxiv 2021.06.24.449697; doi: https://doi.org/10.1101/2021.06.24.449697 |
Authors
Sebastian Canzler, Jörg Hackermüller
Summary
Gaining biological insights into molecular responses to treatments or diseases from omics data can be accomplished by gene set or pathway enrichment methods. A plethora of different tools and algorithms have been developed so far. Among these, gene set enrichment analysis (GSEA) has been shown to control both type I and type II errors well.
In recent years, calls for a combined analysis of multiple omics layers have become prominent, giving rise to a few multi-omics enrichment tools, each of which has its own drawbacks and restrictions regarding universal application.
Here, we present the multiGSEA package, which calculates a combined GSEA-based pathway enrichment across multiple omics layers. The package queries 8 different pathway databases and relies on the robust GSEA algorithm for single-omics enrichment analysis. In a final step, the single-omics enrichment scores are combined into a robust composite multi-omics pathway enrichment measure. multiGSEA supports 11 different organisms and includes a comprehensive mapping of transcript, protein, and metabolite IDs.
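To make the final combination step concrete, the sketch below merges per-layer pathway p-values with Stouffer's Z-method, one standard way of combining independent p-values. It is an illustration only: the function name `combine_pvalues_stouffer`, the clamping bounds, and the example p-values are assumptions, and the methods multiGSEA actually offers should be taken from the package documentation.

```python
from math import sqrt
from statistics import NormalDist


def combine_pvalues_stouffer(pvalues, weights=None):
    """Combine one-sided p-values for a single pathway across omics layers.

    Each p-value is turned into a z-score, the (optionally weighted)
    z-scores are summed and rescaled, and the result is mapped back
    to a p-value.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0] * len(pvalues)
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")

    std_normal = NormalDist()
    # inv_cdf is only defined on the open interval (0, 1), so clamp the input.
    clamped = [min(max(p, 1e-15), 1.0 - 1e-15) for p in pvalues]
    z_scores = [std_normal.inv_cdf(p) for p in clamped]  # small p -> very negative z
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    return std_normal.cdf(combined_z)


# Hypothetical p-values for one pathway in three omics layers.
layer_pvalues = {"transcriptome": 0.04, "proteome": 0.20, "metabolome": 0.01}
combined = combine_pvalues_stouffer(list(layer_pvalues.values()))
print(f"combined p-value: {combined:.4g}")
```

Two layers that each report p = 0.05 combine to roughly p = 0.01 under this method, which shows how consistent moderate evidence across layers strengthens the composite result.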
Important Links
Software download | GitHub |
Documentation | Bioconductor Vignette |
Bioconductor | Bioconductor Package |
Citation | Sebastian Canzler, Jörg Hackermüller. multiGSEA: A GSEA-based pathway enrichment analysis for multi-omics data. BMC Bioinformatics 21, 561 (2020). https://doi.org/10.1186/s12859-020-03910-x |
Authors
Sebastian Canzler, Markus Fischer, David Ulbricht, Nikola Ristic, Peter W. Hildebrand, René Staritzbichler
Summary
ProteinPrompt is a webserver and stand-alone tool that uses machine-learning algorithms to predict specific, currently unknown protein-protein interactions from the amino acid sequence alone. It is designed to quickly and reliably predict contacts based on an input sequence so that large sequence libraries can be scanned for potential binding partners, with the goal of accelerating the laborious process of drug target identification and assuring its quality.
Availability
Webserver | https://proteinformatics.uni-leipzig.de/protein_prompt/ |
Gitlab | https://gitlab.hzdr.de/proteinprompt/ProteinPrompt |
Docker Container | GitLab Registry |
Citation | Sebastian Canzler, Markus Fischer, David Ulbricht, Nikola Ristic, Peter W Hildebrand, René Staritzbichler, ProteinPrompt: a webserver for predicting protein–protein interactions, Bioinformatics Advances, Volume 2, Issue 1, 2022, vbac059, https://doi.org/10.1093/bioadv/vbac059 |
Authors
Sebastian Canzler, Jörg Hackermüller, Jana Schor
Summary
Collecting omics data sets from different molecular levels, such as transcriptome, proteome, and metabolome, for use in a multi-omics data analysis is a highly tedious task. This is mainly because of the large number of potential databases to search and their non-unified querying systems, which result in a fairly large amount of manual work.
To surmount these obstacles, we developed the Multi-Omics Data set Finder (MOD-Finder), an R Shiny application, as part of the CEFIC LRI-C5 XomeTox project to search efficiently for compound-related omics data sets in an automated manner. To this end, several publicly available databases are automatically queried for data sets related to a user-specified compound or toxicant. The results are presented in a plain data table. Additionally, compound-related information such as distinct IDs, synonyms, and descriptions, as well as visualizations of chemical-gene interactions or KEGG pathway enrichments, is provided.
Important Links
Source code | https://github.com/yigbt/MOD-Finder |
Citation | Canzler, S, Hackermüller, J, Schor, J (2019): MOD-Finder: Identify multi-omics data sets related to defined chemical exposure; arxiv.org (preprint); https://doi.org/10.48550/arXiv.1907.06346 |
Containers for Reproducible Research
Description
We built a Docker container specifically designed for transcriptomics data analysis. We use the rocker/verse container and extend it with several R packages from CRAN and Bioconductor to ensure a reproducible working environment.
Within the container, an RStudio Server instance is running and enables remote access through the web browser.
Availability
Download the docker container from DockerHub: https://hub.docker.com/r/boll3/rocker_transcriptomics
Author
Sebastian Canzler
rocker/verse
The rocker project offers version-stable rocker images with RStudio Server. The rocker/verse images are extended by tidyverse packages as well as TeX and publishing-related packages.
Current rocker/verse version: 4.1.0
Usage
How to use the docker container is nicely described in the rocker manual.
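As a quick orientation, the sketch below assembles a typical `docker run` invocation for a rocker-based image. It only builds and prints the command; it does not start a container. The helper name `rocker_run_command` and the placeholder password are assumptions, port 8787 and the `PASSWORD` environment variable follow the usual rocker conventions, and using the image without a version tag assumes that a default tag is published. The rocker manual remains the authoritative reference.

```python
import shlex


def rocker_run_command(image, password, host_port=8787):
    """Build the `docker run` argument list for a rocker-based RStudio Server image."""
    return [
        "docker", "run", "--rm",
        "-p", f"{host_port}:8787",      # RStudio Server listens on 8787 inside rocker images
        "-e", f"PASSWORD={password}",   # password for the default `rstudio` user
        image,
    ]


# Image name taken from the DockerHub link above; the password is a placeholder.
command = rocker_run_command("boll3/rocker_transcriptomics", password="change-me")
print(shlex.join(command))
```

After running the printed command, RStudio Server should be reachable in the web browser at `http://localhost:8787`. The same helper works for any other rocker-derived image name.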
Additional Packages
In order to be able to run transcriptomics analysis, we extended the rocker/verse container by several R packages from CRAN and Bioconductor.
Plotting and visuals
- EnhancedVolcano
- karyoploteR
- enrichplot
Differential gene expression analysis
- DESeq2
- IHW
- sva
- RUVSeq
Functional characterization
- fgsea
- multiGSEA
- clusterProfiler
- EGSEA
Annotation
- org.Rn.eg.db
- org.Hs.eg.db
- org.Mm.eg.db
- org.Dr.eg.db
- biomaRt
- AnnotationHub
- metaboliteIDmapping
- BSgenome.Rnorvegicus.UCSC.rn6
CRAN packages
- Rcpp
- BiocParallel
- hexbin
- apeglm
- ashr
- glmpca
- pheatmap
- eulerr
- PoiClaClu
- msigdbr
- gtools
- DT
- proj4
- WGCNA
- bookdown
- gridExtra
- xtable
- ggnewscale
- ggupset
- ggridges
Description
Here, we publish a Docker container specifically designed for multi-omics data analysis. We use the rocker/verse container and extend it with several R packages from `CRAN` and `Bioconductor` to ensure a reproducible working environment.
Availability
Download the docker container from DockerHub: https://hub.docker.com/r/boll3/rocker_multiomics
Author
Sebastian Canzler
rocker/verse
The rocker project offers version-stable rocker images with rstudio server. The particular rocker/verse images are extended by tidyverse packages as well as tex and publishing-related packages.
Current rocker/verse version: 4.1.2
Usage
How to use the docker container is nicely described in the rocker manual.
Additional Packages
In order to be able to run multi-omics analysis, we extended the rocker/verse container by several R packages from `CRAN` and `Bioconductor`.
Multi-omics analysis
- MOFA2
- mixOmics
Plotting and visuals
- EnhancedVolcano
- enrichplot
Tools for single-omics analysis and data preparation
- DESeq2
- limma
- DEP
Functional characterization
- fgsea
- multiGSEA
- clusterProfiler
- EGSEA
Annotation
- biomaRt
- AnnotationDbi
- AnnotationHub
- org.Rn.eg.db
- org.Hs.eg.db
- org.Mm.eg.db
- org.Dr.eg.db
- metaboliteIDmapping
- BSgenome.Rnorvegicus.UCSC.rn6
CRAN packages
- Rcpp
- BiocParallel
- hexbin
- eulerr
- pheatmap
- msigdbr
- gtools
- DT
- proj4
- bookdown
- gridExtra
- xtable
- ggnewscale
- ggupset
- ggridges
- reticulate
Packages
Description
The R package 'metaboliteIDmapping' provides a comprehensive mapping table of nine different metabolite ID formats and their common names. The data have been collected and merged from four publicly available sources: HMDB, Comptox Dashboard, ChEBI, and the graphite Bioconductor R package.
Availability
Author
Sebastian Canzler