Life Science Data Analytics
and Algorithmic Bioinformatics

GOSim (Bioconductor)

GOSim is an R-package allowing for computation of information theoretic Gene Ontology similarities between terms and gene products as well as GO enrichment analysis.
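
To illustrate what "information theoretic" means here, the following minimal Python sketch computes a Resnik-style similarity, one standard measure of this kind, on a toy ontology. The information content of a term is IC(t) = -log p(t), where p(t) is the fraction of annotations at t or below it, and the similarity of two terms is the IC of their most informative common ancestor. The toy ontology, the annotation counts and all function names are hypothetical; this is not GOSim's API.

```python
import math

# Hypothetical toy ontology: term -> set of parent terms (a DAG rooted at "root").
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "signaling": {"root"},
    "lipid_metabolism": {"metabolism"},
    "glucose_metabolism": {"metabolism"},
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "metabolism": 2,
    "signaling": 4,
    "lipid_metabolism": 1,
    "glucose_metabolism": 1,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) counts annotations at t or below it."""
    cumulative = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    total = cumulative["root"]
    return {t: -math.log(c / total) for t, c in cumulative.items() if c > 0}


def resnik(term_a, term_b, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic[t] for t in common)
```

In this toy example the two metabolism sub-terms share the ancestor "metabolism", which covers half of all annotations, so their similarity is log 2. A metabolism term and "signaling" share only the root and get similarity 0.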


H. Froehlich, N. Speer, A. Poustka, T. Beissbarth (2007),
GOSim - An R-Package for Computation of Information Theoretic GO Similarities Between Terms and Gene Products, BMC Bioinformatics, 8:166.

gene2pathway (Bioconductor)

This R-package implements a machine learning based (hierarchical classification) approach to predict KEGG pathway membership of genes via protein domain signatures.


H. Froehlich, M. Fellmann, H. Sültmann, A. Poustka, T. Beißbarth (2008),
Predicting Pathway Membership via Domain Signatures, Bioinformatics, 24:2137-2142.

nem (Bioconductor)

This R-package implements a variety of algorithms for causal reverse engineering / structure learning of biological networks from high dimensional intervention effects: (Dynamic) Nested Effects Models and (Dynamic) Deterministic Effects Propagation Networks.


Martin Pirkl, Madeline Diekmann, Marlies van der Wees, Niko Beerenwinkel, Holger Froehlich, Florian Markowetz (2017), Inferring Modulators of Genetic Interactions with Epistatic Nested Effects Models, PLoS Comp Biol, 13(4): e1005496.

Paurush Praveen, Helen Huelsmann, Holger Sueltmann, Ruprecht Kuner, Holger Froehlich (2016),
Cross-talk between AMPK and EGFR dependent Signaling in Non-Small Cell Lung Cancer, Scientific Reports, 6, 27514.

J. Siebourg-Polster, D. Mudrak, M. Emmenlauer, P. Raemoe, C. Dehio, U. Greber, H. Froehlich, N. Beerenwinkel (2015), NEMix: Single-cell Nested Effects Models for Probabilistic Pathway Stimulation, PLoS Comp. Biol., 11(4):e1004078.

H. Failmezger, P. Praveen, A. Tresch, H. Froehlich (2013), Learning Gene Network Structure from Time Laps Cell Imaging in RNAi Knock-Downs, Bioinformatics, 29(12):1534-40.

H. Froehlich, P. Praveen, A. Tresch (2011), Fast and Efficient Dynamic Nested Effects Models, Bioinformatics, 27, 238-244.

T. Niederberger, S. Etzold, M. Lidschreiber, K. C. Maier, D. E. Martin, H. Froehlich, P. Cramer, A. Tresch (2012), MC EMiNEM Maps the Interaction Landscape of the Mediator, PLoS Comp. Biol., 8(6): e1002568

H. Froehlich, Oe. Sahin, D. Arlt, C. Bender and T. Beissbarth (2009), Deterministic Effects Propagation Networks for Reconstructing Protein Signaling Networks from Multiple Interventions, BMC Bioinformatics, 10:322.

H. Froehlich, A. Tresch, T. Beissbarth (2009), Nested Effects Models for Learning Signaling Networks from Perturbation Data, Biometrical Journal, 51(2):304-323.

C. Zeller, H. Froehlich, A. Tresch (2008), A Bayesian Network View on Nested Effects Models, EURASIP Journal on Bioinformatics and Systems Biology, 1.

H. Froehlich, T. Beissbarth, A. Tresch, D. Kostka, J. Jacob, R. Spang, F. Markowetz (2008), Analyzing Gene Perturbation Screens With Nested Effects Models in R and Bioconductor, Bioinformatics, 24(21):2549-2550.

H. Froehlich, M. Fellmann, H. Sueltmann, A. Poustka, T. Beissbarth (2008), Estimating Large Scale Signaling Networks through Nested Effects Models from Intervention Effects in Microarray Data, Bioinformatics, 24:2650-2656.

H. Froehlich, M. Fellmann, H. Sueltmann, A. Poustka, T. Beissbarth (2007), Large Scale Statistical Inference of Signaling Pathways from RNAi and Microarray Data, BMC Bioinformatics, 8:386.

pathClass (CRAN)

R-package pathClass is a collection of classification methods that use information about feature connectivity in a biological network as an additional source of information. This additional knowledge is incorporated into the classification a priori.


M. Johannes, H. Froehlich, H. Sültmann, T. Beissbarth (2011), pathClass: An R-Package for Integration of Pathway Knowledge into Support Vector Machines for Biomarker Discovery, Bioinformatics, 27, 1442-1443.

M. Johannes, J. Brase, H. Froehlich, S. Gade, M. Gehrmann, M. Faelth, H. Sültmann, T. Beissbarth (2010), Integration Of Pathway Knowledge Into A Reweighted Recursive Feature Elimination Approach For Risk Stratification Of Cancer Patients, Bioinformatics, 26(17), 2136 - 2144.

netClass (CRAN)

R-package netClass implements network-based feature (gene) selection for biomarker discovery by integrating biological information. The package adapts the following 5 algorithms for classifying patients using prior knowledge: 1) average gene expression of pathway (aep); 2) pathway activities classification (PAC); 3) hub network classification (hubc); 4) filter via top ranked genes (FrSVM); 5) network smoothed t-statistic (stSVM). Notably, stSVM optionally allows for multi-omics based classification using, e.g., mRNA + miRNA data and target predictions.
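
As a rough illustration of the simplest idea in this list, average gene expression of pathway (aep), the Python sketch below replaces per-gene expression values by the mean expression of each pathway's member genes, giving pathway-level features for a downstream classifier. It assumes that aep amounts to a plain per-pathway mean; the function name, data layout and toy values are hypothetical and do not reflect netClass's actual interface.

```python
from statistics import mean


def pathway_average_features(expression, pathways):
    """Map per-gene expression to per-pathway mean expression for each sample.

    expression: dict sample -> dict gene -> expression value
    pathways:   dict pathway name -> list of member genes
    Genes missing from a sample are skipped; a pathway with no measured
    member gene is omitted for that sample.
    """
    features = {}
    for sample, profile in expression.items():
        row = {}
        for name, genes in pathways.items():
            values = [profile[g] for g in genes if g in profile]
            if values:
                row[name] = mean(values)
        features[sample] = row
    return features


# Hypothetical toy data: two patients, three genes, two pathways.
expression = {
    "patient_1": {"EGFR": 2.0, "KRAS": 4.0, "TP53": 1.0},
    "patient_2": {"EGFR": 6.0, "KRAS": 2.0, "TP53": 3.0},
}
pathways = {"growth_signaling": ["EGFR", "KRAS"], "dna_damage": ["TP53"]}
features = pathway_average_features(expression, pathways)
```

The resulting feature table has one column per pathway instead of one per gene, which is where the prior knowledge enters the classification.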


Y. Cun, H. Froehlich (2014), netClass: An R-package for network based, integrative biomarker signature discovery, Bioinformatics, 30(9):1325 - 1326

Y. Cun, H. Froehlich (2013), Network and Data Integration for Biomarker Signature Discovery via Network Smoothed T-Statistics, PLoS ONE, 8(9):e73074.

birta (Bioconductor, older)

Expression levels of mRNA molecules are regulated by different processes, comprising inhibition or activation by transcription factors and post-transcriptional degradation by microRNAs. birta (Bayesian Inference of Regulation of Transcriptional Activity) uses the regulatory networks of TFs and miRNAs together with mRNA and miRNA expression data to predict switches in regulatory activity between two conditions. A Bayesian network is used to model the regulatory structure and Markov-Chain-Monte-Carlo is applied to sample the activity states.


B. Zacher, K. Abnaof, S. Gade, E. Younesi, A. Tresch, H. Froehlich (2012), Joint Bayesian Inference of Condition Specific miRNA and Transcription Factor Activities from Combined Gene and microRNA Expression Data, Bioinformatics, 28(13):1714 - 1720.

birte (Bioconductor, newer approach)

Expression levels of mRNA molecules are regulated by different processes, comprising inhibition or activation by transcription factors and post-transcriptional degradation by microRNAs. biRte uses regulatory networks of TFs, miRNAs and possibly other factors, together with mRNA, miRNA and other available expression data to predict the relative influence of a regulator on the expression of its target genes. Inference is done in a Bayesian modeling framework using Markov-Chain-Monte-Carlo. A special feature is the possibility for follow-up network reverse engineering between active regulators via nem.


Holger Froehlich (2015), biRte: Bayesian Inference of Context Specific Regulator Activities and Transcriptional Networks, Bioinformatics, 31(20):3290-8.

SteinerNet (CRAN archive)

SteinerNet allows for the calculation of Steiner trees on large biological networks using several heuristic algorithms. The goal is to extract minimal, informative biological sub-networks.
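
The idea of a heuristic Steiner tree can be sketched with one classical method, the shortest-path heuristic: start from one terminal node and repeatedly attach the closest remaining terminal via its shortest path to the tree built so far. The Python sketch below is a minimal illustration under that assumption. It is not SteinerNet's code, the page does not say which heuristics the package uses, and the graph, node names and function name are hypothetical.

```python
import heapq


def shortest_path_heuristic(graph, terminals):
    """Approximate a Steiner tree by repeatedly attaching the nearest terminal.

    graph:     dict node -> dict neighbour -> positive edge weight (undirected)
    terminals: list of nodes that must be connected
    Returns the set of tree edges, each as a frozenset {u, v}.
    """
    tree_nodes = {terminals[0]}
    tree_edges = set()
    remaining = set(terminals[1:])
    while remaining:
        # Multi-source Dijkstra starting from every node already in the tree.
        dist = {n: 0.0 for n in tree_nodes}
        prev = {}
        heap = [(0.0, n) for n in sorted(tree_nodes)]
        heapq.heapify(heap)
        target = None
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry
            if u in remaining:
                target = u  # closest terminal not yet in the tree
                break
            for v, w in graph[u].items():
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
        if target is None:
            raise ValueError("terminals are not connected")
        # Walk back from the terminal and add its shortest path to the tree.
        node = target
        while node not in tree_nodes:
            tree_edges.add(frozenset((node, prev[node])))
            tree_nodes.add(node)
            node = prev[node]
        remaining.discard(target)
    return tree_edges


# Hypothetical toy network: B is a non-terminal node that links A, C and D.
graph = {
    "A": {"B": 1.0, "C": 3.0},
    "B": {"A": 1.0, "C": 1.0, "D": 1.0},
    "C": {"A": 3.0, "B": 1.0},
    "D": {"B": 1.0},
}
edges = shortest_path_heuristic(graph, ["A", "C", "D"])
```

In the toy network the heuristic routes all three terminals through the intermediate node B instead of using the expensive direct edge A-C, which is the kind of small connecting sub-network such methods aim to extract.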


A. Sadeghi, H. Froehlich (2013),
Steiner Tree Methods for Optimal Sub-Network Identification: an Empirical Study, BMC Bioinformatics, 14:144.

Tropical Equilibration

Algebraic method for ordering variables in biochemical networks according to velocity.


Satya Swarup Samal, Dima Grigoriev, Holger Froehlich, Andreas Weber, Ovidiu Radulescu (2015),
A Geometric Method for Model Reduction of Biochemical Networks with Polynomial Rate Functions, Bulletin of Mathematical Biology, 1 – 32

Satya Swarup Samal, Dima Grigoriev, Holger Froehlich, Ovidiu Radulescu (2015),
Analysis of Reaction Network Systems using Tropical Geometry, Computer Algebra in Scientific Computing, Volume 9301 of the series Lecture Notes in Computer Science, pp 424-439

Linking Metabolic Network Features to Phenotypes using Sparse Group Lasso

Integration of metabolic networks with "-omics" data has been a subject of recent research in order to better understand the behaviour of such networks with respect to differences between biological and clinical phenotypes. Under the conditions of steady state of the reaction network and the non-negativity of fluxes, metabolic networks can be algebraically decomposed into a set of sub-pathways often referred to as extreme currents (ECs). Our objective is to find the statistical association of such sub-pathways with given clinical outcomes, resulting in a particular instance of a self-contained gene set analysis method. In this direction, we propose a method based on sparse group lasso (SGL) to identify phenotype associated ECs based on gene expression data. SGL selects a sparse set of feature groups and also introduces sparsity within each group. Features in our model are clusters of ECs, and feature groups are defined based on correlations among these features.
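
The two levels of sparsity can be made concrete with the proximal operator of the sparse group lasso penalty. The Python sketch below assumes the commonly used parameterization lam * (alpha * ||beta||_1 + (1 - alpha) * sum over groups g of sqrt(p_g) * ||beta_g||_2), where p_g is the group size. This parameterization, the function names and the toy numbers are assumptions for illustration and are not taken from the paper. Elementwise soft-thresholding produces sparsity within a group, and the subsequent group-wise shrinkage can remove an entire group.

```python
import math


def soft_threshold(x, t):
    """Shrink x towards zero by t, returning zero if |x| <= t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sgl_prox(beta, groups, lam, alpha, step=1.0):
    """Proximal operator of the sparse group lasso penalty (assumed form):
    lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2).

    beta:   list of coefficients
    groups: list of index lists that partition the coefficients
    """
    out = [0.0] * len(beta)
    for idx in groups:
        # Within-group sparsity: elementwise soft-thresholding (lasso part).
        shrunk = [soft_threshold(beta[i], step * lam * alpha) for i in idx]
        norm = math.sqrt(sum(v * v for v in shrunk))
        # Group sparsity: shrink the whole group, or zero it out entirely.
        thresh = step * lam * (1.0 - alpha) * math.sqrt(len(idx))
        scale = max(1.0 - thresh / norm, 0.0) if norm > 0 else 0.0
        for i, v in zip(idx, shrunk):
            out[i] = scale * v
    return out


# Hypothetical coefficients: group 1 holds one strong feature, group 2 only weak ones.
beta = [3.0, 0.2, 0.3, -0.1]
groups = [[0, 1], [2, 3]]
result = sgl_prox(beta, groups, lam=1.0, alpha=0.5)
```

Here the second group is removed completely, while in the first group only the strong coefficient survives, shrunk towards zero.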


Satya Swarup Samal, Ovidiu Radulescu, Andreas Weber, Holger Froehlich (2017),
Linking Metabolic Network Features to Phenotypes using Sparse Group Lasso, Bioinformatics, accepted.

Bayesian Dynamic Elastic Net (BDEN)

This MATLAB code implements the BDEN algorithm, developed to detect errors in ordinary differential equation (ODE) models. ODEs are a popular approach to quantitatively model molecular networks based on biological knowledge. However, such knowledge is typically restricted. Wrongly modelled biological mechanisms as well as relevant external influence factors that are not included in the model are likely to manifest as major discrepancies between model predictions and experimental data. Finding the exact reasons for such observed discrepancies can be quite challenging in practice. To address this issue, we suggested a Bayesian approach to estimate hidden influences in ODE-based models. The method can distinguish between exogenous and endogenous hidden influences. Thus, we can detect wrongly specified as well as missed molecular interactions in the model. Altogether, our method supports the modeller in an algorithmic manner in identifying possible sources of error in ODE-based models on the basis of experimental data.


Benjamin Engelhardt, Maik Kschischo, Holger Froehlich (2017),
A Bayesian Approach to Estimating Hidden Variables as well as Missing and Wrong Molecular Interactions in ODE Based Mathematical Models, Journal of the Royal Society Interface, accepted.

Benjamin Engelhardt, Holger Froehlich, Maik Kschischo (2016), Learning (from) the errors of a systems biology model, Scientific Reports, 6, 20772.