Publications

(Ordered newest to oldest)

23 Total Publications
50 Unique Authors

Enhancing Molecular Representation Learning through the Combination of 3D and 2D Graph Machine Learning

Pan IT & Romano JD LA

The 39th Annual AAAI Conference on Artificial Intelligence, 39(2025)

Network-based analyses of multiomics data in biomedicine

Kumar R, Romano JD, & Ritchie MD

BioData Mining, 18(2025)

Session Introduction: AI and Machine Learning in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface

Hardasht FN, Kim D, Romano JD, Tison G, Daneshjou R, & Chen JH

Pacific Symposium on Biocomputing, 30, 33-39(2025)

The Alzheimer's Knowledge Base: A Knowledge Graph for Alzheimer's Disease Research

Romano JD, Truong V, Kumar R, Venkatesan M, Graham BE, Hao Y, Matsumoto N, Li X, Wang Z, Ritchie MD, Shen L, & Moore JH FA

Journal of Medical Internet Research, 26, e46777(2024)

Centralized and Federated Models for the Analysis of Clinical Data

Li R, Romano JD, Chen Y, & Moore JH

Annual Review of Biomedical Data Science, 7, 179-199(2024)

Knowledge Graph Aids Comprehensive Explanation of Drug and Chemical Toxicity

Hao Y, Romano JD, & Moore JH

CPT: Pharmacometrics & Systems Pharmacology, 12, 1072-1079(2023)

Abstract

In computational toxicology, prediction of complex endpoints has always been challenging, as they often involve multiple distinct mechanisms. State-of-the-art models are limited either by low accuracy or by a lack of interpretability due to their black-box nature. Here, we introduce AIDTox, an interpretable deep learning model that incorporates curated knowledge of chemical-gene connections, gene-pathway annotations, and pathway hierarchy. AIDTox accurately predicts cytotoxicity outcomes in HepG2 and HEK293 cells. It also provides comprehensive explanations of cytotoxicity covering multiple aspects of drug activity, including target interaction, metabolism, and elimination. In summary, AIDTox provides a computational framework for unveiling cellular mechanisms for complex toxicity endpoints.

Exploring genetic influences on adverse outcome pathways using heuristic simulation and graph data science

Romano JD, Mei L, Senn J, Moore JH, & Mortensen HM FA

Computational Toxicology, 25, 100261(2023)

Discovering venom-derived drug candidates using differential gene expression

Romano JD, Li H, Napolitano T, Realubit R, Karan C, Holford M, & Tatonetti NP FA

Toxins, 15, 451(2023)

Improving QSAR Modeling for Predictive Toxicology using Publicly Aggregated Semantic Graph Data and Graph Neural Networks

Romano JD, Hao Y, & Moore JH FA

Pacific Symposium on Biocomputing, 27, 187-198(2022)

Abstract

Quantitative Structure-Activity Relationship (QSAR) modeling is a common computational technique for predicting chemical toxicity, but a lack of new methodological innovations has impeded QSAR performance on many tasks. We show that contemporary QSAR modeling for predictive toxicology can be substantially improved by incorporating semantic graph data aggregated from open-access public databases, and analyzing those data in the context of graph neural networks (GNNs). Furthermore, we introspect the GNNs to demonstrate how they can lead to more interpretable applications of QSAR, and use ablation analysis to explore the contribution of different data elements to the final models' performance.

PMLB v1.0: An open-source dataset collection for benchmarking machine learning methods

Romano JD, Le TT, La Cava W, Gregg JT, Goldberg DJ, Chakraborty P, Ray NL, Himmelstein D, Fu W, & Moore JH FA

Bioinformatics, 38, 878-880(2022)

Automating predictive toxicology using ComptoxAI

Romano JD, Hao Y, Moore JH, & Penning T FA

Chemical Research in Toxicology, 35, 1370-1382(2022)

Knowledge-guided deep learning models of drug toxicity improve interpretation

Hao Y, Romano JD, & Moore JH

Patterns, 3(2022)

The promise of automated machine learning for the genetic analysis of complex traits

Manduchi E, Romano JD, & Moore JH

Human Genetics, 141, 1529-1544(2022)

Omics Methods in Toxins Research-A Toolkit to Drive the Future of Scientific Inquiry

Romano JD FA LA

Toxins, 14, 761(2022)

TPOT-NN: Augmenting tree-based automated machine learning with neural network estimators

Romano JD, Le TT, Fu W, & Moore JH FA

Genetic Programming and Evolvable Machines, 22, 207-227(2021)

Abstract

Automated machine learning (AutoML) and artificial neural networks (ANNs) have revolutionized the field of artificial intelligence by yielding incredibly high-performing models to solve a myriad of inductive learning tasks. In spite of their successes, little guidance exists on when to use one versus the other. Furthermore, relatively few tools exist that allow the integration of both AutoML and ANNs in the same analysis to yield results combining both of their strengths. Here, we present TPOT-NN—a new extension to the tree-based AutoML software TPOT—and use it to explore the behavior of automated machine learning augmented with neural network estimators (AutoML+NN), particularly when compared to non-NN AutoML in the context of simple binary classification on a number of public benchmark datasets. Our observations suggest that TPOT-NN is an effective tool that achieves greater classification accuracy than standard tree-based AutoML on some datasets, with no loss in accuracy on others. We also provide preliminary guidelines for performing AutoML+NN analyses, and recommend possible future directions for AutoML+NN methods research, especially in the context of TPOT.

Embedding covariate adjustments in tree-based automated machine learning for biomedical big data analyses

Manduchi E, Fu W, Romano JD, Ruberto S, & Moore JH

BMC Bioinformatics, 21, 430(2020)

Abstract

Background: A typical task in bioinformatics consists of identifying which features are associated with a target outcome of interest and building a predictive model. Automated machine learning (AutoML) systems such as the Tree-based Pipeline Optimization Tool (TPOT) constitute an appealing approach to this end. However, in biomedical data, there are often baseline characteristics of the subjects in a study or batch effects that need to be adjusted for in order to better isolate the effects of the features of interest on the target. Thus, the ability to perform covariate adjustments becomes particularly important for applications of AutoML to biomedical big data analysis.

Results: We developed an approach to adjust for covariates affecting features and/or target in TPOT. Our approach is based on regressing out the covariates in a manner that avoids 'leakage' during the cross-validation training procedure. We describe applications of this approach to toxicogenomics and schizophrenia gene expression data sets. The TPOT extensions discussed in this work are available at https://github.com/EpistasisLab/tpot/tree/v0.11.1-resAdj.

Conclusions: In this work, we address an important need in the context of AutoML, which is particularly crucial for applications to bioinformatics and medical informatics, namely covariate adjustments. To this end we present a substantial extension of TPOT, a genetic programming based AutoML approach. We show the utility of this extension by applications to large toxicogenomics and differential gene expression data. The method is generally applicable in many other scenarios from the biomedical field.

Keywords: AutoML; Covariate adjustment; Feature importance; Genetic programming; Pathways.
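
The idea of regressing out covariates without leakage, as described in the abstract above, can be illustrated with a minimal sketch. This is not the TPOT extension itself: the function names (`fit_covariate_model`, `residualize`, `adjusted_folds`) are hypothetical, and a plain least-squares fit with NumPy is assumed as the adjustment model. The point it shows is that the covariate model is fitted on the training fold only and then applied unchanged to the held-out fold.

```python
import numpy as np


def fit_covariate_model(covariates, features):
    """Least-squares fit of each feature column on the covariates (plus intercept)."""
    design = np.column_stack([np.ones(len(covariates)), covariates])
    coef, *_ = np.linalg.lstsq(design, features, rcond=None)
    return coef


def residualize(covariates, features, coef):
    """Subtract the covariate-predicted part of the features using a fitted model."""
    design = np.column_stack([np.ones(len(covariates)), covariates])
    return features - design @ coef


def adjusted_folds(covariates, features, n_folds=5, seed=0):
    """Yield covariate-adjusted train/test features for each cross-validation fold.

    The covariate model is fitted on the training rows only, so no information
    from the held-out rows leaks into the adjustment.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    for test_idx in np.array_split(order, n_folds):
        train_idx = np.setdiff1d(order, test_idx)
        coef = fit_covariate_model(covariates[train_idx], features[train_idx])
        train_res = residualize(covariates[train_idx], features[train_idx], coef)
        test_res = residualize(covariates[test_idx], features[test_idx], coef)
        yield train_res, test_res, train_idx, test_idx
```

Fitting the adjustment once on the full data set before splitting would be simpler, but the held-out rows would then influence the coefficients used to adjust the training rows, which is the kind of leakage the abstract refers to.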

Ten simple rules for writing a paper about scientific software

Romano JD & Moore JH FA

PLoS Computational Biology, 16, e1008390(2020)

A Decade of Translational Bioinformatics: A Retrospective Analysis of "Year-in-Review" Presentations

Romano JD, Bernauer M, McGrath SP, Nagar SD, & Freimuth RR FA

AMIA Joint Summits on Translational Science Proceedings, 2019, 335-344(2019)

Informatics and Computational Methods in Natural Product Drug Discovery: A Review and Perspectives

Romano JD & Tatonetti NP FA

Frontiers in Genetics, 10, 368(2019)

Using a Novel Ontology to Inform the Discovery of Therapeutic Peptides from Animal Venoms

Romano JD & Tatonetti NP FA

AMIA Joint Summits on Translational Science Proceedings, 2016, 209-218(2016)

Abstract

Venoms and venom-derived compounds constitute a rich and largely unexplored source of potentially therapeutic compounds. To facilitate biomedical research, it is necessary to design a robust informatics infrastructure that will allow semantic computation of venom concepts in a standardized, consistent manner. We have designed an ontology of venom-related concepts - named Venom Ontology - that reuses an existing public data source: UniProt's Tox-Prot database. In addition to describing the ontology and its construction, we have performed three separate case studies demonstrating its utility: (1) An exploration of venom peptide similarity networks within specific genera; (2) A broad overview of the distribution of available data among common taxonomic groups spanning the known tree of life; and (3) An analysis of the distribution of venom complexity across those same taxonomic groups. Venom Ontology is publicly available on BioPortal at http://bioportal.bioontology.org/ontologies/CU-VO.

Adapting simultaneous analysis phylogenomic techniques to study complex disease gene relationships

Romano JD, Tharp WG, & Sarkar IN FA

Journal of Biomedical Informatics, 54, 10-38(2015)

Abstract

The characterization of complex diseases remains a great challenge for biomedical researchers due to the myriad interactions of genetic and environmental factors. Network medicine approaches strive to accommodate these factors holistically. Phylogenomic techniques that can leverage available genomic data may provide an evolutionary perspective that may elucidate knowledge for gene networks of complex diseases and provide another source of information for network medicine approaches. Here, an automated method is presented that leverages publicly available genomic data and phylogenomic techniques, resulting in a gene network. The potential of the approach is demonstrated based on a case study of nine genes associated with Alzheimer Disease, a complex neurodegenerative syndrome. The developed technique, an update to a previously described Perl script called "ASAP," is implemented as a suite of Ruby scripts entitled "ASAP2." It first compiles a list of sequence-similarity based orthologues using PSI-BLAST and a recursive NCBI BLAST+ search strategy, then constructs maximum parsimony phylogenetic trees for each set of nucleotide and protein sequences, and finally calculates phylogenetic metrics (Incongruence Length Difference between orthologue sets, partitioned Bremer support values, combined branch scores, and Robinson-Foulds distance) to provide an empirical assessment of evolutionary conservation within a given genetic network. In addition to the individual phylogenetic metrics, ASAP2 provides results in a way that can be used to generate a gene network that represents evolutionary similarity based on topological similarity (the Robinson-Foulds distance).
The results of this study demonstrate that phylogenomic approaches, which enable the simultaneous study of multiple genes, can provide insights into potential gene relationships that may not have been apparent using traditional, single-gene methods and that can be studied within a network medicine framework. Furthermore, the results provide an initial integrated evolutionary history of an Alzheimer Disease gene network and identify potentially important co-evolutionary clustering that may warrant further investigation.
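
As an illustration of the topological comparison mentioned in the abstract above, the following is a minimal sketch of a Robinson-Foulds-style distance. It is not taken from ASAP2 (which the abstract describes as a suite of Ruby scripts). It assumes rooted trees written as nested Python tuples and counts clades rather than the bipartitions used in the usual unrooted definition; the function names and the leaf labels in the usage note are hypothetical.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree given as nested tuples.

    A leaf is any non-tuple value; single-leaf clades are not recorded.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the clade rooted at this internal node
    return leaves, found


def rooted_rf(tree_a, tree_b):
    """Count the clades that appear in exactly one of the two trees."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must be defined over the same set of leaves")
    return len(clades_a ^ clades_b)
```

For example, `rooted_rf((("g1", "g2"), ("g3", "g4")), (("g1", "g3"), ("g2", "g4")))` compares two four-leaf trees that share only the root clade, while comparing a tree with itself gives zero. Pairwise distances of this kind over a set of gene trees could then serve as edge weights in a similarity network.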

VenomKB, a new knowledge base for facilitating the validation of putative venom therapies

Romano JD & Tatonetti NP FA

Scientific Data, 2, 150065(2015)

Abstract

Animal venoms have been used for therapeutic purposes since the dawn of recorded history. Only a small fraction, however, have been tested for pharmaceutical utility. Modern computational methods enable the systematic exploration of novel therapeutic uses for venom compounds. Unfortunately, there is currently no comprehensive resource describing the clinical effects of venoms to support this computational analysis. We present VenomKB, a new publicly accessible knowledge base and website that aims to act as a repository for emerging and putative venom therapies. Presently, it consists of three database tables: (1) Manually curated records of putative venom therapies supported by scientific literature, (2) automatically parsed MEDLINE articles describing compounds that may be venom derived, and their effects on the human body, and (3) automatically retrieved records from the new Semantic Medline resource that describe the effects of venom compounds on mammalian anatomy. Data from VenomKB may be selectively retrieved in a variety of popular data formats, are open-source, and will be continually updated as venom therapies become better understood.

Systems biology approaches for identifying adverse drug reactions and elucidating their underlying biological mechanisms

Boland MR, Jacunski A, Lorberbaum T, Romano JD, Moskovitch R, & Tatonetti NP

Wiley Interdisciplinary Reviews: Systems Biology and Medicine, 8, 104-122(2015)