Welcome to the Ioerger Bioinformatics Lab.
We are part of the Department of Computer Science at Texas A&M University.
The Ioerger Bioinformatics Lab does interdisciplinary research that spans
computer science and biology. We apply statistical algorithms and
machine learning to biological data (genomics, transcriptomics, etc.)
to study antibiotic resistance and to contribute to drug discovery for
tuberculosis and other infectious diseases. The projects we work on
are often highly collaborative and involve working with a wide range
of researchers in the Life Sciences. Our work ranges from genomics
(e.g. whole-genome sequencing, RNA-seq, TnSeq), to phylogenetics, to
structural biology (analysis of protein structures, protein-ligand
interactions, docking).
Computationally, many of our projects require the development and
implementation of novel statistical algorithms for analyzing unique
datasets from newly developed experimental technologies and new data
types. These algorithms quantify the significance of inferences while
dealing appropriately with the uncertainty inherent in such data. We
also focus on methods to identify interactions in the data.
The biological applications of our research are primarily focused
on tuberculosis (TB), caused by the bacterial
pathogen Mycobacterium tuberculosis. TB infects many people
around the world, and outbreaks of multidrug-resistant TB have been
increasing at an alarming rate. The TB research community is
collectively engaged in trying to understand basic pathways for
survival, adaptations to stress, and host-pathogen interactions
(e.g. within macrophages). However, only about half of the ~4,000 genes in
the Mycobacterium tuberculosis (Mtb) genome are annotated, and some of
these are just generic annotations based on homology (e.g. 'oxidoreductase'). This
lack of knowledge about basic functions for so many genes in the Mtb
genome hampers drug discovery efforts, because we don't know enough
about drug targets and conditions under which they are vulnerable.
Our lab has been employing bioinformatics methods
to try to better annotate the genome, understand biological pathways,
interpret drug resistance mutations in isogenic mutants (to identify
new potential drug targets), and analyze patterns of resistance
mutations in clinical cohorts (to try to understand how resistance to
existing drugs arises and spreads in a natural population).
These methods can be applied
to other infectious pathogens as well, such as methicillin-resistant
Staphylococcus aureus (MRSA), and to other clinically important
mycobacteria like M. avium and M. abscessus.
A major focus of our lab is determining which genes
are essential under what conditions using TnSeq (sequencing of
transposon-insertion mutant libraries). TnSeq yields information on conditional
essentiality of genes and genetic interactions that is useful for
elucidating the functions of genes and identifying good drug
targets. TnSeq data is intrinsically noisy, and we have been
developing tools for rigorous assessment of statistical significance
of essentiality predictions derived from this data. We distribute a
Python-based software package for TnSeq analysis called TRANSIT, which includes
implementations of many of the statistical methods we have developed. This
work has contributed many insights into aspects of mycobacterial
biology, and is ultimately oriented toward facilitating the discovery of new drug
candidates for TB.
Projects in the Ioerger Bioinformatics Lab
TnSeq - Sequencing of Transposon Mutant Libraries
TnSeq is a genome-wide screen that determines which genes are
essential for survival under different conditions, with
wide-ranging uses, from understanding pathways to virulence and
host-pathogen interactions. A major focus in the lab is the development of
statistical methods for analysis of TnSeq data: converting large
files of raw sequencing reads into lists of essential (or
conditionally essential) genes and quantifying their statistical
significance. The challenge is that TnSeq data (insertion counts across the
genome) is intrinsically noisy, and we attempt to draw rigorous
inferences while avoiding false positives, using a combination of
frequentist and Bayesian methods. We have combined our algorithms
in a software package we distribute called TRANSIT,
which is used by labs around the world for TnSeq. TRANSIT encodes best
practices for processing TnSeq data, quality assessment,
normalization, and analysis, enabling users to draw statistically
rigorous inferences from their data. We have used TnSeq to study
metabolism (e.g. growth on various carbon sources), antibiotic stress,
cell wall synthesis, and variability of essentiality among clinical
isolates. We also use TnSeq in knock-out strains to try to
infer gene functions through genetic interactions, and thus to
annotate genes in the H37Rv genome. This work is done with multiple
collaborators, including
Chris Sassetti at UMass Medical School.
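To illustrate the kind of statistical question involved, here is a minimal, hypothetical sketch of one generic approach to conditional essentiality: a two-sided permutation test on the difference in mean insertion counts at the insertion sites of a single gene between two conditions. It is not code from TRANSIT; the function name, its parameters, and the counts are illustrative assumptions, and it omits the normalization, replicate handling, and multiple-testing correction a real analysis would need.

```python
import random
from statistics import mean


def permutation_test(counts_a, counts_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in mean insertion
    counts at one gene's insertion sites between conditions A and B.

    Hypothetical helper for illustration; not part of TRANSIT.
    Returns (observed difference B - A, permutation p-value).
    """
    rng = random.Random(seed)  # fixed seed for reproducible output
    observed = mean(counts_b) - mean(counts_a)
    pooled = list(counts_a) + list(counts_b)
    n_a = len(counts_a)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign the site counts to the two conditions.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Made-up counts for one gene: untreated vs. drug-treated library.
control = [120, 85, 0, 210, 95, 150]
treated = [3, 0, 0, 12, 5, 1]
diff, p = permutation_test(control, treated)
print(f"mean difference = {diff:.1f}, p = {p:.4f}")
```

A large negative difference with a small p-value would suggest that insertions in this gene are depleted under the drug, i.e. that the gene may be conditionally essential. When thousands of genes are tested, the p-values would also need adjustment for multiple comparisons (e.g. Benjamini-Hochberg).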
Drug Target Identification
We participate in the Tuberculosis Drug Accelerator (TBDA), which is a
consortium of academic and pharmaceutical labs funded by the Bill &
Melinda Gates Foundation. Drug discovery is a complex endeavor with
many stages, and the Gates Foundation assembled a team of academic
labs and pharmaceutical partners, with experts at each stage (from
high-throughput screening of compound libraries, to medicinal
chemistry, to mechanism and structure determination, to
pharmacokinetics in animal models). Our contribution to this pipeline
is whole-genome sequencing of isogenic resistant mutants to identify
the targets and mechanisms of novel inhibitors. This requires
identifying and interpreting resistance mutations (SNPs, indels,
duplications, transposon hopping, etc.), assessed against the backdrop of
what is currently known about mycobacterial growth (including
essentiality), metabolic pathways, regulation, and stress response.
Our work on this project has led to downstream development of lead
compounds targeting a variety of enzymes like FadD32 and Pks13
(mycolic acid synthesis), GlcB (malate synthase, glyoxylate shunt),
biotin protein ligase, and PptT (CoA biosynthesis). These projects
also involve some cheminformatics, docking, modeling of
protein-ligand interactions, and structure-activity relationships (SAR). This is joint work with
Jim Sacchettini in the TAMU Dept. of Biochemistry & Biophysics.
Chemical Genomics
An important step in drug discovery is identifying the protein targets
of inhibitors (such as compounds from high-throughput screens). A
novel methodology that is being developed for this is to construct a
library (pool) of knock-down mutants (e.g. in which the intracellular
level of an essential protein such as DNA gyrase can be artificially
reduced), and to profile their behavior (e.g. growth impairment) in
the presence of inhibitors using next-generation sequencing. We are
developing statistical models to quantify the relative depletion of
mutants in the library, and machine learning methods to detect patterns
that will enable us to infer which protein or biological process is the
target of a given
compound. This work is a collaboration with Dirk Schnappinger at
Weill Cornell Medical College in NY, and is funded by the Bill &
Melinda Gates Foundation.
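The idea of quantifying relative depletion can be illustrated with a minimal sketch: normalize each mutant's read count to counts per million (CPM) in the untreated and inhibitor-treated pools, then take a log2 fold change with a pseudocount. This is a generic, hypothetical calculation, not the lab's statistical model; the function name, mutant labels, and counts are made up, and a real analysis would also model replicates, dose, and count dispersion.

```python
import math


def log2_fold_changes(control_counts, treated_counts, pseudocount=1.0):
    """Log2 fold change in relative abundance of each knock-down mutant,
    inhibitor-treated pool vs. untreated pool (hypothetical helper).

    Counts are normalized to counts per million (CPM) so that pools
    sequenced to different depths are comparable; the pseudocount
    avoids log(0) for mutants that drop out completely.
    """
    total_c = sum(control_counts.values())
    total_t = sum(treated_counts.values())
    lfc = {}
    for mutant, c in control_counts.items():
        t = treated_counts.get(mutant, 0)
        cpm_c = c / total_c * 1e6
        cpm_t = t / total_t * 1e6
        lfc[mutant] = math.log2((cpm_t + pseudocount) / (cpm_c + pseudocount))
    return lfc


# Made-up read counts; "gyrA-kd" stands for a hypothetical DNA gyrase
# knock-down mutant that is hypersensitive to the compound.
control = {"gyrA-kd": 1000, "rpoB-kd": 1000, "ctrl-kd": 1000}
treated = {"gyrA-kd": 100, "rpoB-kd": 1000, "ctrl-kd": 1000}
for mutant, value in sorted(log2_fold_changes(control, treated).items(),
                            key=lambda kv: kv[1]):
    print(f"{mutant}\t{value:+.2f}")
```

Mutants with the most negative values are the most depleted, and the pattern of depletion across many knock-downs is the kind of signal a pattern-recognition method could use to suggest a compound's target. Because counts are normalized to library size, strong depletion of one mutant slightly inflates the values for the others.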
Evolution of Drug Resistance
What we find in the lab about how a drug works does not necessarily
tell us how the bacteria are going to respond clinically (in terms of
the frequency and mechanisms of resistance). In
order to better understand how resistance to existing antibacterial
drugs arises, we are sequencing the genomes of large
collections of clinical isolates, doing phylogenetic analysis, and
determining resistance mutation profiles for existing drugs (isoniazid,
rifampicin, pyrazinamide) and novel drugs (bedaquiline, pretomanid,
linezolid). We are actively involved in sequencing clinical
isolates from disease outbreaks around the world, and performing
statistical analyses of associations of polymorphisms with drug
resistance (GWAS). We are also examining the acquisition of resistance
mutations in animal models of disease.
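At its simplest, testing whether a single polymorphism is associated with resistance reduces to a 2x2 contingency table of isolates (variant present or absent vs. resistant or susceptible). The sketch below computes a two-sided Fisher's exact test from scratch with the Python standard library; it is a generic illustration, not the lab's GWAS pipeline. Because Mtb evolves clonally, a real association analysis must also account for lineage (population) structure, which this toy test ignores, and the counts in the example are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table

                    resistant  susceptible
        variant         a          b
        no variant      c          d

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # P(top-left cell == x) given fixed row and column totals
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-7))


# Invented example: 10 resistant isolates all carry the variant,
# 10 susceptible isolates all lack it.
print(fisher_exact_two_sided(10, 0, 0, 10))
```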
We are interested in understanding the effects of fitness costs,
compensatory mutations (epistasis), lineage-specific effects, novel
mechanisms (efflux, detoxification, metabolic shift, etc.), and high- vs.
low-level resistance (e.g. stepping-stone mutations). Mtb
generally evolves clonally (no recombination) and does not have
plasmids (which often facilitate exchange of drug resistance genes in
other pathogens like S. aureus). Yet we have observed that drug
resistance in Mtb often arises readily and independently in different
strains. We are interested in studying how the acquisition of resistance
is affected by other drugs in combination therapies, the roles of latency
and transmission, and interactions with diet, patient compliance, and
comorbidities (HIV, diabetes, etc.).
Other Mycobacteria
The lab is expanding its research and methods from M. tuberculosis
to other clinically important mycobacteria, such as M. avium and
M. abscessus. We have been doing essentiality studies using TnSeq in
these organisms, as well as sequencing clinical isolates to understand
drug resistance (and hope to be doing drug screening soon). These
mycobacteria are more genetically diverse than M. tuberculosis. The
projects employ comparative genomics to try to understand how the
biology, pathways, and virulence of these organisms are similar to (or
different from) those of M. tuberculosis. Many of
the first- and second-line antitubercular drugs are not effective
against these bacteria, so drug discovery is just as urgently needed.
This work is being done in collaboration with various colleagues,
including Eric Rubin (Harvard School of Public Health) and
Thomas Dick (Rutgers).
Design of Peptidomimetic Inhibitors of Protein-Protein Interfaces
Many important biological and disease processes involve interactions
between proteins. Peptidomimetics are synthetic small molecules that
mimic short peptides (3-5 amino acids) and can potentially bind at
protein-protein (P-P) interfaces and disrupt interactions. Previously, we
designed a search algorithm (called EKO) for finding clusters of amino
acids in interfaces matching a target geometric configuration that can
be used to assist in design of peptidomimetics with favorable
properties and synthetic routes. This work also involves protein
structure modeling and computational evaluation of conformational and
interaction energies. EKO is currently being applied to several
protein targets involved in cancer. This is a collaboration with Kevin
Burgess, TAMU Dept. of Chemistry, and is funded in part by a grant
from the Cancer Prevention and Research Institute of Texas (CPRIT).
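The general idea of searching an interface for residues that match a target geometry can be illustrated with a brute-force sketch. It is not the EKO algorithm. It simply assumes that each interface residue is represented by one 3D point (for example, a single side-chain atom), describes the target as three points, and returns ordered residue triples whose pairwise distances all fall within a tolerance of the target's. The function name, residue labels, coordinates, and tolerance are hypothetical.

```python
from itertools import permutations
from math import dist


def match_triads(residues, target, tol=0.5):
    """Brute-force search for residue triples matching a target geometry.

    residues: dict mapping a residue label to an (x, y, z) point,
              e.g. one atom per interface residue (assumption).
    target:   three (x, y, z) points giving the desired configuration.
    Returns ordered triples whose three pairwise distances each lie
    within `tol` angstroms of the corresponding target distance.
    """
    pairs = [(0, 1), (0, 2), (1, 2)]
    target_d = [dist(target[i], target[j]) for i, j in pairs]
    hits = []
    # Ordered triples, so each residue can be mapped to each target point.
    for triple in permutations(residues, 3):
        pts = [residues[label] for label in triple]
        if all(abs(dist(pts[i], pts[j]) - td) <= tol
               for (i, j), td in zip(pairs, target_d)):
            hits.append(triple)
    return hits


# Hypothetical interface residues and a 3-4-5 angstrom triangle target.
interface = {
    "L10": (10.0, 0.0, 0.0),
    "F14": (13.0, 0.0, 0.0),
    "W18": (10.0, 4.0, 0.0),
    "K30": (30.0, 30.0, 30.0),
}
target = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(match_triads(interface, target))
```

Because only distances are compared, a triple and its mirror image match equally well; a fuller search would superimpose coordinates and also consider side-chain orientations.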