TB Genome Annotation Portal

Gene Modules

Gene modules are sets of genes extracted from analysis of 249 M. tuberculosis transcriptomic datasets.
ICA (Independent Component Analysis) was used for clustering genes based on patterns of differential expression.
These gene modules represent clusters of genes that are likely to be functionally related, as inferred from their expression profiles. They can be used as gene sets for analyses such as pathway enrichment.
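As an illustration of the enrichment use case, the sketch below scores the overlap between one gene module and one pathway with a one-sided hypergeometric test. This is a common choice for gene-set enrichment, not a method prescribed by the portal. The function name and the pathway and overlap counts are hypothetical; only the 3327-gene universe is taken from the Methodology section.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_pathway, n_module, n_overlap):
    """Return P(X >= n_overlap) for a hypergeometric draw.

    n_universe: genes in the background set
    n_pathway:  background genes annotated to the pathway
    n_module:   genes in the module being tested
    n_overlap:  module genes that are also in the pathway
    """
    upper = min(n_pathway, n_module)
    total = comb(n_universe, n_module)
    # Sum the probability of every overlap at least as large as the one observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: a 10-gene module sharing 4 genes with a 40-gene pathway,
# against the 3327-gene background mentioned in the Methodology section.
p_value = hypergeom_enrichment_p(3327, 40, 10, 4)
print(f"enrichment p-value: {p_value:.3g}")
```

When many modules and pathways are tested, the resulting p-values would normally need a multiple-testing correction.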

The Dual-ICA methodology was published in: Choudhery S, Ioerger TR. Dual ICA to extract interacting sets of genes and conditions from transcriptomic data. ACM BCB. 2023 Sep;2023:8. doi: 10.1145/3584371.3612968. Epub 2023 Oct 4. PMID: 38162633; PMCID: PMC10757798.

Methodology

    LFCs (log2-fold-changes) were computed for each gene in each condition relative to a relevant control condition, as given by the corresponding study. The LFCs were normalized by dividing by the standard deviation for each condition. Independent Component Analysis (ICA) was performed twice ("dual-ICA"): once on the columns (conditions), and once on the rows (genes). A K-test (Sastry et al., 2021) was used on each Independent Component from the ICA performed on genes to determine which genes are significantly associated with a condition module.
      Not all genes are in a cluster. Only 1511 of the 3327 genes that are present in the intersection of all the datasets have a significant association and are therefore represented in a cluster.
    Genes are hierarchically clustered using the average of the log-fold-changes calculated in each condition cluster. The resulting dendrogram is cut at the level that produces 160 gene clusters, with an average of ~10 genes per cluster.
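The normalization and clustering steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the portal's actual pipeline. The ICA and K-test steps are omitted, so the condition-cluster labels are supplied as a hypothetical input. The choice of population standard deviation, Euclidean distance and average linkage is also an assumption, since the text does not specify them. All function names and the toy matrix are invented for the example.

```python
import math
import statistics

def normalize_by_condition(lfc):
    """Divide each column (condition) of a genes x conditions LFC matrix by its SD."""
    n_cond = len(lfc[0])
    # Assumption: population SD; the source does not say which estimator was used.
    sds = [statistics.pstdev([row[j] for row in lfc]) for j in range(n_cond)]
    return [[row[j] / sds[j] for j in range(n_cond)] for row in lfc]

def average_by_condition_cluster(lfc, cond_labels):
    """Average each gene's LFCs over the conditions belonging to each condition cluster."""
    labels = sorted(set(cond_labels))
    return [
        [statistics.fmean([v for v, lab in zip(row, cond_labels) if lab == c])
         for c in labels]
        for row in lfc
    ]

def cluster_genes(profiles, n_clusters):
    """Agglomerative clustering (assumed: average linkage, Euclidean distance).

    Merging until n_clusters remain is equivalent to cutting the dendrogram
    at the level that yields n_clusters groups.
    """
    clusters = [[i] for i in range(len(profiles))]

    def avg_link(c1, c2):
        total = sum(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: avg_link(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Toy data: 4 genes x 4 conditions; genes 0 and 1 share one pattern,
# genes 2 and 3 the opposite one. Condition clusters are hypothetical.
lfc = [
    [2.0, 2.2, -1.0, -1.2],
    [1.9, 2.1, -0.9, -1.1],
    [-1.5, -1.7, 0.5, 0.7],
    [-1.6, -1.4, 0.6, 0.4],
]
cond_labels = [0, 0, 1, 1]

normalized = normalize_by_condition(lfc)
profiles = average_by_condition_cluster(normalized, cond_labels)
gene_clusters = cluster_genes(profiles, n_clusters=2)
print(gene_clusters)
```

On the real data the same cut would be made at 160 clusters over the 1511 significantly associated genes. The quadratic search used here is only practical for small examples, and a library implementation of hierarchical clustering would be the usual choice at that scale.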

Notes Files