Identifying the target genes regulated by transcription factors (TFs) is a fundamental step in understanding gene regulation. Recent advances in high-throughput sequencing, combined with chromatin immunoprecipitation (ChIP), enable genome-wide mapping of TF binding sites, but binding alone does not reveal function. This is especially true in mammalian systems, where regulation often occurs through long-range enhancers in gene-rich neighborhoods rather than through proximal promoters, which prevents straightforward assignment of a binding site to a target gene.
I will present EMBER (Expectation Maximization of Binding and Expression pRofiles), a novel method that analyzes high-throughput binding data (e.g., ChIP-chip or ChIP-seq) jointly with gene expression data (e.g., DNA microarray or RNA-seq) via an unsupervised machine-learning algorithm to infer the gene targets of sets of TF binding sites. The selected genes are those whose expression matches over-represented expression patterns, and the output of the method includes both the putative gene targets and the mode of gene regulation (e.g., activation or repression). Following a description of the method, I will show biological validation of the target genes and expression patterns obtained by EMBER. I will then present an application of EMBER to novel biological discovery, revealing the context-dependent regulatory modes of the TF IRF-4 in mouse effector B cell fate choice.
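To make the general idea concrete, the following is a minimal toy sketch of expectation maximization over expression patterns. It is not EMBER's actual model, which this abstract does not specify. It assumes that each gene near a binding site has already been reduced to a single expression-pattern label ("up", "flat", "down"), that a background frequency for each label is known, and that the prior probability of a candidate being a true target is fixed. The function name, the labels, and the example data are all hypothetical.

```python
from collections import Counter


def em_target_scores(candidates, background, prior=0.5, n_iter=50):
    """Toy EM: score candidate genes by how well their expression pattern
    matches the patterns over-represented near binding sites.

    candidates: dict mapping gene name -> pattern label (assumed input)
    background: dict mapping pattern label -> genome-wide frequency (> 0)
    prior:      assumed fixed probability that a candidate is a true target
    """
    patterns = sorted(background)
    counts = Counter(candidates.values())
    # Start from a uniform target-pattern distribution.
    theta = {p: 1.0 / len(patterns) for p in patterns}

    def responsibilities(theta):
        # E-step: posterior probability that a gene with pattern p is a target.
        return {
            p: prior * theta[p]
            / (prior * theta[p] + (1.0 - prior) * background[p])
            for p in patterns
        }

    for _ in range(n_iter):
        resp = responsibilities(theta)
        # M-step: re-estimate the target-pattern distribution from
        # responsibility-weighted pattern counts.
        weighted = {p: counts[p] * resp[p] for p in patterns}
        total = sum(weighted.values())
        theta = {p: weighted[p] / total for p in patterns}

    resp = responsibilities(theta)
    scores = {gene: resp[pattern] for gene, pattern in candidates.items()}
    return scores, theta


# Hypothetical example: genes near binding sites are enriched for "up".
candidates = {}
candidates.update({f"gene_up_{i}": "up" for i in range(12)})
candidates.update({f"gene_flat_{i}": "flat" for i in range(6)})
candidates.update({f"gene_down_{i}": "down" for i in range(2)})
background = {"down": 0.2, "flat": 0.6, "up": 0.2}

scores, theta = em_target_scores(candidates, background)
```

In this toy setting, the pattern with the largest weight in `theta` plays the role of the inferred mode of regulation (for example, "up" would suggest activation), and `scores` ranks candidate genes as putative targets. A realistic model would work with multi-condition expression profiles rather than a single label per gene.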