Chromatin Structure and Regulation of Gene Expression

Chromosomes are compacted into increasingly complex chromatin structures within eukaryotic nuclei. In collaboration with Greg Crawford (Duke) and Jason Lieb (UNC-CH), we identify and characterize regions of open chromatin using genome-wide DNase I hypersensitivity (Crawford) and FAIRE (Lieb) experiments, both of which involve high-throughput sequencing. The computational integration of these data with related gene expression, transcription factor binding, and epigenetic data provides a more complete picture of the complex process of gene transcription.

As members of the ENCODE (ENCyclopedia Of DNA Elements) Consortium, we are contributing to the goal of identifying all functional elements in the human genome. Along with Drs. Crawford and Lieb, our group includes Vishy Iyer’s lab at the University of Texas at Austin and Ewan Birney’s lab at the EBI. Our group has created open chromatin maps of the human genome in several diverse cell types, with chromatin immunoprecipitation (ChIP) data providing initial functional annotations for these regions. We are also investigating allele-specific regulatory regions identified using sequences from the open chromatin and ChIP experiments. We have developed, and continue to develop, computational methods to integrate and analyze sequence data from DNase I hypersensitivity, FAIRE, and ChIP experiments. We are currently focused on understanding chromatin changes between related cell types, such as myoblasts and myotubes or prostate cancer cells with and without androgen stimulation, and between common cell types across multiple primate species.

Integrating Genomic Data to Study Complex Disease

Complex diseases such as cancer consist of many histological subtypes and probably thousands of molecular subtypes that differ substantially in their onset, progression, and response to treatment. High-throughput genomic assays can now provide high-density genotypes and assess genome-wide changes and variation in gene expression, genome copy number, allelic expression, and DNA methylation status throughout cancer initiation and progression. These experiments reveal different yet complementary information about the current state of a population of cancer cells, or of cells from other complex diseases. This ability to molecularly characterize complex disease has already resulted in novel diagnostic tests and treatments.

Current computational models designed to distinguish between phenotypically disparate samples are generally accurate, but they are difficult to interpret biologically and rely primarily on data from a single molecular assay. The careful and accurate integration of complementary data in biologically interpretable models provides a more complete portrait of complex disease, for example by yielding new and stronger evidence of genetic changes associated with the root causes of observed differential gene expression.

We develop statistical methods and computational software that integrate high-dimensional, heterogeneous but complementary data from cancer samples to identify fundamental genomic alterations with clinical and biological relevance. These methods are being developed in collaboration with Sayan Mukherjee at Duke University.