Genome-wide Analysis of Chromatin Binding Proteins in D. melanogaster and C. elegans
The mechanisms that regulate the translation of information encoded in DNA into gene expression have been intensively investigated since the last century. A large portion of this effort has concentrated on characterizing the proteins that bind to specific chromatin or DNA regions. These proteins play important roles in the regulatory hierarchy. Until the beginning of the 21st century, studies probing these chromatin binding proteins were generally conducted at the scale of a single gene or a limited region of the genome. Recent advances in next-generation sequencing have provided a revolutionary method, ChIP-seq, that accurately generates genome-wide profiles of chromatin binding proteins. The modENCODE project has generated genome-wide binding data for a large number of chromatin binding proteins in the model organisms D. melanogaster and C. elegans. It is thus possible to investigate the spatial distribution of these proteins at the genome scale. To achieve this goal, an algorithm is needed to find protein binding sites across the genome. Although many existing algorithms meet this basic need, none of them can resolve binding sites that lie close to each other without sacrificing other desirable properties, such as specificity. In this thesis, I present my work in designing a ChIP-seq peak calling algorithm called PeakRanger, which addresses these concerns. PeakRanger, along with other accessory programs, is used to analyze the datasets generated by the modENCODE project. With these tools, genome-wide binding sites of a large selection of chromatin binding proteins are generated for both D. melanogaster and C. elegans. The distributions of D. melanogaster insulator binding proteins are analyzed in detail, showing their global correlation with the regulation of gene expression.
The properties of binding sites that lie close to each other are also characterized; this is the first report of doublet binding sites in D. melanogaster. It is shown that doublet binding sites are preferred regions for promoter-associated histone marks.