A Stochastic Segmentation Model for Joint DNA-RNA Microarray Data Analysis
The Graduate School, Stony Brook University: Stony Brook, NY.
DNA copy number changes and epigenetic alterations often induce abnormal RNA expression levels and have been linked to the development and progression of cancer. While various methods have been proposed for analyzing microarray DNA copy number data and RNA expression data separately, little statistical work has been done on modeling the relationship between the two. For the joint analysis of the two types of data, we propose a new stochastic change-point model with latent variables, together with an associated estimation procedure. Our method integrates a hidden Markov model with Bayesian statistics to yield the joint posterior distribution of the DNA and RNA signal intensities across the whole genome. Explicit formulas for the posterior means are derived; these give direct estimates of the signal intensities without requiring segmentation. A subsequent segmentation procedure is provided to identify change-points and yield piecewise constant estimates of the signal intensities on each segment. Other quantities can also be derived from the posterior distribution to assess the confidence of coincident and non-coincident change-points in the DNA and RNA sequences. Based on these estimates, chromosomal regions with genetic and potential epigenetic aberrations can be identified. For computational simplicity, we propose an approximation method that keeps computation time linear in the sequence length, so the method can be readily applied to the new generation of higher-throughput arrays. The proposed method is illustrated through simulation studies and an application to a real data set.
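
To make the data setting concrete, the following Python sketch simulates a pair of piecewise constant DNA and RNA signals whose change-points are sometimes coincident and sometimes not. It is only an illustration of the kind of data described above, not the model or estimation procedure of this work: the function name `simulate_joint_signals`, the jump probabilities, and the Gaussian segment levels and noise are all assumptions chosen for the example.

```python
import random


def simulate_joint_signals(n, p_both=0.01, p_dna_only=0.01, p_rna_only=0.01,
                           level_sd=1.0, noise_sd=0.3, seed=0):
    """Simulate paired piecewise constant signals with shared/unshared jumps.

    Hypothetical toy generator (all parameters are illustrative assumptions):
    at each position after the first, at most one of three events occurs:
    both signals jump, only DNA jumps, or only RNA jumps.
    """
    rng = random.Random(seed)
    dna_level = rng.gauss(0.0, level_sd)
    rna_level = rng.gauss(0.0, level_sd)
    dna_mean, rna_mean = [], []
    dna_cps, rna_cps = [], []

    for t in range(n):
        if t > 0:
            u = rng.random()
            # [0, p_both): both jump; next band: DNA only; next band: RNA only.
            dna_jump = u < p_both + p_dna_only
            rna_jump = (u < p_both or
                        p_both + p_dna_only <= u < p_both + p_dna_only + p_rna_only)
            if dna_jump:
                dna_level = rng.gauss(0.0, level_sd)  # new segment level
                dna_cps.append(t)
            if rna_jump:
                rna_level = rng.gauss(0.0, level_sd)
                rna_cps.append(t)
        dna_mean.append(dna_level)
        rna_mean.append(rna_level)

    # Noisy observations around the latent piecewise constant means.
    dna_obs = [m + rng.gauss(0.0, noise_sd) for m in dna_mean]
    rna_obs = [m + rng.gauss(0.0, noise_sd) for m in rna_mean]

    return {
        "dna_mean": dna_mean, "rna_mean": rna_mean,
        "dna_obs": dna_obs, "rna_obs": rna_obs,
        "dna_cps": dna_cps, "rna_cps": rna_cps,
        "coincident_cps": sorted(set(dna_cps) & set(rna_cps)),
    }


if __name__ == "__main__":
    sim = simulate_joint_signals(1000, seed=1)
    print("DNA change-points:", sim["dna_cps"])
    print("RNA change-points:", sim["rna_cps"])
    print("Coincident change-points:", sim["coincident_cps"])
```

In this toy setting, a position where only the RNA signal jumps loosely mimics an expression change without a copy number change, which is the kind of non-coincident change-point the abstract associates with potential epigenetic aberrations.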